CN115331039A - Automatic image labeling method, device, equipment and medium - Google Patents

Automatic image labeling method, device, equipment and medium

Info

Publication number
CN115331039A
CN115331039A (application CN202210977321.1A)
Authority
CN
China
Prior art keywords
image
knn
affinity
nodes
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210977321.1A
Other languages
Chinese (zh)
Inventor
Wang Dandan (王丹丹)
Huang Yuheng (黄宇恒)
Jin Xiaofeng (金晓峰)
Xu Tianshi (徐天适)
Dai Jinguo (戴巾帼)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GRG Banking Equipment Co Ltd
Original Assignee
GRG Banking Equipment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GRG Banking Equipment Co Ltd filed Critical GRG Banking Equipment Co Ltd
Priority to CN202210977321.1A priority Critical patent/CN115331039A/en
Priority to PCT/CN2022/128967 priority patent/WO2024036758A1/en
Publication of CN115331039A publication Critical patent/CN115331039A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention discloses an automatic image annotation method, device, equipment and medium. The method comprises: obtaining a plurality of sample images to be annotated, extracting image features of the network representation layer based on a CNN convolutional neural network feature extraction model, and constructing a first KNN relation graph; inputting the first KNN relation graph of the sample images to be annotated whose node density is lower than an empirical value into a pre-trained GCN-V node affinity prediction model, and predicting the affinity among the nodes of the KNN relation graph; cutting edges among the nodes according to the affinity, so that nodes in the same cluster belong to the same class of sample, and pruning to obtain a second KNN relation graph of the sample images to be annotated; inputting the second KNN relation graph of the sample images to be annotated into a pre-trained GCN-C edge connection strength prediction model, and predicting the edge connection strength between clusters; and extracting the clusters whose edge connection strength is larger than a threshold value to form the clusters of the sample images to be annotated.

Description

Automatic image labeling method, device, equipment and medium
Technical Field
The invention belongs to the technical field of image data processing, and particularly relates to an automatic image labeling method, device, equipment and medium.
Background
A clean large-scale image library is crucial for obtaining a high-accuracy model. However, labeling and cleaning an image library is very difficult, and the cost of manual annotation is very high. In particular, samples that are hard to distinguish, such as pedestrian samples of the same identity that have been assigned to different classes, are difficult for human annotators to find.
Therefore, an automatic image clustering and annotation method, device, equipment and medium that reduce the cost of manual annotation are very important.
Disclosure of Invention
The invention aims to provide an automatic image labeling method, device, equipment and medium, which define the affinity between nodes through the structure of the relation graph and the feature similarity of multiple nodes, and predict the association strength of two clusters through the feature information of same-class samples (clusters) and their relationship to samples of other classes, so as to achieve self-labeling and clustering of images.
A first aspect of an embodiment of the present application provides an automatic image annotation method, including: acquiring a plurality of sample images to be labeled, extracting the image features of the network characterization layer of each sample image to be labeled based on a CNN convolutional neural network feature extraction model, and constructing a first KNN relation graph of each sample image to be labeled based on the K nearest neighbor algorithm according to the image features; inputting the first KNN relation graph of the sample images to be labeled whose node density is lower than an empirical value into a pre-trained GCN-V node affinity prediction model, and predicting the affinity among the nodes of the KNN relation graph; cutting edges among the nodes according to the affinity, so that nodes in the same cluster belong to the same class of sample, and pruning to obtain a second KNN relation graph of the sample images to be labeled; inputting the second KNN relation graph of the sample images to be labeled into a pre-trained GCN-C edge connection strength prediction model, and predicting the edge connection strength between clusters; and extracting the clusters whose edge connection strength is larger than the threshold value to form the clusters of the sample images to be labeled.
Further, the GCN-V node affinity prediction model is obtained by training a graph convolutional network on node affinities of labeled images computed with a CNN model.
Further, the GCN-C edge connection strength prediction model is obtained by training a graph convolutional network on the feature matrix of the cluster relation graph of labeled images computed with a CNN model.
Further, the image features extracted by the CNN convolutional neural network feature extraction model from the network representation layer of each sample image to be labeled comprise a feature vector F ∈ R^(N×D) of the network representation layer, where N represents the number of unlabeled images and D represents the dimensionality of the feature vector; the first KNN relation graph records, for each sample image to be labeled, its K most similar samples, obtained with the K nearest neighbor (KNN) algorithm from the feature vectors under the vector inner-product metric.
Further, the first KNN relation graph G(V, E) is an undirected graph; nodes V_i (i ∈ [0, N)) represent image data and are characterized by feature vectors F_i; edges E_j (j ∈ [0, K)) represent the relationship between two connected nodes and are characterized by the adjacency matrix.
Further, the pre-trained GCN-V node affinity prediction model is obtained through the following steps: inputting the labeled images into the CNN convolutional neural network feature extraction model to extract identification features; constructing a KNN affinity relation graph of the labeled images based on the K nearest neighbor algorithm according to the identification features; and selecting the KNN affinity relation graphs whose node density is lower than the empirical value for training, obtaining the GCN-V node affinity prediction model.
Further, the method of the pre-trained GCN-C edge connection strength prediction model specifically comprises the following steps: setting a first threshold value based on the KNN affinity relation graph to form a node cluster; setting a second threshold value to construct a new cluster relation graph; and calculating a characteristic matrix of the cluster relation graph, and training to obtain a GCN-C edge connection strength prediction model.
Further, the first KNN relation graphs whose node density is lower than the empirical value are input into the pre-trained GCN-V node affinity prediction model to predict the affinity among the nodes of the KNN relation graph: the affinity Aff(m, n) of two nodes V_m and V_n represents the probability that the two nodes belong to the same class; the higher the affinity, the higher the probability that the two nodes belong to the same class. The affinity of nodes V_m and V_n is calculated as follows:
$$\mathrm{Aff}(m, n) = e_{m,n} \cdot a_{m,n}$$
where the node similarity a_{i,j} = ⟨F_i, F_j⟩ is the inner product of the node feature vectors, and the node consistency factor e_{i,j} = P(y_i = y_j) − P(y_i ≠ y_j); with annotated data, e_{i,j} = 1 when the nodes belong to the same class and e_{i,j} = −1 when they do not. The density of a node is defined as

$$d_i = \frac{1}{k} \sum_{j=1}^{k} e_{i,j} \cdot a_{i,j}$$
A node whose density is higher than the empirical value has a high probability of belonging to the same class as its adjacent nodes; conversely, a low-density node has a lower probability of belonging to the same class as its adjacent nodes.
Further, edges among the nodes are cut according to the affinity, the nodes in the same cluster belong to the same type of sample, and the second KNN relation graph of the sample image to be labeled obtained by pruning specifically comprises the following steps: the loss L of the affinity model between the nodes is calculated, specifically,
$$L = \frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{Aff}_i - \mathrm{Aff}'_i \right)^2$$
where Aff_i is the node affinity calculated from the labeled graph and Aff'_i is the node affinity predicted by the GCN-V node affinity prediction model. A first threshold T1 is set in the first KNN relation graph of the sample image to be labeled; edges with affinity Aff_i < T1 are pruned, connecting all nodes of the same class in the first KNN relation graph into clusters. A second threshold T2, smaller than the first threshold T1, is then set in the first KNN relation graph for further pruning. Multiple connecting edges between clusters are merged into one, and the pruned result forms the second KNN relation graph of the sample image to be labeled.
Further, inputting the second KNN relation graph of the sample image to be labeled into a pre-trained GCN-C edge connection strength prediction model, wherein the prediction of the edge connection strength between the clusters specifically comprises the following steps: calculating the loss Lc of the GCN-C edge connection strength prediction model:
$$L_c = \frac{1}{N} \sum_{c=1}^{N} \left( r_c - r'_c \right)^2$$

where r_c is the connection strength calculated from the labeled graph and r'_c is the connection strength output by the GCN-C edge connection strength prediction model.
A second aspect of an embodiment of the present application provides an apparatus, including: the image acquisition module is used for acquiring a plurality of sample images to be marked, extracting the image characteristics of each sample image network characterization layer to be marked based on a CNN convolutional neural network characteristic extraction model, and constructing a first KNN relation graph of each sample image to be marked based on a K nearest neighbor algorithm according to the image characteristics; the affinity calculation module is used for inputting a first KNN relation graph of the to-be-labeled sample image with density nodes lower than an empirical value into a pre-trained GCN-V node affinity prediction model and predicting the affinity among the KNN relation graph nodes; the affinity cutting module is used for cutting edges among the nodes according to the affinity, enabling the nodes in the same cluster to belong to the same type of sample, and performing pruning to obtain a second KNN relation graph of the sample image to be marked; the edge connection strength calculation module is used for inputting the second KNN relation graph of the sample image to be marked into a pre-trained GCN-C edge connection strength prediction model and predicting the edge connection strength between clusters; and the image clustering module is used for extracting clusters with edge connection strength larger than a threshold value to form clusters of the sample images to be labeled.
A third aspect of the embodiments of the present application provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the computer device, where the processor implements the steps of the automatic image annotation method provided in the first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the automatic image annotation method provided by the first aspect.
The implementation of the method, the device, the computer equipment and the computer-readable storage medium for automatically labeling the image provided by the embodiment of the application has the following beneficial effects:
according to the method, the automatic image labeling device, the computer equipment and the computer readable storage medium, the affinity among the nodes is defined through the structure of the relational graph and the feature similarity of a plurality of nodes, the correlation strength of two clusters is predicted through the feature information of the same type of sample (cluster) and the relation between the same type of sample and other types of samples, the image labeling accuracy is improved, and the purpose of clustering in image samples difficult to achieve is achieved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the embodiments or the prior art description will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings may be obtained according to these drawings without inventive labor.
Fig. 1 is a flowchart illustrating an implementation of an automatic image annotation method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a flow chart of an automatic image annotation method according to the present invention;
FIG. 3 is a schematic flowchart illustrating an exemplary method for automatically labeling images according to the present invention;
FIG. 4 is a schematic diagram illustrating the calculation of node affinity according to the present invention;
FIG. 5 is a schematic diagram of an embodiment of the present invention segmenting to form clusters, and forming a second KNN relationship map from the clusters;
FIG. 6 is a schematic structural diagram of an automatic image labeling apparatus according to the present invention;
FIG. 7 is a schematic structural diagram of a computer device for executing the automatic image annotation method of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1 to fig. 3, fig. 1 is a flowchart illustrating an implementation of an automatic image annotation method according to an embodiment of the present application. An automatic image annotation method comprises the following steps:
step S1: acquiring a plurality of sample images to be labeled, extracting the image characteristics of each sample image network characterization layer to be labeled based on a CNN convolutional neural network characteristic extraction model, and constructing a first KNN relation graph of each sample image to be labeled based on a clustering algorithm according to the image characteristics; the K nearest neighbor algorithm used in the Clustering algorithm in this embodiment may also be used in other embodiments, for example, a Density-Based Clustering-Based DBSCN (Density-Based Spatial Clustering of Applications with Noise) algorithm, which is not limited herein.
The image features extracted by the CNN convolutional neural network feature extraction model from the network characterization layer of each sample image to be labeled comprise a feature vector F ∈ R^(N×D), where N represents the number of unlabeled images and D represents the dimensionality of the feature vector; the first KNN relation graph records, for each sample image to be labeled, its K most similar samples, obtained with the K nearest neighbor (KNN) algorithm from the feature vectors under the vector inner-product metric.
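As a concrete illustration of step S1, the sketch below builds the first KNN relation graph from CNN features under the inner-product metric. It is a minimal sketch, assuming L2-normalized features already extracted by the CNN model; the value of K and the use of plain numpy (rather than an approximate-nearest-neighbor library) are illustrative choices, not part of the patent.

```python
import numpy as np

def build_first_knn_graph(features: np.ndarray, k: int = 10):
    """Build the first KNN relation graph from CNN features F in R^(N x D).

    Returns (neighbors, sims): (N, k) arrays holding, for each image, the
    indices of its k most similar images and the inner-product similarities.
    """
    # Normalize rows so the inner product behaves as a cosine similarity.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T                         # (N, N) pairwise inner products
    np.fill_diagonal(sim, -np.inf)                # exclude self-matches
    neighbors = np.argsort(-sim, axis=1)[:, :k]   # k most similar nodes per image
    sims = np.take_along_axis(sim, neighbors, axis=1)
    return neighbors, sims
```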
Step S2: inputting the first KNN relation graph of the sample images to be labeled whose node density is lower than the empirical value into the pre-trained GCN-V node affinity prediction model, and predicting the affinity among the nodes of the KNN relation graph; the GCN-V node affinity prediction model is obtained by training a graph convolutional network on node affinities of labeled images computed with a CNN model;
wherein the first KNN relation graph G(V, E) is an undirected graph; nodes V_i (i ∈ [0, N)) represent image data and are characterized by feature vectors F_i; edges E_j (j ∈ [0, K)) represent the relationship between two connected nodes and are characterized by the adjacency matrix.
Further, the pre-trained GCN-V node affinity prediction model is obtained through the following steps:
step S21: inputting the marked graph into a CNN convolutional neural network feature extraction model to extract identification features;
step S22: constructing a KNN affinity relation graph of the labeled image based on a K nearest neighbor algorithm according to the identification characteristics;
step S23: and selecting a KNN affinity relation graph with density nodes lower than empirical values to train to obtain the GCN-V node affinity prediction model.
As shown in fig. 4, the affinity Aff(m, n) of two nodes V_m and V_n represents the probability that the two nodes belong to the same class; the higher the affinity, the higher the probability that the two nodes belong to the same class. The affinity of nodes V_m and V_n is calculated as follows:
$$\mathrm{Aff}(m, n) = e_{m,n} \cdot a_{m,n}$$
where the node similarity a_{i,j} = ⟨F_i, F_j⟩ is the inner product of the node feature vectors, and the node consistency coefficient e_{i,j} = P(y_i = y_j) − P(y_i ≠ y_j); with labeled data, e_{i,j} = 1 when the nodes belong to the same class and e_{i,j} = −1 when they do not. The density of a node is defined as

$$d_i = \frac{1}{k} \sum_{j=1}^{k} e_{i,j} \cdot a_{i,j}$$
A node whose density is higher than the empirical value has a high probability of belonging to the same class as its adjacent nodes; conversely, a low-density node has a lower probability of belonging to the same class as its adjacent nodes.
Further, in a relation graph, nodes with low density are the objects that need further verification. To avoid repeating calculations, only the connecting edges formed between a node under verification and its higher-density neighbors are selected. Based on prior knowledge, the proportion of low-density nodes submitted for further verification is set to a; in this embodiment, a ranges over (10%, 15%) according to prior knowledge, while in other embodiments the value of a can be chosen according to the specific practical requirements.
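A minimal sketch of the density computation above, assuming the (N, k) neighbor-index and similarity arrays from the earlier KNN sketch; the unlabeled branch is a hypothetical placeholder standing in for GCN-V predictions.

```python
import numpy as np

def node_density(neighbors, sims, labels=None):
    """d_i = (1/k) * sum_j e_ij * a_ij over the k neighbors of node i."""
    if labels is not None:
        # With annotated data: e_ij = +1 for same class, -1 otherwise.
        e = np.where(labels[:, None] == labels[neighbors], 1.0, -1.0)
    else:
        e = np.ones_like(sims)  # placeholder for predicted consistency factors
    return (e * sims).mean(axis=1)

# Low-density nodes are the ones sent for further verification; per the
# embodiment, a proportion a in (10%, 15%) of the lowest-density nodes:
# order = np.argsort(density); to_verify = order[: int(a * len(order))]
```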
Step S3: cutting edges among the nodes according to the affinity, so that nodes in the same cluster belong to the same class of sample, and pruning to obtain the second KNN relation graph of the sample images to be labeled;
as shown in fig. 5, specifically, a GCN-V model is trained, the GCN-V model is used to predict the affinity between nodes, and edges between nodes with low affinity (different types of samples) are cut off, so that nodes in the same cluster belong to the same type of sample.
The model input is the relation subgraph formed by nodes V_m and V_n and their adjacent points; the adjacency matrix Adj and the feature matrix F representing this subgraph are input into the GCN-V model.
The model output is the affinity of nodes V_m and V_n; in the training stage, the ground-truth affinity between nodes is calculated from the labels (GT) of the annotated image samples.
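The patent does not spell out the GCN-V architecture. The sketch below is one plausible reading, assuming two graph-convolution layers of the form relu(Adj · X · W) with a sigmoid head producing a per-node score in (0, 1); the layer widths, the depth, and the way a pairwise affinity for (V_m, V_n) would be read off the node scores are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class GCNV(nn.Module):
    """Sketch of a GCN-V-style network: (Adj, F) -> per-node affinity scores."""

    def __init__(self, feat_dim: int, hidden: int = 256):
        super().__init__()
        self.w1 = nn.Linear(feat_dim, hidden)
        self.w2 = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, adj: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # adj: (N, N) normalized adjacency of the relation subgraph
        # feats: (N, D) node feature matrix F
        h = torch.relu(self.w1(adj @ feats))  # first graph convolution
        h = torch.relu(self.w2(adj @ h))      # second graph convolution
        return torch.sigmoid(self.head(h)).squeeze(-1)  # one score per node
```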
The loss L of the affinity model between the nodes is calculated, specifically,
$$L = \frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{Aff}_i - \mathrm{Aff}'_i \right)^2$$

where Aff_i is the node affinity calculated from the labeled graph and Aff'_i is the node affinity predicted by the GCN-V node affinity prediction model;
setting a first threshold value T1 in a first KNN relational graph of the sample image to be labeled, pruning the edges with the affinity Affi < T1, and connecting all nodes of the same class in the first KNN relational graph to form a cluster;
setting a second threshold T2 in the first KNN relational graph for pruning, and keeping connecting edges among the clusters except for the connecting edges in the clusters, wherein the second threshold T2 is smaller than the first threshold T1;
and combining one or more connecting edges among the clusters into one, and forming a second KNN relation graph of the sample image to be labeled through pruning.
Step S4: inputting the second KNN relation graph of the sample images to be labeled into the pre-trained GCN-C edge connection strength prediction model, and predicting the edge connection strength between clusters; the GCN-C edge connection strength prediction model is obtained by training a graph convolutional network on the feature matrix of the cluster relation graph of labeled images computed with a CNN model;
specifically, the clusters are used as the node units in the second KNN relational graph to aggregate a plurality of clusters, and the problem of wrong clustering caused by outlier samples is solved. The feature H of a cluster is a combination of peak and mean features, H = [ Fpeak, fmean =],H∈R (1×2D) Wherein, the peak characteristic Fpeak is the characteristic of the node with the highest density; the mean feature Fmean is the feature mean of each node in the cluster.
A GCN-C model is trained to predict the edge connection strength between clusters. The model input is the relation subgraph formed by nodes V_m and V_n in the second KNN relation graph and their adjacent points; the adjacency matrix Adj and the feature matrix H representing this subgraph are input into the GCN-C model. The model output is the connection relationship of each node of the relation subgraph; in the training stage, the ground truth comes from the labels of the annotated image samples.
Calculating the loss Lc of the GCN-C edge connection strength prediction model:
$$L_c = \frac{1}{N} \sum_{c=1}^{N} \left( r_c - r'_c \right)^2$$

where r_c is the connection strength calculated from the labeled graph and r'_c is the connection strength output by the GCN-C edge connection strength prediction model.
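Treating both losses as mean-squared errors between predicted and label-derived values (an assumed functional form), a PyTorch sketch:

```python
import torch

def gcn_v_loss(aff_pred: torch.Tensor, aff_gt: torch.Tensor) -> torch.Tensor:
    """L: MSE between GCN-V predictions and label-derived node affinities."""
    return torch.mean((aff_pred - aff_gt) ** 2)

def gcn_c_loss(conn_pred: torch.Tensor, conn_gt: torch.Tensor) -> torch.Tensor:
    """Lc: MSE between GCN-C predictions and label-derived connection strengths."""
    return torch.mean((conn_pred - conn_gt) ** 2)
```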
Step S5: extracting the clusters whose edge connection strength is larger than the threshold value to form the clusters of the sample images to be labeled.
Specifically, a threshold T3 is set, and clusters whose connection strength r_c is larger than the threshold T3 are combined into one class, forming new clusters; T3 is an empirical value obtained from statistics over the training data set. A labeled image library is then generated from the clustering result and used for model training.
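Step S5 then amounts to merging cluster pairs whose predicted connection strength exceeds T3. A union-find sketch, assuming `edges` is the merged inter-cluster edge list from the pruning step and `strengths` the corresponding GCN-C outputs:

```python
import numpy as np

def merge_clusters(edges, strengths, n_clusters, t3):
    """Union clusters connected by edges with strength r_c > T3."""
    parent = np.arange(n_clusters)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), r in zip(edges, strengths):
        if r > t3:
            parent[find(a)] = find(b)      # merge the two clusters

    return np.array([find(c) for c in range(n_clusters)])  # final cluster labels
```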
Corresponding to the above method embodiment, an embodiment of the present invention further provides an automatic image labeling apparatus; as shown in fig. 6, the apparatus may include the following modules:
the image acquisition module is used for acquiring a plurality of sample images to be marked, extracting the image characteristics of each sample image network characterization layer to be marked based on a CNN convolutional neural network characteristic extraction model, and constructing a first KNN relation graph of each sample image to be marked based on a K nearest neighbor algorithm according to the image characteristics;
the affinity calculation module is used for inputting a first KNN relation graph of the sample image to be labeled, of which the density node is lower than the empirical value, into a pre-trained GCN-V node affinity prediction model to predict the affinity among the KNN relation graph nodes;
the affinity cutting module is used for cutting edges among the nodes according to the affinity, enabling the nodes in the same cluster to belong to the same type of sample, and performing pruning to obtain a second KNN relation graph of the sample image to be marked;
the edge connection strength calculation module is used for inputting the second KNN relation graph of the sample image to be marked into a pre-trained GCN-C edge connection strength prediction model and predicting the edge connection strength between clusters;
and the image clustering module is used for extracting clusters with edge connection strength larger than a threshold value to form clusters of the sample images to be labeled.
It should be understood that, in the structural block diagram of the apparatus shown in fig. 6, each module is used to execute each step in the embodiment corresponding to fig. 1, and each step in the embodiment corresponding to fig. 1 has been explained in detail in the foregoing embodiment, and specific reference is made to the relevant description in the embodiment corresponding to fig. 1, which is not repeated herein.
Fig. 7 is a block diagram of a computer device according to an embodiment of the present application. The computer device of this embodiment includes: a processor, a memory, and a computer program stored in the memory and executable on the processor, such as a program of the automatic image annotation method. When the processor executes the computer program, the steps of the automatic image annotation method in each of the above embodiments are realized. Alternatively, the processor implements the functions of the modules in the embodiment corresponding to fig. 6 when executing the computer program; for details, please refer to the related description in the embodiment corresponding to fig. 6, which is not repeated here.
Illustratively, the computer program may be partitioned into one or more modules that are stored in the memory and executed by the processor to accomplish the present application. The one or more modules may be a series of computer program instruction segments capable of performing certain functions, the instruction segments describing the execution of the computer program in the computer device.
The computer device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that fig. 7 is merely an example of a computer device and is not intended to be limiting; the device may include more or fewer components than those shown, or combine certain components, or have different components. For example, the computer device may also include input/output devices, network access devices, buses, etc.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may be an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. The memory may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device. Further, the memory may include both an internal storage unit and an external storage device of the computer device. The memory is used for storing the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program is executed by a processor to implement the automatic image annotation method in the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. Any reference to memory, storage, database, or other computer-readable storage medium used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), rambus (Rambus) direct RAM (RDRAM), direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM), among others.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. An automatic image annotation method is characterized by comprising the following steps:
acquiring a plurality of sample images to be marked, extracting the image characteristics of each sample image network characterization layer to be marked based on a CNN convolutional neural network characteristic extraction model, and constructing a first KNN relation graph of each sample image to be marked based on a clustering algorithm according to the image characteristics;
inputting a first KNN relational graph of the sample image to be marked, of which the density node is lower than an empirical value, into a pre-trained GCN-V node affinity prediction model, and predicting the affinity among the nodes of the KNN relational graph;
cutting edges among the nodes according to the affinity, enabling the nodes in the same cluster to belong to the same type of sample, and pruning to obtain a second KNN relation graph of the sample image to be labeled;
inputting the second KNN relation graph of the sample image to be marked into a pre-trained GCN-C edge connection strength prediction model, and predicting the edge connection strength between clusters;
and extracting the cluster with the edge connection strength larger than the threshold value to form the cluster of the sample image to be labeled.
2. The method for automatically labeling images according to claim 1, wherein the GCN-V node affinity prediction model is obtained by training a graph convolution neural network by calculating the node affinity of a labeled image by using a CNN model.
3. The method for automatically labeling images according to claim 1, wherein the GCN-C edge connection strength prediction model is obtained by training a graph convolution neural network by using a CNN model to calculate a feature matrix of a cluster relation graph of a labeled image.
4. The method for automatically labeling images as claimed in claim 1, wherein the image features extracted by the CNN convolutional neural network feature extraction model from the network characterization layer of each sample image to be labeled comprise a feature vector F ∈ R^(N×D), where N represents the number of unlabeled images and D represents the dimensionality of the feature vector, and the first KNN relation graph records, for each sample image to be labeled, its K most similar samples, obtained with the K nearest neighbor (KNN) algorithm from the feature vectors under the vector inner-product metric.
5. The automatic image annotation method according to claim 1, wherein the first KNN relation graph G(V, E) is an undirected graph; nodes V_i (i ∈ [0, N)) represent image data and are characterized by feature vectors F_i; edges E_j (j ∈ [0, K)) represent the relationship between two connected nodes and are characterized by the adjacency matrix.
6. The automatic image annotation method according to claim 1, wherein the pre-trained GCN-V node affinity prediction model is specifically:
inputting the marked graph into a CNN (convolutional neural network) feature extraction model to extract identification features;
constructing a KNN affinity relation graph of the labeled image based on a K nearest neighbor algorithm according to the identification characteristics;
and selecting a KNN affinity relation graph with the density nodes lower than the empirical value to train and obtain the GCN-V node affinity prediction model.
7. The automatic image annotation method according to claim 4, wherein the pre-trained GCN-C edge connection strength prediction model is specifically:
setting a first threshold value based on the KNN affinity relation graph to form a node cluster;
setting a second threshold value to construct a new cluster relation graph;
and calculating a characteristic matrix of the cluster relation graph, and training to obtain a GCN-C edge connection strength prediction model.
8. An automatic image labeling device, comprising:
the image acquisition module is used for acquiring a plurality of sample images to be marked, extracting the image characteristics of each sample image network characterization layer to be marked based on a CNN convolutional neural network characteristic extraction model, and constructing a first KNN relation graph of each sample image to be marked based on a K nearest neighbor algorithm according to the image characteristics;
the affinity calculation module is used for inputting a first KNN relation graph of the sample image to be labeled, of which the density node is lower than the empirical value, into a pre-trained GCN-V node affinity prediction model to predict the affinity among the KNN relation graph nodes;
the affinity cutting module is used for cutting edges among the nodes according to the affinity, enabling the nodes in the same cluster to belong to the same type of sample, and performing pruning to obtain a second KNN relation graph of the sample image to be marked;
the edge connection strength calculation module is used for inputting the second KNN relation graph of the sample image to be marked into a pre-trained GCN-C edge connection strength prediction model and predicting the edge connection strength between clusters;
and the image clustering module is used for extracting clusters with the edge connection strength larger than a threshold value to form clusters of the sample images to be marked.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202210977321.1A 2022-08-15 2022-08-15 Automatic image labeling method, device, equipment and medium Pending CN115331039A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210977321.1A CN115331039A (en) 2022-08-15 2022-08-15 Automatic image labeling method, device, equipment and medium
PCT/CN2022/128967 WO2024036758A1 (en) 2022-08-15 2022-11-01 Automatic image annotation method and apparatus, device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210977321.1A CN115331039A (en) 2022-08-15 2022-08-15 Automatic image labeling method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN115331039A true CN115331039A (en) 2022-11-11

Family

ID=83924447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210977321.1A Pending CN115331039A (en) 2022-08-15 2022-08-15 Automatic image labeling method, device, equipment and medium

Country Status (2)

Country Link
CN (1) CN115331039A (en)
WO (1) WO2024036758A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10713563B2 (en) * 2017-11-27 2020-07-14 Technische Universiteit Eindhoven Object recognition using a convolutional neural network trained by principal component analysis and repeated spectral clustering
CN111259904B (en) * 2020-01-16 2022-12-27 西南科技大学 Semantic image segmentation method and system based on deep learning and clustering
CN114612707A (en) * 2022-02-09 2022-06-10 潍柴动力股份有限公司 Image automatic labeling method and device based on deep learning
CN114722926A (en) * 2022-03-28 2022-07-08 中国海洋大学 Convolution clustering method for scale vortex time sequence diagram in towed sensor array

Also Published As

Publication number Publication date
WO2024036758A1 (en) 2024-02-22

Similar Documents

Publication Publication Date Title
CN105912611B (en) A kind of fast image retrieval method based on CNN
CN112132179A (en) Incremental learning method and system based on small number of labeled samples
CN109597856A (en) A kind of data processing method, device, electronic equipment and storage medium
CN112862093B (en) Graphic neural network training method and device
CN114117153B (en) Online cross-modal retrieval method and system based on similarity relearning
CN110442523B (en) Cross-project software defect prediction method
Morales-Gonzalez et al. A new proposal for graph-based image classification using frequent approximate subgraphs
WO2022062419A1 (en) Target re-identification method and system based on non-supervised pyramid similarity learning
Rad et al. Image annotation using multi-view non-negative matrix factorization with different number of basis vectors
CN115293919B (en) Social network distribution outward generalization-oriented graph neural network prediction method and system
CN110993037A (en) Protein activity prediction device based on multi-view classification model
CN112800248A (en) Similar case retrieval method, similar case retrieval device, computer equipment and storage medium
CN110888880A (en) Proximity analysis method, device, equipment and medium based on spatial index
CN116433969A (en) Zero sample image recognition method, system and storable medium
Rodriguez‐Lozano et al. Efficient data dimensionality reduction method for improving road crack classification algorithms
CN116089504B (en) Relational form data generation method and system
CN112597399A (en) Graph data processing method and device, computer equipment and storage medium
CN112597871A (en) Unsupervised vehicle re-identification method and system based on two-stage clustering and storage medium
CN110781310A (en) Target concept graph construction method and device, computer equipment and storage medium
CN110287970B (en) Weak supervision object positioning method based on CAM and covering
CN115331039A (en) Automatic image labeling method, device, equipment and medium
CN111967408A (en) Low-resolution pedestrian re-identification method and system based on prediction-recovery-identification
CN113657145B (en) Fingerprint retrieval method based on sweat pore characteristics and neural network
CN115035966A (en) Superconductor screening method, device and equipment based on active learning and symbolic regression
CN114612663A (en) Domain self-adaptive instance segmentation method and device based on weak supervised learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Wang Dandan

Inventor after: Huang Yuheng

Inventor after: Jin Xiaofeng

Inventor after: Xu Tianshi

Inventor after: Dai Jingguo

Inventor before: Wang Dandan

Inventor before: Huang Yuheng

Inventor before: Jin Xiaofeng

Inventor before: Xu Tianshi

Inventor before: Dai Jinguo