CN114299342B - Unknown mark classification method in multi-mark picture classification based on deep learning - Google Patents

Unknown mark classification method in multi-mark picture classification based on deep learning

Info

Publication number
CN114299342B
CN114299342B (application CN202111655752.8A)
Authority
CN
China
Prior art keywords
mark
unknown
matrix
marks
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111655752.8A
Other languages
Chinese (zh)
Other versions
CN114299342A (en)
Inventor
黄俊
严煜
郑啸
陶陶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University of Technology AHUT
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Original Assignee
Anhui University of Technology AHUT
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University of Technology AHUT, Institute of Artificial Intelligence of Hefei Comprehensive National Science Center filed Critical Anhui University of Technology AHUT
Priority to CN202111655752.8A priority Critical patent/CN114299342B/en
Publication of CN114299342A publication Critical patent/CN114299342A/en
Application granted granted Critical
Publication of CN114299342B publication Critical patent/CN114299342B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a method for classifying unknown marks in multi-mark picture classification based on deep learning, and belongs to the field of machine learning. On the premise of fully learning the feature representations of the known marks, the features extracted by a CNN are decomposed with a non-negative matrix factorization technique to mine unknown mark information, and a consistency constraint with the classifier is imposed so that feature representations of the unknown marks are learned. Meanwhile, a more accurate mark relation matrix is constructed from the approximate solution of the complete mark matrix obtained by the non-negative matrix factorization, and a graph neural network is then used to learn the mark relations, yielding unknown mark feature representations that carry semantic information. By discovering unknown marks, the method mines valuable implicit information in the data; the correlation between known and unknown marks lets known mark classification and unknown mark classification guide and promote each other, which ultimately improves the classification performance on both known and unknown marks and supports the multi-mark learning task better.

Description

Unknown mark classification method in multi-mark picture classification based on deep learning
Technical Field
The invention relates to a multi-mark learning technology in the field of machine learning, in particular to a method for solving the problems of unknown mark discovery and classification in deep multi-mark learning.
Background
In the big data age, multi-mark images are becoming more and more complex. This complexity is reflected not only in the growing number of marks in an image but also in how different marks are distributed within a multi-mark image. To classify multi-mark images, besides classifying marks from image features such as contour, shape and color, multi-mark learning can also model the mark relations by exploiting the interrelationships that exist among the marks.
Current multi-mark learning approaches basically assume that the marks of the training data set are all observed and known. However, in some practical applications this assumption does not always hold: the user observes only part of the marks, i.e. there are unknown marks in the data whose semantic information and sample annotations are completely unknown. In the training stage the user observes only q marks while r unknown marks exist, and the annotation information of the training samples on the r unknown marks is entirely missing.
The presence of unknown marks in multi-mark data sets can be attributed mainly to the following three reasons:
(1) High sample annotation cost: in the big data environment the data volume is large, the semantics of the data are complex, and the number of semantic marks in a whole data set is large, so the annotation process is difficult and consumes a great deal of manpower and material resources, especially for large data sets. It is therefore hard to provide a complete mark set, and some useful marks are inevitably missed. Moreover, because of the large amount of data, a user may annotate only a small portion of the samples, while unknown marks that do not belong to the known mark set may exist in large numbers of unannotated samples.
(2) Some marks are deliberately ignored: in some applications, people focus only on part of the marks and simply omit the others. For example, in an image classification task the user may only care about which objects of interest are contained in the picture, while other uninteresting objects and the background are ignored. Research has found that modeling these ignored marks, such as the background or other objects, can improve the model's classification accuracy on the primary objects the user cares about.
(3) Human cognition and data acquisition conditions are limited: owing to the limits of human cognition, there may be new marks beyond current knowledge that have not yet been discovered. For example, in disease diagnosis there are diseases that exist objectively but have not been recognized by humans. Moreover, some diseases are not detected in time because the corresponding medical testing equipment and methods are lacking, so these disease categories are not annotated in the case samples. For instance, in the early stage of COVID-19, a patient who did not undergo nucleic acid testing might have been treated as having influenza.
Research shows that discovering unknown marks can mine valuable implicit information in the data; by exploiting the correlation between known and unknown marks, known mark classification and unknown mark classification guide and promote each other, which ultimately improves the classification performance on both and supports the multi-mark learning task better.
Therefore, the problem of unknown marks in data has attracted wide attention from researchers and is likely to become a new research hotspot and development trend in machine learning, data mining, pattern recognition and related fields. However, there is currently no very effective scheme for mining the information of unknown marks in practical applications so that the mark set tends toward completeness.
Disclosure of Invention
1. Technical problem to be solved by the invention
Conventional multi-mark learning methods assume that the number of marks in a data set is fixed, but in practice unknown marks exist in multi-mark data sets because of problems such as the high cost of sample annotation and the limits of human cognition. The invention therefore provides an unknown mark classification method for deep-learning-based multi-mark picture classification; deep neural networks surpass traditional methods in feature extraction capability, and in deep-learning-based multi-mark picture classification a multi-mark classification model covering both known and unknown marks is constructed, making the multi-mark classification more accurate.
2. Technical proposal
In order to achieve the above purpose, the technical scheme provided by the invention is as follows:
the invention discloses a method for classifying unknown marks in multi-mark picture classification based on deep learning, which comprises the following steps:
S1, reading an image file, a known mark matrix and a known mark relation graph, and initializing the node information of the unknown marks in the relation graph;
S2, extracting semantic representations of all marks from the mark relation graph, and clustering the semantic representations of the marks to obtain semantic representations of the mark groups;
S3, constructing a deep convolutional neural network and extracting the feature maps of different convolutional layers in the network; based on the attention mechanism, fusing the feature maps with the semantic representations of all marks and mark groups to obtain the final feature representations of the image, namely the generic feature representations and the mark group feature representations;
S4, performing non-negative matrix factorization on the feature map to obtain an approximate solution of the complete mark matrix, constructing an initial mark relation matrix from the approximate solution, and then learning the mark relations with a graph neural network;
S5, concatenating the generic feature representation related to each mark with the feature representation of the mark group it belongs to, and inputting the result into a classifier to obtain the confidence of each mark in the image;
S6, supervising the part of the classifier that predicts the known marks with the ground-truth information of the known marks, and repeating S1, S2, S3 and S5 so that the feature representations are fully learned from the known marks; after the model converges, constraining the part of the classifier that predicts the unknown marks to be consistent with the unknown-mark part of the approximate solution of the complete mark matrix, and repeating S1-S5 to refine the feature representations of the unknown marks;
S7, giving a test sample and feeding it into the final classification model obtained by the learning in S1-S6 to obtain the predicted values of the test sample on the known marks and the unknown marks;
S8, checking the corresponding object according to the attention map of the unknown mark, and acquiring the semantic name of the unknown mark.
3. Advantageous effects
Compared with the prior art, the technical scheme provided by the invention has the following remarkable effects:
(1) Most conventional multi-mark learning methods assume that the marks of the training data set are all observed and known. However, in some practical applications this assumption does not always hold: the user observes only part of the marks, i.e. there are unknown marks in the data whose semantic information and sample annotations are completely unknown. For this situation, the method mines the information of the unknown marks with a non-negative matrix factorization technique, so that the mark set tends toward completeness and the classification accuracy is improved.
(2) Deep neural networks have clear advantages in extracting higher-order features. Existing methods for discovering and classifying unknown marks can only use shallow models, whereas the invention uses a deep model to obtain better feature representations of the marks. Meanwhile, by using the graph neural network, the co-occurrence relations among marks can be completed after the unknown mark relations are mined. The invention also uses the attention mechanism to inspect the attention map of an unknown mark, making it convenient to view the image region the unknown mark attends to.
Drawings
FIG. 1 is a diagram of an unknown marker classification model framework of the present invention.
Detailed Description
For a further understanding of the present invention, the present invention will be described in detail with reference to the drawings and examples.
Example 1
Referring to fig. 1, the method for classifying unknown marks in multi-mark image classification based on deep learning in this embodiment includes three stages of model construction and training, mark prediction and checking of objects corresponding to unknown marks, and specifically includes the following steps:
(1) Model construction and training:
S1, an image file, a known mark matrix and a known mark relation graph are read, and the node information of the unknown marks in the relation graph is initialized. The method comprises the following steps:
The preprocessed pictures of the data set are input as I ∈ R^{n×c×w×h}, where n is the number of samples, c is the number of channels, and w and h are the width and height of the pictures; here c = 3 and w = h = 448. The known mark matrix is Y ∈ {0,1}^{n×q}, where q is the number of known marks; Y_ij = 1 means that the i-th sample has the j-th mark, otherwise Y_ij = 0, with 1 ≤ i ≤ n, 1 ≤ j ≤ q.
The known mark relation graph is G = {D, S}. D = {D_1, ..., D_q} is the node set of the relation graph, where D_i is the node corresponding to the i-th known mark and its feature is the semantic description of that known mark. V ∈ R^{q×d} is the semantic description matrix of the known marks, where R is the real number domain, d = 300, and V_i (1 ≤ i ≤ q) is the semantic description of the i-th known mark. The known mark semantic descriptions are obtained with GloVe: word vectors are extracted from the Wikipedia corpus and used as the mark semantic descriptions. S is the edge set of the relation graph, and the weight of an edge is the co-occurrence probability between known marks. A ∈ [0,1]^{q×q} is the co-occurrence probability matrix of the known marks, where A_ij is the probability that mark j occurs when mark i is present, defined as:

A_ij = L_ij / N_i    (1)
In formula (1), L_ij is the number of times marks i and j co-occur, and N_i is the number of times mark i appears in the whole training set. To prevent rare co-occurrence relations from acting as noise, a threshold τ is set to filter out low co-occurrence probabilities:

A_ij ← 1 if A_ij ≥ τ, otherwise 0    (2)

In formula (2), τ is a hyper-parameter, τ ∈ [0,1]. The binarized matrix A can cause over-smoothing, making the features of different nodes hard to distinguish after the relation graph is processed by the graph neural network, so the co-occurrence probabilities of a node with all of its neighbour nodes are introduced when re-weighting the edge weights of the relation graph, giving formula (3). In formula (3), t is a hyper-parameter, t ∈ [0,1]. When the graph neural network updates the relation graph, the edge weights processed by formulas (2)-(3) reduce the influence of noise and improve the distinguishability of each node.
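As an illustration of this construction, the following minimal NumPy sketch builds the known-mark co-occurrence edge weights from the mark matrix Y along the lines of formulas (1)-(3); because formula (3) is not reproduced above, the re-weighting step (spreading a weight t over the neighbours of each node and keeping 1 - t on the node itself) is an assumed form, and the function name and the default values of tau and t are illustrative only.

```python
import numpy as np

def build_cooccurrence_edges(Y, tau=0.4, t=0.2):
    """Build known-mark edge weights along the lines of formulas (1)-(3).

    Y   : (n, q) binary known-mark matrix.
    tau : threshold of formula (2) filtering rare co-occurrences.
    t   : hyper-parameter of formula (3); the re-weighting form used below
          (spread t over the neighbours, keep 1 - t on the node) is assumed.
    """
    L = Y.T @ Y                                   # L[i, j]: co-occurrence count of marks i and j
    N = np.maximum(Y.sum(axis=0), 1)              # N[i]: occurrences of mark i in the training set
    A = L / N[:, None]                            # formula (1): A_ij = L_ij / N_i
    np.fill_diagonal(A, 0.0)

    A_bin = (A >= tau).astype(np.float32)         # formula (2): binarise with threshold tau

    # Assumed formula (3): re-weight each row by the co-occurrence of the node
    # with all of its neighbours so that nodes remain distinguishable.
    neigh_sum = np.maximum(A_bin.sum(axis=1, keepdims=True), 1e-6)
    A_rw = t * A_bin / neigh_sum
    np.fill_diagonal(A_rw, 1.0 - t)
    return A_rw
```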
Suppose there are r unknown marks, and let the complete mark matrix be Ȳ = [Y, Ŷ] ∈ {0,1}^{n×l}, where Y ∈ {0,1}^{n×q} is the known mark matrix, Ŷ ∈ {0,1}^{n×r} is the unknown mark matrix, and l = q + r is the number of complete marks. Ȳ_ij = 1 (q+1 ≤ j ≤ l) indicates that the i-th sample has the j-th mark, which is an unknown mark; otherwise Ȳ_ij = 0.
The complete mark relation graph is Ḡ = {D̄, S̄}, where D̄_k denotes the node corresponding to the k-th unknown mark in the relation graph, and node features are described by the semantic descriptions of the complete marks. V̄ ∈ R^{l×d} is the semantic description matrix of the complete marks, and V̄_i is the semantic description of the i-th mark. S̄ is the edge set of the complete mark relation graph, and its weights are given by the complete mark co-occurrence probability matrix Ā ∈ [0,1]^{l×l}.
The complete semantic description matrix is thus V̄ = [V; V̂], where V̂ ∈ R^{r×d} is the semantic description matrix of the unknown marks; the feature of the node of unknown mark k is its semantic description V̄_k, q+1 ≤ k ≤ l.
Since no information is provided for the unknown marks, the invention needs to initialize the semantic descriptions and the co-occurrence probabilities of the unknown marks. Considering that there is a certain correlation between known and unknown marks, the semantic descriptions of the known marks are used to initialize the semantic descriptions of the unknown marks, as in formula (5). In formula (5), V_i and V̄_k are the i-th row vector of V and the k-th row vector of V̄, respectively.
To obtain the complete mark co-occurrence probability matrix Ā, the co-occurrence counts L_ki and L_gk of the unknown marks and the occurrence counts N_k of the unknown marks need to be initialized. The mark relations of the unknown marks are initialized according to the mark relations of the known marks:

L_ki = max(L_ki, 0)    (8)

N_k = max(N_k, 0)    (9)

In formulas (6)-(7), 1 ≤ i ≤ q and q+1 ≤ k ≤ l. The function randint(a, b) returns a randomly generated integer greater than or equal to a and less than or equal to b, and the averaging term is the mean number of co-occurrences of all known marks with mark i. Formulas (6)-(7) therefore take the mean co-occurrence count of all known marks with mark i, randomly increase or decrease it within a small range, and use the result as the co-occurrence count of unknown mark k with mark i. In formulas (8)-(9), max(a, b) returns the maximum of a and b.
After all L_ki have been initialized, L_gk is initialized by the analogous process:

L_gk = max(L_gk, 0)    (10)

In formula (10), 1 ≤ g ≤ l and q+1 ≤ k ≤ l.
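The initialization just described can be sketched as follows; since the exact expressions of formulas (6)-(7) are not reproduced above, the perturbation width `jitter`, and the use of the mean of the known-mark counts for N_k and for pairs of unknown marks, are assumptions that only follow the verbal description (take the average of the known-mark statistics, perturb it randomly in a small range, and clip at zero).

```python
import numpy as np

def init_unknown_counts(L_known, N_known, q, r, jitter=2, seed=0):
    """Initialise unknown-mark statistics in the spirit of formulas (6)-(10).

    L_known : (q, q) co-occurrence counts of the known marks.
    N_known : (q,)   occurrence counts of the known marks.
    """
    rng = np.random.default_rng(seed)
    l = q + r
    L = np.zeros((l, l)); L[:q, :q] = L_known
    N = np.zeros(l);      N[:q] = N_known

    for k in range(q, l):
        for i in range(q):
            avg_ki = L_known[:, i].mean()                                  # mean co-occurrence with mark i
            L[k, i] = max(avg_ki + rng.integers(-jitter, jitter + 1), 0)   # formulas (6), (8)
        N[k] = max(N_known.mean() + rng.integers(-jitter, jitter + 1), 0)  # formulas (7), (9), assumed
        for g in range(l):                                                 # formula (10)
            base = L[k, g] if g < q else L_known.mean()                    # unknown-unknown pairs: assumed
            L[g, k] = max(base, 0)
    return L, N
```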
S2, extracting semantic representations of all marks from the mark relation graph, and clustering the semantic representations of the marks to obtain semantic representations of the mark group. The method comprises the following steps:
The invention uses a graph attention network (GAT) to extract the semantic representations of the marks from the complete mark relation graph Ḡ. The graph attention network uses an attention mechanism to compute, in the relation graph Ḡ, the importance of mark node j to mark node i, i.e. the attention coefficient α_ij, as in formula (11).
In formula (11), V̄_i is the i-th row vector of V̄, the weight matrices are learnable coefficient matrices, N_i denotes the neighbour nodes of mark node i in the undirected graph Ḡ, ‖ denotes the concatenation of vectors, LeakyReLU(·) is a nonlinear activation function, and exp(x) denotes e^x; 1 ≤ i ≤ l, 1 ≤ j ≤ l.
The attention coefficients are linearly combined with the feature vectors of the relation graph nodes to obtain new node features, i.e. the mark semantic representation E_l, as in formula (12).
In formula (12), the i-th row vector of E_l is the semantic representation of the i-th mark, σ(·) is the nonlinear activation function ELU, N_i denotes the neighbour nodes of node i in the relation graph, 1 ≤ i ≤ l, and d' = 512. The update is then iterated t times, each iteration being expressed as formula (13).
In formula (13), t ≥ 0 and the initial value is V̄. The resulting semantic representations of the marks can reflect both the co-occurrence relations and the semantic relations among the marks.
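A minimal PyTorch sketch of the graph attention update of formulas (11)-(13) follows; the exact parameterisation of the attention coefficient in the patent is not reproduced, so a conventional GAT attention with a shared projection and an attention vector is assumed, and the class and function names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MarkGATLayer(nn.Module):
    """GAT-style update standing in for formulas (11)-(12); the exact
    parameterisation of the attention in the patent is not reproduced here."""

    def __init__(self, d_in=300, d_out=512):
        super().__init__()
        self.W = nn.Linear(d_in, d_out, bias=False)   # shared node projection
        self.a = nn.Linear(2 * d_out, 1, bias=False)  # attention scorer

    def forward(self, V, adj):
        # V: (l, d_in) node features; adj: (l, l) {0,1} mask, self-loops included
        H = self.W(V)
        l = H.size(0)
        pair = torch.cat([H.unsqueeze(1).expand(l, l, -1),
                          H.unsqueeze(0).expand(l, l, -1)], dim=-1)
        e = F.leaky_relu(self.a(pair).squeeze(-1))          # raw scores for every (i, j)
        e = e.masked_fill(adj == 0, float('-inf'))
        alpha = torch.softmax(e, dim=-1)                    # attention coefficients alpha_ij
        return F.elu(alpha @ H)                             # new node features

def mark_semantic_representation(V_bar, adj, layers):
    """Formula (13): iterate the update t times, starting from the semantic
    descriptions of the complete marks, e.g.
    layers = [MarkGATLayer(300, 512), MarkGATLayer(512, 512)]."""
    E = V_bar
    for layer in layers:
        E = layer(E, adj)
    return E
```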
The invention uses a differentiable graph pooling network (DiffPool) to cluster the mark semantic representations E_l into the mark group semantic representations E_g; the clustering process is expressed as formula (14), whose output is E_g.
In order to make the clusters more compact, a loss function is set that minimizes the distance between each mark semantic representation and the semantic representation of its mark group, as in formula (15).
In formula (15), N_g is the preset number of mark groups, C_k is the set of mark semantic representations belonging to the k-th mark group, and the k-th row vector of E_g is the semantic representation of the k-th mark group.
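A possible concrete form of the compactness loss of formula (15) is sketched below; the squared Euclidean distance between each mark representation and the representation of its group, and the hard assignment tensor, are assumptions (DiffPool itself produces a soft assignment).

```python
import torch

def group_compactness_loss(E_l, E_g, assign):
    """Compactness loss in the spirit of formula (15).

    E_l    : (l, d') mark semantic representations.
    E_g    : (N_g, d') mark group semantic representations.
    assign : (l,) long tensor, assign[i] = group index of mark i
             (a hard simplification of the soft DiffPool assignment).
    """
    centers = E_g[assign]                          # group representation of each mark
    return ((E_l - centers) ** 2).sum(dim=1).mean()
```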
S3, constructing a deep convolutional neural network, and extracting feature maps of different convolutional layers in the network. Based on the attention mechanism, the feature map is fused with semantic representations of all marks and mark groups to obtain final feature representations of the image, namely, generic feature representations and mark group feature representations. The method comprises the following steps:
Convolutional neural networks are currently the most common feature extractors in deep learning; the invention uses a ResNet-101 convolutional neural network. The image matrix I is input into the convolutional neural network to extract feature maps.
ResNet-101 consists of 4 residual blocks, and the feature maps extracted by the different residual blocks have different resolutions; feature maps of different resolutions differ in their expressive power for objects of different sizes. To fully exploit the expressive power of feature maps at different resolutions, the feature maps produced by the final convolutional layer of the last three residual blocks of ResNet-101 are taken, denoted F_s, s ∈ {1, 2, 3}, with B = 3 and c_s ∈ {512, 1024, 2048}. A mapping matrix is used to reduce the channel dimension of F_s, and the reduced feature map has d' channels, where d' is the length of a single row vector of the mark and mark group semantic representations E_l and E_g.
The channel vector at each pixel position (w, h) of each image feature map is multiplied element-wise with the semantic representation of the i-th mark and with the semantic representation of the j-th mark group, giving the mark-semantics-guided attention coefficient matrix and the mark-group-semantics-guided attention coefficient matrix, as in formulas (16)-(17).
In formulas (16)-(17), the vector at pixel (w, h) is formed by the elements of all channels of the s-th feature map at that position, ⊙ denotes element-wise multiplication of corresponding positions, 1 ≤ i ≤ l, 1 ≤ j ≤ N_g.
Formulas (18)-(19) give the resulting attention coefficients. The attention coefficients are then again multiplied element-wise with the image feature maps to generate the attention maps, as in formulas (20)-(21), with s ∈ {1, 2, 3}.
Finally, the generic feature representation K (one row per mark) and the mark group feature representation L (one row per mark group) are obtained, with B = 3.
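The fusion described above can be sketched as follows; the summed element-wise product (a dot product over channels) for the raw coefficients and the spatial softmax standing in for formulas (18)-(19) are assumptions, and all function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def semantic_guided_features(feat_maps, E_l, E_g, proj_layers):
    """Attention fusion in the spirit of formulas (16)-(21).

    feat_maps   : list of B = 3 tensors (c_s, w_s, h_s) from the last three residual blocks.
    E_l, E_g    : (l, d') mark and (N_g, d') mark group semantic representations.
    proj_layers : 1x1 convolutions mapping c_s -> d' (the mapping matrices).
    """
    K_parts, G_parts = [], []
    for F_s, proj in zip(feat_maps, proj_layers):
        Fp = proj(F_s.unsqueeze(0)).squeeze(0).flatten(1)   # (d', w_s * h_s)
        # formulas (16)-(17): coefficient of each semantic vector at each pixel
        M = E_l @ Fp                                        # (l,   w_s * h_s)
        N = E_g @ Fp                                        # (N_g, w_s * h_s)
        # assumed normalisation over spatial positions (formulas (18)-(19))
        M, N = F.softmax(M, dim=-1), F.softmax(N, dim=-1)
        # formulas (20)-(21): attention-weighted pooling of the feature map
        K_parts.append(M @ Fp.t())                          # (l,   d')
        G_parts.append(N @ Fp.t())                          # (N_g, d')
    # concatenate the B levels: generic / mark group feature representations
    return torch.cat(K_parts, dim=1), torch.cat(G_parts, dim=1)
```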
It should be noted that the feature extractors commonly used in current deep learning are AlexNet, VGG and ResNet, among which ResNet dominates current deep learning frameworks because of its strong feature extraction capability; among the many ResNet variants, the 101-layer ResNet-101 is mainly adopted here as a trade-off between performance, computing resources and computation speed. Mainstream methods use only the features of one layer at the end of ResNet-101 and do not efficiently exploit the different feature layers to obtain richer semantic information. Although ResNet-101 is adopted as the general image feature extractor, in practical applications the invention can be combined with any mainstream convolutional neural network and has good extensibility.
S4, non-negative matrix factorization is performed on the feature map to obtain an approximate solution of the complete mark matrix, an initial mark relation matrix is constructed from the approximate solution of the complete mark matrix, and the mark relations are then learned with the graph neural network. The method comprises the following steps:
The last-layer feature map F_3 ∈ R^{c_3×w_3×h_3} is extracted from ResNet-101, where c_3 = 2048. This feature map is taken for all training samples, and global average pooling is applied over its 2nd and 3rd (spatial) dimensions to obtain the feature matrix of all training samples. PCA is then used to reduce the 2nd dimension of this matrix, giving X ∈ R^{n×c'}, where c' = 256.
The feature matrix X is decomposed by non-negative matrix factorization into an approximate solution W of the complete mark matrix and a coefficient matrix H, with the loss function of formula (22).
In formula (22), P ∈ R^{l×q} is the projection matrix formed by the first q columns of the l×l identity matrix, q is the number of known marks, and λ_1, λ_2, λ_3 are non-negative weight coefficients, λ_1, λ_2 ∈ {10^0, 10^1, 10^2, 10^3, 10^4}, λ_3 ∈ {10^-6, 10^-5, 10^-4, 10^-3}. The first constraint in the loss decomposes X into a base matrix W and a coefficient matrix H; negative elements of W are set to 0 during updating to guarantee the non-negativity of W and H. The second constraint keeps the first q columns of W consistent with the known mark matrix Y; since the base matrix W is an abstract low-dimensional feature of X, all l columns of W approximate the complete mark matrix, and the last r columns approximate the unknown mark matrix.
A threshold t_1 is selected: values greater than t_1 in the last r columns W(:, q+1:l) are regarded as positive marks, and the remaining values as negative marks. The co-occurrence counts L_ki and L_gk of the unknown mark k with mark i, and the occurrence count N_k of the unknown mark k, are then counted, and the complete mark co-occurrence probability matrix Ā is recalculated, with 1 ≤ i ≤ q, q+1 ≤ k ≤ l, 1 ≤ g ≤ l.
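A rough sketch of the constrained factorization behind formula (22) follows; it keeps only the reconstruction term and the known-mark consistency term, drops the remaining regularisers weighted by λ_2 and λ_3, and uses an assumed projected-gradient solver with a fixed step size.

```python
import numpy as np

def constrained_nmf(X, Y, r, lam1=10.0, lr=1e-4, iters=2000, seed=0):
    """Constrained factorisation in the spirit of formula (22).

    X : (n, c') reduced feature matrix; Y : (n, q) known-mark matrix; r : unknown marks.
    """
    rng = np.random.default_rng(seed)
    n, c = X.shape
    q = Y.shape[1]
    l = q + r
    W = np.abs(rng.standard_normal((n, l))) * 0.01
    H = np.abs(rng.standard_normal((l, c))) * 0.01

    for _ in range(iters):
        R = W @ H - X                              # reconstruction residual
        grad_W = R @ H.T
        grad_W[:, :q] += lam1 * (W[:, :q] - Y)     # keep the first q columns close to Y
        grad_H = W.T @ R
        W = np.maximum(W - lr * grad_W, 0.0)       # negative elements set to 0 (non-negativity)
        H = np.maximum(H - lr * grad_H, 0.0)
    return W, H                                    # last r columns of W: unknown-mark approximation
```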
S5, the generic feature representation related to each mark and the feature representation of the mark group it belongs to are concatenated and input into a classifier to obtain the confidence of each mark in the image. The method comprises the following steps:
The generic feature representation K_i corresponding to mark i and the feature representation L_k of the k-th mark group, to which mark i belongs, are concatenated as the i-th input of the classifier:

Q_i = K_i ‖ L_k    (23)

In formula (23), K_i is the i-th row vector of the generic feature representation matrix, L_k is the k-th row vector of the mark group feature representation matrix, ‖ denotes concatenation, 1 ≤ i ≤ l, 1 ≤ k ≤ N_g.
The classifier's prediction of the confidence that mark i is present is expressed as formula (24). In formula (24), the parameter matrix is learnable, sum(x) denotes summing all the elements of x, and 1 ≤ i ≤ l.
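The classifier of formulas (23)-(24) can be sketched as follows; since formula (24) is not reproduced above, the sigmoid applied to the summed, element-wise weighted input is an assumed way of producing a confidence in [0, 1].

```python
import torch
import torch.nn as nn

class MarkClassifier(nn.Module):
    """Per-mark classifier in the spirit of formulas (23)-(24)."""

    def __init__(self, num_marks, dim):
        super().__init__()
        self.theta = nn.Parameter(torch.randn(num_marks, dim) * 0.01)  # learnable parameter matrix

    def forward(self, K, L_grp, group_of):
        # K: (l, dK) generic features, L_grp: (N_g, dL) mark group features,
        # group_of: (l,) long tensor mapping each mark to its group
        Q = torch.cat([K, L_grp[group_of]], dim=1)   # formula (23): Q_i = K_i || L_k
        score = (self.theta * Q).sum(dim=1)          # formula (24): sum of the weighted elements
        return torch.sigmoid(score)                  # confidence of each of the l marks
```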
S6, the part of the classifier that predicts the known marks is supervised with the ground-truth information of the known marks, and S1, S2, S3 and S5 are repeated so that the feature representations are fully learned from the known marks; after the model converges, the part of the classifier that predicts the unknown marks is constrained to be consistent with the unknown-mark part of the approximate solution of the complete mark matrix, and S1-S5 are repeated to refine the feature representations of the unknown marks. The method comprises the following steps:
Given a sample t, it is fed into the model to obtain an output; the first q output values are supervised with the ground-truth information of the known marks, S1, S2, S3 and S5 are repeated, and the image features are fully learned from the supervision information of the known marks, with the total loss function of formula (25).
Formula (25) involves, for each mark i, the sample's ground-truth value of the i-th known mark and the classifier's confidence prediction for the i-th mark. After the model converges, the approximate solution W of the complete mark matrix is initialized by non-negative matrix factorization and S4 is repeated with its total loss function.
After W is obtained, the last r columns of W, i.e. the approximate solution of the unknown mark matrix, are used to impose a consistency constraint on the part of the classifier that predicts the unknown marks; S1-S5 are repeated with the total loss function of formula (26).
In formula (26), u_{q+i} denotes the approximate value of the corresponding i-th unknown mark of the sample in the complete mark matrix approximation W.
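The two training stages can be sketched as follows; the binary cross-entropy form of formula (25) and the squared-error form of the consistency term of formula (26) are assumptions consistent with, but not quoted from, the description.

```python
import torch.nn.functional as F

def stage1_loss(pred, y_known, q):
    """Known-mark supervision playing the role of formula (25);
    binary cross-entropy over the first q outputs is an assumed concrete form."""
    return F.binary_cross_entropy(pred[:, :q], y_known)

def stage2_loss(pred, y_known, W_unknown, q):
    """Adds the consistency constraint of formula (26): outputs for the unknown
    marks are pulled toward the last r columns of the NMF solution W for the
    samples in the batch; the squared-error form is an assumption."""
    known_term = F.binary_cross_entropy(pred[:, :q], y_known)
    unknown_term = ((pred[:, q:] - W_unknown) ** 2).mean()
    return known_term + unknown_term
```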
(2) Marking and predicting:
S7, a test sample is given and fed into the final classification model obtained through the learning in steps S1-S6 to obtain the predicted values of the test sample on the known marks and the unknown marks. The method comprises the following steps:
Given a test sample t, it is fed into the final classification model obtained through the learning of steps S1-S6, whose learned parameters are π, and the predicted value f(t, π) of the test data t is obtained.
A threshold τ is set; according to the obtained predicted value of the test sample t and the set threshold, the final output mark vector y_t ∈ {0,1}^{1×l} of the test sample over the q known marks and r unknown marks is calculated, where [·] is the indicator function. According to formula (27), 1 is returned when the condition in brackets is satisfied, indicating that the test sample has the i-th mark; otherwise 0 is returned, indicating that it does not. The first q elements of the vector indicate whether the test sample has the q known marks, and the last r elements indicate whether it has the r newly discovered marks.

y_t(i) = [f(t, π)_i > τ]    (27)

In formula (27), 1 ≤ i ≤ l.
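A minimal sketch of the thresholding of formula (27):

```python
import numpy as np

def predict_marks(scores, threshold=0.5):
    """Formula (27): indicator thresholding of the model outputs f(t, pi).

    scores : (l,) confidences over the q known and r unknown marks.
    The default threshold of 0.5 is illustrative; the patent leaves it tunable.
    """
    return (scores > threshold).astype(np.int32)     # y_t in {0,1}^l
```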
(3) Viewing the unknown mark corresponding object:
S8, the object corresponding to the unknown mark is looked up according to the attention map of the unknown mark. The method comprises the following steps:
The i-th preprocessed image is the matrix I_i ∈ R^{c×w×h}. I_i is input into the model, and the attention coefficient matrices of an unknown mark k on the feature maps at different levels, s ∈ {1, 2, 3}, are taken out. The attention coefficient matrices and I_i are resized to a uniform size and then superimposed and output separately, giving three pictures with highlighted attention regions. From the positions the attention regions focus on, the object corresponding to the unknown mark can be found and the semantic name of the unknown mark obtained.
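Step S8 can be sketched as follows; the bilinear resizing and the blending weight used for the overlay are illustrative choices, not taken from the patent.

```python
import torch.nn.functional as F

def overlay_attention(image, attn_maps, alpha=0.5):
    """Step S8 sketch: resize the attention maps of an unknown mark to the
    image size and superimpose them, one output picture per feature level.

    image     : (c, w, h) preprocessed image tensor with values in [0, 1].
    attn_maps : list of three (w_s, h_s) attention coefficient maps.
    alpha     : illustrative blending weight.
    """
    outputs = []
    for A in attn_maps:
        A = A.unsqueeze(0).unsqueeze(0)                                # (1, 1, w_s, h_s)
        A = F.interpolate(A, size=image.shape[1:], mode='bilinear',
                          align_corners=False).squeeze()               # resize to (w, h)
        A = (A - A.min()) / (A.max() - A.min() + 1e-8)                 # normalise to [0, 1]
        outputs.append((1 - alpha) * image + alpha * A.unsqueeze(0))   # highlight the attention region
    return outputs
```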
Within a deep learning framework, the invention integrates the technique of mining unknown marks. Learning the known marks helps mine the unknown marks, discovering unknown marks mines valuable information in the data, and the correlation between unknown and known marks can in turn improve the accuracy on the known marks. Existing methods for discovering and classifying unknown marks can only use shallow models, whereas the invention uses a deep model to obtain better feature representations of the marks. Meanwhile, thanks to the graph neural network, the co-occurrence relations among marks can be completed after the unknown marks are mined. The invention uses the attention mechanism to inspect the attention map of an unknown mark, making it convenient to view the image region the unknown mark attends to.
The invention and its embodiments have been described above by way of illustration and not limitation, and the accompanying drawings show only one embodiment; the actual structure is not limited thereto. Therefore, structural arrangements and embodiments similar to this technical scheme that a person of ordinary skill in the art, enlightened by this disclosure, designs without creative effort and without departing from the gist of the invention shall be regarded as falling within the scope of protection of the invention.

Claims (9)

1. The unknown mark classification method in multi-mark picture classification based on deep learning is characterized by comprising the following steps:
S1, reading an image file, a known mark matrix and a known mark relation graph, and initializing the node information of the unknown marks in the relation graph;
S2, extracting semantic representations of all marks from the mark relation graph, and clustering the semantic representations of the marks to obtain semantic representations of the mark groups;
S3, constructing a deep convolutional neural network and extracting the feature maps of different convolutional layers in the network; based on the attention mechanism, fusing the feature maps with the semantic representations of all marks and mark groups to obtain the final feature representations of the image, namely the generic feature representations and the mark group feature representations;
S4, performing non-negative matrix factorization on the feature map to obtain an approximate solution of the complete mark matrix, constructing an initial mark relation matrix from the approximate solution, and then learning the mark relations with a graph neural network;
S5, concatenating the generic feature representation of each mark with the feature representation of the mark group it belongs to, and inputting the result into a classifier to obtain the confidence of each mark in the image;
S6, supervising the part of the classifier that predicts the known marks with the ground-truth information of the known marks, and repeating S1, S2, S3 and S5 so that the feature representations are fully learned from the known marks; after the model converges, constraining the part of the classifier that predicts the unknown marks to be consistent with the unknown-mark part of the approximate solution of the complete mark matrix, and repeating S1-S5 to refine the feature representations of the unknown marks;
S7, giving a test sample and feeding it into the final classification model obtained by the learning in S1-S6 to obtain the predicted values of the test sample on the known marks and the unknown marks;
S8, checking the corresponding object according to the attention map of the unknown mark, and acquiring the semantic name of the unknown mark.
2. The method for classifying unknown marks in multi-mark picture classification based on deep learning as claimed in claim 1, wherein: in step S1, the picture input of the data set is I ∈ R^{n×c×w×h}, where n is the number of samples, c, w and h are the number of picture channels, the width and the height, and R is the real number domain; the known mark matrix is Y ∈ {0,1}^{n×q}, where q is the number of known marks;
the known mark relation graph is G = {D, S}, where D is the set of mark nodes in the relation graph, the features of the nodes are the semantic descriptions of the known marks, and V ∈ R^{q×d} is the semantic description matrix of the known marks with d = 300; S is the edge set of the relation graph, the weight of an edge is the co-occurrence probability between known marks, and A ∈ [0,1]^{q×q} is the co-occurrence probability matrix of the known marks;
r unknown marks are set, and the complete mark matrix is Ȳ = [Y, Ŷ] ∈ {0,1}^{n×l}, where Ŷ is the unknown mark matrix and l = q + r is the number of complete marks; the complete mark relation graph is Ḡ = {D̄, S̄}, where D̄ is the set of mark nodes in the complete mark relation graph, the features of the nodes are the semantic descriptions V̄ of the complete marks, V̂ denotes the semantic descriptions of the unknown marks, and the edge weights are represented by the co-occurrence probability matrix Ā ∈ [0,1]^{l×l} between the complete marks;
the semantic descriptions V̄_k of the unknown marks are initialized using the semantic descriptions V_i of the known marks, as follows:
in formula (1), V_i is the i-th row vector of V, V̄_k is the k-th row vector of V̄, and q+1 ≤ k ≤ l;
to complete the co-occurrence probability matrix Ā between the complete marks, the co-occurrence relations and the occurrence counts of the unknown marks must be initialized; L_ki denotes the number of co-occurrences of mark k and mark i, and N_k denotes the occurrence count of mark k; the co-occurrence counts and occurrence counts of the unknown marks are initialized using those of the known marks, by:

L_ki = max(L_ki, 0)    (3)

L_gk = max(L_gk, 0)    (5)

N_k = max(N_k, 0)    (7)

in formulas (2)-(7), 1 ≤ i ≤ q, 1 ≤ g ≤ l, q+1 ≤ k ≤ l; randint(a, b) denotes a randomly generated integer greater than or equal to a and less than or equal to b.
3. The method for classifying unknown marks in multi-mark picture classification based on deep learning as claimed in claim 2, wherein: in step S2, a graph attention network is used to extract the semantic representations of the marks from the complete mark relation graph Ḡ; a learnable parameter matrix is set, and the importance of mark node j to mark node i in the relation graph Ḡ, namely the attention coefficient α_ij, is calculated; the attention coefficients α_ij are linearly combined with the feature vectors of the relation graph nodes to obtain new node features, namely the mark semantic representation E_l, as follows:
in formula (8), V̄_i denotes the i-th row vector of V̄, ‖ denotes the concatenation of vectors, LeakyReLU(·) is a nonlinear activation function, and exp(x) denotes the operation e^x;
in formula (9), σ(·) is the nonlinear activation function ELU, N_i denotes the neighbour nodes of node i in the graph, and the i-th row vector of E_l is the semantic representation of mark i;
a differentiable graph pooling network is used to cluster the mark semantic representations into the mark group semantic representations E_g; to make the clusters more compact, a loss function is set that minimizes the distance between the two semantic representations:
in formula (10), N_g denotes the preset number of mark groups, C_k denotes the set of mark semantic representations of the k-th mark group, and the k-th row vector of E_g is the semantic representation of the k-th mark group.
4. The method for classifying unknown marks in multi-mark picture classification based on deep learning as claimed in claim 3, wherein: in step S3, in order to fully utilize the expressive power of feature maps of different resolutions, the feature maps extracted by the final convolutional layer of the last 3 residual blocks of ResNet-101 are taken and denoted F_s, s ∈ {1, 2, 3}, with B = 3 and c_s ∈ {512, 1024, 2048}; a mapping matrix is used to reduce the dimension of F_s, giving the reduced feature maps;
the elements on the channels at each pixel (w, h) of each image feature map are multiplied position-wise with the semantic representations of the marks and of the mark groups to obtain the mark-semantics-guided attention coefficient matrices and the mark-group-semantics-guided attention coefficient matrices;
in formulas (11)-(12), the vector at pixel (w, h) is formed by the elements of all channels of the s-th feature map at that position, and ⊙ denotes the multiplication of elements at corresponding positions of two matrices;
formulas (13)-(14) give the resulting attention coefficients; the attention coefficients are again multiplied element-wise with the image feature maps to obtain the attention maps:
in formulas (15)-(16), s ∈ {1, 2, 3}; finally the generic feature representation K and the mark group feature representation L are obtained, with B = 3.
5. The method for classifying unknown marks in multi-mark picture classification based on deep learning as claimed in claim 4, wherein: in step S4, the last-layer feature map F_3 is extracted from ResNet-101, with c_3 = 2048; this feature map is taken for all training samples and global average pooling is applied over the 2nd and 3rd dimensions to obtain the feature matrix of all training samples, whose 2nd dimension is reduced by PCA to give the feature matrix X ∈ R^{n×c'}, c' = 256;
a projection matrix P formed by the first q columns of the l×l identity matrix is set; non-negative matrix factorization decomposes the feature matrix X into a base matrix W and a coefficient matrix H and constrains the known-mark part of the approximate solution W to be consistent with the known mark matrix Y, with the loss function:
in formula (17), λ_1, λ_2, λ_3 are non-negative weight coefficients, λ_1, λ_2 ∈ {10^0, 10^1, 10^2, 10^3, 10^4}, λ_3 ∈ {10^-6, 10^-5, 10^-4, 10^-3};
a threshold t_1 is selected; values greater than t_1 in W(:, q+1:l) are regarded as positive marks and the rest as negative marks; the co-occurrence counts L_ki and L_gk of the unknown mark k with mark i and the occurrence count N_k of the unknown mark k are counted, and the co-occurrence probability matrix Ā of the complete marks is recalculated.
6. The method for classifying unknown marks in multi-mark picture classification based on deep learning as claimed in claim 5, wherein: in step S5, at the output layer, the generic feature representation K_i corresponding to mark i and the feature representation L_k of the mark group it belongs to are concatenated as the i-th input of the classifier, and the classifier predicts the confidence that mark i is present as follows:

Q_i = K_i ‖ L_k    (18)

in formulas (18)-(19), the weight parameters are learnable; 1 ≤ i ≤ l, 1 ≤ k ≤ N_g.
7. The method for classifying unknown marks in multi-mark picture classification based on deep learning as claimed in claim 6, wherein: in step S6, a sample t is given and fed into the model to obtain an output; the first q output values are supervised with the ground-truth information of the known marks, S1, S2, S3 and S5 are repeated, and the image features are fully learned using the supervision information of the known marks, with the total loss function:
in formula (20), the quantities involved for each mark i are the ground-truth value of the i-th known mark of the sample and the classifier's confidence prediction for the i-th mark; after the model converges, the approximate solution W of the complete mark matrix is initialized by non-negative matrix factorization and S4 is repeated with its total loss function;
after W is obtained, the last r columns of W and the part of the classifier that predicts the unknown marks are used for a consistency constraint, S1-S5 are repeated, and the total loss function is:
in formula (21), u_{q+i} denotes the approximate value of the corresponding i-th unknown mark of the sample in the complete mark matrix approximation solution W.
8. The method for classifying unknown marks in multi-mark picture classification based on deep learning as claimed in claim 7, wherein: in step S7, a test sample t is given and fed into the final classification model obtained through the learning in steps S1-S6, whose learned parameters are π, and the predicted value f(t, π) of the test data t is obtained;
a threshold τ is set; according to the obtained predicted value of the test sample t and the set threshold, the final output mark vector y_t ∈ {0,1}^{1×l} of the test sample over the q known marks and r unknown marks is calculated, where [·] is the indicator function; according to the result of formula (22), 1 is returned when the condition in brackets is satisfied, indicating that the test sample has the i-th mark, and 0 is returned otherwise, indicating that it does not; the first q elements of the vector indicate whether the test sample has the q known marks, and the last r elements indicate whether it has the r newly discovered marks;

y_t(i) = [f(t, π)_i > τ]    (22)

in formula (22), 1 ≤ i ≤ l.
9. The method for classifying unknown marks in multi-mark picture classification based on deep learning as claimed in claim 8, wherein: in step S8, the i-th preprocessed image is the matrix I_i ∈ R^{c×w×h}; after I_i is input into the model, the attention coefficient matrices of the unknown mark k on the feature maps at different levels are taken out; the attention coefficient matrices and I_i are resized to a uniform size and then superimposed and output separately, giving three pictures with attention regions; from the positions the attention regions focus on, the object corresponding to the unknown mark can be found and the semantic name of the unknown mark obtained.
CN202111655752.8A 2021-12-30 2021-12-30 Unknown mark classification method in multi-mark picture classification based on deep learning Active CN114299342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111655752.8A CN114299342B (en) 2021-12-30 2021-12-30 Unknown mark classification method in multi-mark picture classification based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111655752.8A CN114299342B (en) 2021-12-30 2021-12-30 Unknown mark classification method in multi-mark picture classification based on deep learning

Publications (2)

Publication Number Publication Date
CN114299342A CN114299342A (en) 2022-04-08
CN114299342B true CN114299342B (en) 2024-04-26

Family

ID=80973802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111655752.8A Active CN114299342B (en) 2021-12-30 2021-12-30 Unknown mark classification method in multi-mark picture classification based on deep learning

Country Status (1)

Country Link
CN (1) CN114299342B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10769766B1 (en) * 2018-05-31 2020-09-08 Amazon Technologies, Inc. Regularized multi-label classification from partially labeled training data
CN110084296A (en) * 2019-04-22 2019-08-02 中山大学 A kind of figure expression learning framework and its multi-tag classification method based on certain semantic
CN111046965A (en) * 2019-12-18 2020-04-21 安徽工业大学 Method for discovering and classifying potential classes in multi-label classification
CN112132186A (en) * 2020-08-26 2020-12-25 安徽工业大学 Multi-label classification method with partial deletion and unknown class labels
CN112308115A (en) * 2020-09-25 2021-02-02 安徽工业大学 Multi-label image deep learning classification method and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multi-label learning with missing and completely unobserved labels; Jun Huang et al.; Data Mining and Knowledge Discovery; 2021-03-12; Vol. 35; pp. 1061-1086 *
Multi-label classification algorithm based on label-specific features and instance correlations; Zhang Yong et al.; Pattern Recognition and Artificial Intelligence; 2020-12-31; Vol. 33, No. 5; pp. 439-448 *

Also Published As

Publication number Publication date
CN114299342A (en) 2022-04-08

Similar Documents

Publication Publication Date Title
CN105844292B (en) A kind of image scene mask method based on condition random field and secondary dictionary learning
CN107133569B (en) Monitoring video multi-granularity labeling method based on generalized multi-label learning
CN104217225B (en) A kind of sensation target detection and mask method
CN114067160A (en) Small sample remote sensing image scene classification method based on embedded smooth graph neural network
CN109740686A (en) A kind of deep learning image multiple labeling classification method based on pool area and Fusion Features
CN112308115B (en) Multi-label image deep learning classification method and equipment
CN111476315A (en) Image multi-label identification method based on statistical correlation and graph convolution technology
Rad et al. Image annotation using multi-view non-negative matrix factorization with different number of basis vectors
CN112132186A (en) Multi-label classification method with partial deletion and unknown class labels
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
Yang et al. Local label descriptor for example based semantic image labeling
Cao et al. Automatic image annotation method based on a convolutional neural network with threshold optimization
CN113065409A (en) Unsupervised pedestrian re-identification method based on camera distribution difference alignment constraint
Lin et al. Two stream active query suggestion for active learning in connectomics
CN114579794A (en) Multi-scale fusion landmark image retrieval method and system based on feature consistency suggestion
Zhou et al. Semantic image segmentation using low-level features and contextual cues
CN117393098A (en) Medical image report generation method based on visual priori and cross-modal alignment network
Fadavi Amiri et al. Improving image segmentation using artificial neural networks and evolutionary algorithms
CN114299342B (en) Unknown mark classification method in multi-mark picture classification based on deep learning
CN114842301A (en) Semi-supervised training method of image annotation model
CN115392474A (en) Local perception map representation learning method based on iterative optimization
Renuse et al. Multi label learning and multi feature extraction for automatic image annotation
Winiarti et al. Application of Artificial Intelligence in Digital Architecture to Identify Traditional Javanese Buildings
Liu et al. Automatic image co-segmentation: a survey
Mao et al. A Transfer Learning Method with Multi-feature Calibration for Building Identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant