CN112232374B - Irrelevant label filtering method based on depth feature clustering and semantic measurement - Google Patents


Info

Publication number
CN112232374B
Authority
CN
China
Prior art keywords: label, semantic, clustering, cluster, depth
Prior art date
Legal status
Active
Application number
CN202010992837.4A
Other languages
Chinese (zh)
Other versions
CN112232374A (en)
Inventor
蒋雯
苗旺
耿杰
曾庆捷
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202010992837.4A priority Critical patent/CN112232374B/en
Publication of CN112232374A publication Critical patent/CN112232374A/en
Application granted granted Critical
Publication of CN112232374B publication Critical patent/CN112232374B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING; G06F18/00 Pattern recognition
    • G06F18/2411 Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
    • G06F18/232 Non-hierarchical clustering techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00 Computing arrangements based on biological models
    • G06N3/045 Neural networks; combinations of networks
    • G06N3/08 Neural networks; learning methods

Abstract

The invention discloses an irrelevant label filtering method based on depth feature clustering and semantic measurement, comprising the following steps: step one, a sensor acquires an image set; step two, a label set corresponding to the image set is established; step three, depth features of the images in the image set are extracted; step four, the depth features are clustered to obtain clusters; step five, a related semantic label set is constructed for the clusters; step six, a label set to be measured is constructed for the clusters; step seven, semantic vectors are generated; step eight, the correlation degree of the semantic vectors is calculated; and step nine, irrelevant labels are filtered according to the correlation degree. The invention clusters a huge amount of sample image data to obtain clusters that pre-classify the sample image data, analyzes the clustered sample image data with high effectiveness and correctness, and measures the relevance of the label semantics, thereby filtering irrelevant labels automatically and improving the generalization and robustness of a deep network.

Description

Irrelevant label filtering method based on depth feature clustering and semantic measurement
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to an irrelevant label filtering method based on deep feature clustering and semantic measurement.
Background
With the development of artificial intelligence technology, deep learning technology has been widely applied and has become an indispensable part of people's work and life, particularly in the fields of computer vision and artificial intelligence. Deep learning technology is a branch of machine learning; it is an algorithm for performing representation learning on data with an artificial neural network as its framework.
The convolutional neural network proposed by Yann LeCun et al. has been widely and successfully applied to various image tasks such as detection, segmentation and object recognition. These applications rely on large amounts of labeled data. The premise for deep learning technology to achieve good results is massive training data, and acquiring such data requires a large number of personnel to label it, a process that incurs high labor and material costs. Even if a pre-trained model is obtained by training a network on unlabeled data with unsupervised techniques, a model with strong generalization ability can only be obtained when the semantic distribution of the training data is correlated with the data to be predicted.
The process of manually labeling data is tedious. For different deep learning tasks, such as object detection and semantic segmentation, the diversity of data sources means that some sample information and labels in the sample data are irrelevant. Because the keyword labels of a sample play a key role in auditing, retrieving and organizing the sample, irrelevant labeling easily causes the annotation to misrepresent the characteristics of the sample data, prolongs the time needed to fit the parameters of a deep learning model and lowers efficiency, especially for deep neural networks with complex structures and many layers. The problem of mislabeled data has long been a key research area in computer vision and artificial intelligence, so to improve the efficiency of deep learning models, techniques for filtering irrelevant labels from a data set need to be studied.
The prior art cannot currently meet the need to filter irrelevant labels from a data set, so a data-set irrelevant-label filtering method is urgently needed that filters irrelevant labels, facilitates subsequent deep learning tasks and improves the generalization and robustness of a deep network.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the defects in the prior art, an irrelevant label filtering method based on depth feature clustering and semantic measurement that is simple in structure and reasonable in design: a large amount of sample image data is clustered to obtain clusters that pre-classify the sample image data, the clustered sample image data are analyzed with high effectiveness and correctness, and relevance measurement is performed on the label semantics, so that irrelevant labels are filtered automatically and the generalization and robustness of a deep network can be improved.
In order to solve the above technical problems, the invention adopts the following technical solution: an irrelevant label filtering method based on depth feature clustering and semantic measurement, characterized by comprising the following steps:
step one: the sensor acquires an image set X and stores it in a storage unit, X = {x_1, ..., x_i, ..., x_n}, where x_i represents the i-th sample image data, 1 ≤ i ≤ n, and n is a positive integer;
step two: establishing a label set corresponding to the image set X in a storage unit;
step three, extracting depth features of the images in the image set: extract depth features from the sample image data x_i in the image set X, obtaining depth features φ(x_i);
Step four, clustering depth features to obtain cluster clusters: using a predetermined number k as the cluster number to the depth feature phi (x) i ) The clustering is carried out and the cluster is obtained, obtaining a cluster set A, A = { A = { (A) 1 ,...,A f ,...A k F is more than or equal to 1 and less than or equal to k, and k is a positive integer;
step five, constructing the related semantic label set of the clusters: according to the original category label set U in step two, obtain the semantic label of the cluster center of each cluster A_f and take it as the related semantic label y_f of cluster A_f, obtaining the related semantic label set Y = {y_1, ..., y_f, ..., y_k} corresponding to the cluster set A;
Step six, constructing a label set to be measured of the clustering cluster: obtaining a label set P to be measured, P = { P = { (P) 1 ,...,P f ,...P k },P f Representing a cluster A f Obtaining each cluster A according to the original category label set U in the step two by the corresponding label set to be measured f The other category labels except the clustering center are added, and the cluster A is clustered f Adding labels of other classes except the clustering center into a label set P to be measured f
Figure GDA0004093034470000031
t is a positive integer;
step seven, generating semantic vectors: take the related semantic label set Y and the label set to be measured P as input, and obtain all semantic vectors H_f of the related semantic label set Y and all semantic vectors K_fg of each P_f in the label set to be measured P;
Step eight, calculating the correlation degree of the semantic vector: computer according to formula
Figure GDA0004093034470000041
Calculating related semantic label set Y and f-th clustering cluster label set P to be measured f Correlation of each label in fg In which H f Representing Guan Yuyi tag Y in the set of related semantic tags Y f Semantic vector of (2), K fg Representing P in a set P of labels to be measured f The semantic vector of the g-th label;
and step nine, filtering irrelevant labels according to the correlation degree: the labels in cluster A_f whose correlation degree Sim_fg is below the threshold η are deleted.
The above irrelevant label filtering method based on depth feature clustering and semantic measurement is further characterized in that: in step three, the depth features are extracted from the sample image data x_i in the image set X by a deep convolutional residual neural network model pre-trained on the large-scale image dataset ImageNet, and the network model consists of convolutional layers, residual layers and fully connected layers.
The above irrelevant label filtering method based on depth feature clustering and semantic measurement is further characterized in that: in step four, the depth features φ(x_i) are clustered by a spectral clustering algorithm, which specifically comprises the following steps:
step 401: construct the similarity matrix W of the depth features φ(x_i), where W is composed of the elements s_ij,

s_ij = exp(−‖φ(x_i) − φ(x_j)‖² / (2σ²)),

σ denoting the standard deviation of the Gaussian kernel;
step 402: compute the diagonal matrix D, where

d_i = Σ_{j=1}^{n} w_ij,

and w_ij denotes the element in the i-th row and j-th column of the similarity matrix W;
step 403: obtain the Laplacian matrix L of the depth features φ(x_i) from L = D − W;
step 404: perform eigenvalue decomposition on the Laplacian matrix L to construct an eigenvector space, and cluster the eigenvectors in the eigenvector space with a clustering algorithm to obtain the cluster set A = {A_1, ..., A_f, ..., A_k}.
The above irrelevant label filtering method based on depth feature clustering and semantic measurement is further characterized in that: in step seven, the semantic vectors are generated using the near-synonym network model Synonyms.
The above irrelevant label filtering method based on depth feature clustering and semantic measurement is further characterized in that: in step nine, the correlation threshold η is set according to the difference between the cosine distances from the semantic vectors of the original category label set U to the semantic vectors of the related semantic label set Y and the cosine distances from the semantic vectors of the mislabeled label set V to the semantic vectors of the related semantic label set Y.
Compared with the prior art, the invention has the following advantages:
1. the invention has simple structure, reasonable design and convenient realization, use and operation.
2. The method extracts depth features with a deep convolutional residual neural network model pre-trained on the large-scale image dataset ImageNet, combining the strong learning ability of deep convolutional neural networks with the good convergence of residual learning; feature extraction and selection are therefore more robust, the integrity of the image information is protected, and the performance of the result is improved.
3. The invention clusters a huge amount of sample image data to obtain clusters that pre-classify the sample image data, which reduces the time required for manual classification and avoids the inconsistent classification results caused by subjective differences; by analyzing the clustered sample image data, the image set can be screened better, with higher effectiveness and correctness.
4. The invention obtains the semantic vector of each related semantic label y_f in the related semantic label set Y and the semantic vector of the g-th label in each label set to be measured P_f, computes the average of the cosine distances between the g-th label in P_f and each related semantic label in Y, and uses this average as the correlation degree between the g-th label in P_f and the image set X for relevance screening; by measuring the relevance of the label semantics in this way, irrelevant labels are filtered automatically.
In conclusion, the invention is simple in structure and reasonable in design; clusters obtained by clustering a huge amount of sample image data pre-classify the sample image data, the clustered sample image data are analyzed with high effectiveness and correctness, and relevance measurement is performed on the label semantics, so that irrelevant labels are filtered automatically and the generalization and robustness of a deep network can be improved.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The method of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments of the invention.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of description, spatially relative terms such as "on", "above", "over" and "upper" may be used herein to describe the spatial positional relationship of one device or feature to another device or feature as shown in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if a device in the figures is turned over, devices described as "above" or "on" other devices or configurations would then be oriented "below" or "under" the other devices or configurations. Thus, the exemplary term "above" may include both an orientation of "above" and one of "below". The device may be otherwise oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein interpreted accordingly.
As shown in fig. 1, the present invention comprises the steps of:
Step one: the sensor acquires an image set X and stores the image set X in a storage unit, where X = {x_1, ..., x_i, ..., x_n}, x_i represents the i-th sample image data, 1 ≤ i ≤ n, and n is a positive integer.
In actual use, different types of sample image data are acquired through the sensor; for different objects, the sample image data acquired by the sensor also differ.
Step two: a label set corresponding to the image set X is established in the storage unit. In specific implementation, the data set includes the image set X and the label set, where the label set includes an original category label set U and a mislabeled label set V, U = {u_1, ..., u_p, ..., u_h}, V = {v_1, ..., v_q, ..., v_l}, with 1 ≤ p ≤ h, 1 ≤ q ≤ l, h and l positive integers, and l + h = n. Each sample image data x_i in the image set X corresponds to one label in the original category label set U. The label set V is used to store the finally screened irrelevant labels.
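For illustration only (not part of the patent text), the data established in steps one and two can be held in a simple in-memory structure such as the following Python sketch; the class and field names are hypothetical.

# Minimal sketch (illustrative, not the patent's API) of the data held after
# steps one and two: the image set X, the per-image labels drawn from the
# original category label set U, and the set V that will collect the
# irrelevant labels screened out in step nine.
from dataclasses import dataclass, field
from typing import List

import numpy as np


@dataclass
class LabeledImageSet:
    images: List[np.ndarray]                              # X = {x_1, ..., x_n}, one array per sample image
    labels: List[str]                                     # label of each image, taken from U
    irrelevant: List[str] = field(default_factory=list)   # V: irrelevant labels found in step nine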
Step three, obtaining image depth features: the computer extracts depth features from the sample image data x_i in the image set X, obtaining depth features φ(x_i).
In specific implementation, in order to improve the clustering effect, the sample image data x_i in the image set X need to be converted into an appropriate representation. The computer uses a deep convolutional residual neural network pre-trained on the large-scale image dataset ImageNet and extracts the depth features φ(x_i) of the sample image data x_i acquired by the sensor. This combines the strong learning ability of deep convolutional neural networks with the good convergence of residual learning, makes feature extraction and selection robust, alleviates the lack of image feature detail, protects the integrity of the image information and improves the performance of the result.
Specifically: each layer of the deep convolutional neural network model performs convolution operations on the sample image data x_i with its convolution kernels to extract the features of each x_i; the residual network model adds skip connections to the deep convolutional neural network, so that the initial features of x_i are passed directly to later layers to improve the performance of the result; the extracted features are then fed into the fully connected layers of the model and concatenated to obtain the depth feature φ(x_i) of x_i. The depth feature φ(x_i) carries the feature information of the sample image data x_i, and therefore the quality of φ(x_i) affects the final classification and filtering effect.
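The following Python sketch illustrates one way to realize step three; the choice of a torchvision ResNet-50 and of its pooled penultimate-layer output as the depth feature φ(x_i) is an assumption, since the patent only requires a deep convolutional residual network pre-trained on ImageNet.

# Sketch of step three: extract depth features phi(x_i) with an ImageNet-pretrained
# residual network. ResNet-50 and the 2048-d global-pooling output are assumptions.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()   # drop the classifier; keep the pooled feature
backbone.eval()


@torch.no_grad()
def depth_feature(image_path: str) -> torch.Tensor:
    """Return phi(x_i) for one sample image."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    return backbone(x).squeeze(0)   # shape: (2048,)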
Step four, obtaining clusters: cluster the depth features φ(x_i) using a preset number k as the number of clusters, obtaining a cluster set A = {A_1, ..., A_f, ..., A_k}, where 1 ≤ f ≤ k and k is a positive integer.
the spectral clustering algorithm is based on the similarity matrix, converts the common clustering problem into the graph partitioning problem, is established on the basis of spectrogram theory, is not limited by the space shape of a sample during clustering, and is superior to the traditional clustering algorithm. The spectral clustering algorithm starts from the global state when solving, has the advantage of converging to the global optimal solution, cannot fall into the local optimal solution, can ensure the minimum similarity among different classes and the maximum similarity in the same class, and has better performance and application scene than the traditional clustering algorithm. Therefore, the application preferably adopts a spectral clustering algorithm to perform depth feature phi (x) i ) And (6) clustering.
The specific process of clustering with the spectral clustering algorithm is as follows: construct the similarity matrix W of the image set X, where W is composed of the elements s_ij,

s_ij = exp(−‖φ(x_i) − φ(x_j)‖² / (2σ²)),

and σ denotes the standard deviation. The similarity matrix W is denoted W = (w_ij), i, j = 1, ..., n. The computer computes the diagonal matrix D = {d_1, ..., d_i, ..., d_n} according to the formula

d_i = Σ_{j=1}^{n} w_ij.

The Laplacian matrix L of the depth features φ(x_i) is obtained from L = D − W; eigenvalue decomposition is performed on L to construct an eigenvector space, and the eigenvectors in the eigenvector space are clustered by a clustering algorithm to obtain the cluster set A = {A_1, ..., A_f, ..., A_k}.
By clustering the huge amount of sample image data, the clusters pre-classify the sample image data, which reduces the time required for manual classification and avoids the inconsistent classification results caused by subjective differences; analyzing the clustered sample image data allows the image set to be screened better, with higher effectiveness and correctness, and provides a more reliable basis for filtering irrelevant labels.
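A minimal Python sketch of the spectral clustering procedure described above is given here; the unnormalized Laplacian L = D − W follows the text, while the use of k-means on the eigenvector space is an assumption, since the patent only requires "a clustering algorithm" at that stage.

# Sketch of step four: spectral clustering of depth features phi(x_i).
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans


def spectral_clusters(features: np.ndarray, k: int, sigma: float = 1.0) -> np.ndarray:
    """features: (n, d) array of depth features; returns a cluster index per sample."""
    # Gaussian similarity matrix: s_ij = exp(-||phi(x_i) - phi(x_j)||^2 / (2 sigma^2))
    sq_dists = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    D = np.diag(W.sum(axis=1))            # d_i = sum_j w_ij
    L = D - W                             # Laplacian matrix
    # Eigenvector space spanned by the k smallest eigenvalues of L
    _, vecs = eigh(L, subset_by_index=[0, k - 1])
    return KMeans(n_clusters=k, n_init=10).fit_predict(vecs)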
Step five, constructing the related semantic label set: according to the original category label set U in step two, obtain the semantic label of the cluster center of each cluster A_f and take it as the related semantic label y_f of cluster A_f, obtaining the related semantic label set Y = {y_1, ..., y_f, ..., y_k}.
Since each sample image data x_i in the image set X corresponds to a label in the original category label set U, the cluster center of cluster A_f also corresponds to a semantic label in the original category label set U.
Step six, constructing the label set to be measured: obtain the label set to be measured P = {P_1, ..., P_f, ..., P_k}, where P_f represents the label set to be measured corresponding to cluster A_f; the labels corresponding to the cluster elements of A_f other than the cluster center are combined to form the label set to be measured P_f = {P_f1, ..., P_fg, ..., P_ft}, where 1 ≤ g ≤ t and t is a positive integer.
Similarly, since each sample image data x_i in the image set X corresponds to a label in the original category label set U, the cluster elements of A_f other than the cluster center also each correspond to a semantic label in the original category label set U.
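The following Python sketch illustrates steps five and six; representing the cluster center by its nearest member, so that it carries a label from U, is an assumption made for the sketch.

# Sketch of steps five and six: build the related semantic label set Y and the
# per-cluster label sets to be measured P_f.
import numpy as np


def build_label_sets(features, labels, assignments, k):
    """features: (n, d); labels: length-n list from U; assignments: cluster id per sample."""
    Y, P = [], []
    for f in range(k):
        idx = np.where(assignments == f)[0]
        centre = features[idx].mean(axis=0)
        # nearest member stands in for the cluster centre so that it has a label from U
        centre_sample = idx[np.argmin(np.linalg.norm(features[idx] - centre, axis=1))]
        y_f = labels[centre_sample]                        # related semantic label of cluster A_f
        P_f = sorted({labels[i] for i in idx} - {y_f})     # remaining category labels in A_f
        Y.append(y_f)
        P.append(P_f)
    return Y, P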
Step seven, generating semantic vectors: take the related semantic label set Y and the label set to be measured P as input, and obtain all semantic vectors H_f of the related semantic label set Y and all semantic vectors K_fg of each P_f in the label set to be measured P.
It should be noted that this application preferably uses the near-synonym network model Synonyms. Synonyms is a trained word2vec model; word2vec is trained on a large amount of data using context information and maps words into a low-dimensional space, and algorithmically it is based on distance rather than exact matching and on semantics rather than surface form. As a trained word2vec model, Synonyms can map each word to a vector, can be used to represent the relationship between words, and has the ability to measure the degree of correlation between words.
A word is input into the near-synonym network model Synonyms for prediction, the model outputs hidden-layer variables, and the semantic vector corresponding to the word is computed from these hidden-layer variables. In other words, the near-synonym network model Synonyms outputs the mathematical representation of a word, namely its semantic vector, for a given input word.
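A Python sketch of step seven is given below; it assumes a generic word2vec-style lookup (here a gensim KeyedVectors model loaded from a hypothetical path) as a stand-in for the Synonyms toolkit's word-vector interface.

# Sketch of step seven: map each label word to a semantic vector with a trained
# word2vec model. The file path and the use of gensim are assumptions; the
# Synonyms toolkit's own word vectors could back the same lookup.
from gensim.models import KeyedVectors

wv = KeyedVectors.load("label_word2vec.kv")   # hypothetical path to a trained model


def semantic_vector(label: str):
    """Return the semantic vector of a label word (an H_f or K_fg)."""
    return wv[label]


def semantic_vectors(Y, P):
    H = [semantic_vector(y_f) for y_f in Y]                  # vectors H_f for Y
    K = [[semantic_vector(p) for p in P_f] for P_f in P]     # vectors K_fg for each P_f
    return H, K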
Step eight, calculating the correlation degree of the semantic vectors: the computer calculates, according to the formula

Sim_fg = (1/k) · Σ_{f=1}^{k} (H_f · K_fg) / (‖H_f‖ · ‖K_fg‖),

the correlation degree Sim_fg between the related semantic label set Y and each label in the label set to be measured P_f of the f-th cluster, where H_f represents the semantic vector of the related semantic label y_f in the related semantic label set Y and K_fg represents the semantic vector of the g-th label in P_f.
Sim_fg represents the average of the cosine distances between the g-th label in the label set to be measured P_f and each related semantic label in the related semantic label set Y; Sim_fg is therefore taken as the correlation degree between the g-th label in P_f and the image set X, and used as the basis for deciding whether the g-th label in P_f should be filtered.
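A minimal Python sketch of step eight, computing Sim_fg as the average cosine similarity between K_fg and all semantic vectors of Y, consistent with the formula above:

# Sketch of step eight: Sim_fg = average cosine similarity between K_fg and the
# semantic vectors of all related semantic labels in Y.
import numpy as np


def correlation(H, K_fg):
    """H: list of k vectors for Y; K_fg: vector of the g-th label in P_f."""
    sims = [np.dot(h, K_fg) / (np.linalg.norm(h) * np.linalg.norm(K_fg)) for h in H]
    return float(np.mean(sims))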
Step nine, filtering irrelevant labels according to the correlation degree: in the original category label set U, the labels in cluster A_f whose correlation degree Sim_fg is below the threshold η are deleted; in the image set X, the sample image data x_i corresponding to the labels in cluster A_f whose correlation degree Sim_fg is below the threshold η are deleted, yielding a trainable data set. This reduces the time needed to fit the parameters of the deep learning model, improves fitting efficiency, and works well in practice.
In specific implementation, the correlation threshold η is set according to the difference between the cosine distances from the semantic vectors of the original category label set U to the semantic vectors of the related semantic label set Y and the cosine distances from the semantic vectors of the mislabeled label set V to the semantic vectors of the related semantic label set Y.
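The following Python sketch illustrates step nine together with one way to set η from the description above; taking the midpoint between the two average cosine similarities is an assumption, since the patent only states that η is set from their difference.

# Sketch of step nine: choose eta between the typical similarity of original-category
# labels (U vs. Y) and of mislabeled labels (V vs. Y), then delete labels in P_f whose
# correlation Sim_fg falls below eta. The midpoint rule for eta is an assumption.
import numpy as np


def set_threshold(sim_U_to_Y, sim_V_to_Y):
    """Each argument: list of cosine similarities to the related semantic labels."""
    return 0.5 * (float(np.mean(sim_U_to_Y)) + float(np.mean(sim_V_to_Y)))


def filter_irrelevant(P_f, sims_f, eta):
    """Split the labels of cluster A_f into kept and removed by the threshold eta."""
    kept = [p for p, s in zip(P_f, sims_f) if s >= eta]
    removed = [p for p, s in zip(P_f, sims_f) if s < eta]
    return kept, removed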
The above embodiments are only examples of the present invention, and are not intended to limit the present invention, and all simple modifications, changes and equivalent structural changes made to the above embodiments according to the technical spirit of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (5)

1. An irrelevant label filtering method based on depth feature clustering and semantic measurement, characterized by comprising the following steps:
step one: the sensor acquires an image set X and stores it in a storage unit, X = {x_1, ..., x_i, ..., x_n}, where x_i represents the i-th sample image data, 1 ≤ i ≤ n, and n is a positive integer;
step two: establishing a label set corresponding to the image set X in a storage unit, the label set comprising an original category label set U and a mislabeled label set V;
step three, extracting depth features of the images in the image set: extract depth features from the sample image data x_i in the image set X, obtaining depth features φ(x_i);
Step four, clustering depth features to obtain clusters: using a predetermined number k as the cluster number to the depth feature phi (x) i ) Clustering is carried out to obtain a cluster set A, A = { A = } 1 ,...,A f ,...A k F is more than or equal to 1 and less than or equal to k, and k is a positive integer;
step five, constructing the related semantic label set of the clusters: according to the original category label set U in step two, obtain the semantic label of the cluster center of each cluster A_f and take it as the related semantic label y_f of cluster A_f, obtaining the related semantic label set Y = {y_1, ..., y_f, ..., y_k} corresponding to the cluster set A;
step six, constructing the label set to be measured of the clusters: obtain the label set to be measured P = {P_1, ..., P_f, ..., P_k}, where P_f represents the label set to be measured corresponding to cluster A_f; according to the original category label set U in step two, obtain the category labels of each cluster A_f other than that of the cluster center and add them to the label set to be measured P_f corresponding to cluster A_f, P_f = {P_f1, ..., P_fg, ..., P_ft}, where 1 ≤ g ≤ t and t is a positive integer;
step seven, generating semantic vectors: take the related semantic label set Y and the label set to be measured P as input, and obtain all semantic vectors H_f of the related semantic label set Y and all semantic vectors K_fg of each P_f in the label set to be measured P;
Step eight, calculating the correlation degree of the semantic vector: computer according to formula
Figure FDA0004093034450000022
Calculating related semantic label set Y and f-th clustering cluster label set P to be measured f Correlation of each label in fg Wherein H f | | represents the Guan Yuyi label Y in the relevant semantic label set Y f Of the semantic vector, | K fg I represents P in label set P to be measured f The semantic vector of the g-th label;
and step nine, filtering irrelevant labels according to the correlation degree: the labels in cluster A_f whose correlation degree Sim_fg is below the threshold η are deleted.
2. The irrelevant label filtering method based on depth feature clustering and semantic measurement according to claim 1, characterized in that: in step three, the depth features are extracted from the sample image data x_i in the image set X by a deep convolutional residual neural network model pre-trained on the large-scale image dataset ImageNet, and the network model consists of convolutional layers, residual layers and fully connected layers.
3. The irrelevant label filtering method based on depth feature clustering and semantic measurement according to claim 1, characterized in that: in step four, the depth features φ(x_i) are clustered by a spectral clustering algorithm, which specifically comprises the following steps:
step 401: construct the similarity matrix W of the depth features φ(x_i), where W is composed of the elements s_ij,

s_ij = exp(−‖φ(x_i) − φ(x_j)‖² / (2σ²)),

σ representing the standard deviation;
step 402: compute the diagonal matrix D = {d_1, ..., d_i, ..., d_n}, where

d_i = Σ_{j=1}^{n} w_ij;
step 403: obtain the Laplacian matrix L of the depth features φ(x_i) from L = D − W;
step 404: perform eigenvalue decomposition on the Laplacian matrix L to construct an eigenvector space, and cluster the eigenvectors in the eigenvector space with a clustering algorithm to obtain the cluster set A = {A_1, ..., A_f, ..., A_k}.
4. The irrelevant label filtering method based on depth feature clustering and semantic measurement according to claim 1, characterized in that: in step seven, the semantic vectors are generated using the near-synonym network model Synonyms.
5. The irrelevant label filtering method based on depth feature clustering and semantic measurement according to claim 1, characterized in that: in step nine, the correlation threshold η is set according to the difference between the cosine distances from the semantic vectors of the original category label set U to the semantic vectors of the related semantic label set Y and the cosine distances from the semantic vectors of the mislabeled label set V to the semantic vectors of the related semantic label set Y.
CN202010992837.4A 2020-09-21 2020-09-21 Irrelevant label filtering method based on depth feature clustering and semantic measurement Active CN112232374B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010992837.4A CN112232374B (en) 2020-09-21 2020-09-21 Irrelevant label filtering method based on depth feature clustering and semantic measurement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010992837.4A CN112232374B (en) 2020-09-21 2020-09-21 Irrelevant label filtering method based on depth feature clustering and semantic measurement

Publications (2)

Publication Number Publication Date
CN112232374A (en) 2021-01-15
CN112232374B (en) 2023-04-07

Family

ID=74108089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010992837.4A Active CN112232374B (en) 2020-09-21 2020-09-21 Irrelevant label filtering method based on depth feature clustering and semantic measurement

Country Status (1)

Country Link
CN (1) CN112232374B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111180B (en) * 2021-03-22 2022-01-25 杭州祺鲸科技有限公司 Chinese medical synonym clustering method based on deep pre-training neural network
CN113435308B (en) * 2021-06-24 2023-05-30 平安国际智慧城市科技股份有限公司 Text multi-label classification method, device, equipment and storage medium
CN114998634B (en) * 2022-08-03 2022-11-15 广州此声网络科技有限公司 Image processing method, image processing device, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017084267A1 (en) * 2015-11-18 2017-05-26 乐视控股(北京)有限公司 Method and device for keyphrase extraction
CN111080551A (en) * 2019-12-13 2020-04-28 太原科技大学 Multi-label image completion method based on depth convolution characteristics and semantic neighbor

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120158686A1 (en) * 2010-12-17 2012-06-21 Microsoft Corporation Image Tag Refinement
CN103092911B (en) * 2012-11-20 2016-02-03 北京航空航天大学 A kind of mosaic society label similarity is based on the Collaborative Filtering Recommendation System of k nearest neighbor
US9082047B2 (en) * 2013-08-20 2015-07-14 Xerox Corporation Learning beautiful and ugly visual attributes
US20150347895A1 (en) * 2014-06-02 2015-12-03 Qualcomm Incorporated Deriving relationships from overlapping location data
US20180300315A1 (en) * 2017-04-14 2018-10-18 Novabase Business Solutions, S.A. Systems and methods for document processing using machine learning
US10482323B2 (en) * 2017-08-22 2019-11-19 Autonom8, Inc. System and method for semantic textual information recognition
CN107563444A (en) * 2017-09-05 2018-01-09 浙江大学 A zero-shot image classification method and system
RU2711125C2 (en) * 2017-12-07 2020-01-15 Общество С Ограниченной Ответственностью "Яндекс" System and method of forming training set for machine learning algorithm
US11194842B2 (en) * 2018-01-18 2021-12-07 Samsung Electronics Company, Ltd. Methods and systems for interacting with mobile device
CN111177444A (en) * 2020-01-02 2020-05-19 杭州创匠信息科技有限公司 Image marking method and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017084267A1 (en) * 2015-11-18 2017-05-26 乐视控股(北京)有限公司 Method and device for keyphrase extraction
CN111080551A (en) * 2019-12-13 2020-04-28 太原科技大学 Multi-label image completion method based on depth convolution characteristics and semantic neighbor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Yan (李艳); Jia Junzhi (贾君枝). Research on a label tree construction method based on the vector space model. 情报学报 (Journal of the China Society for Scientific and Technical Information), 2014, (03), full text. *

Also Published As

Publication number Publication date
CN112232374A (en) 2021-01-15

Similar Documents

Publication Publication Date Title
CN112232374B (en) Irrelevant label filtering method based on depth feature clustering and semantic measurement
Kodirov et al. Semantic autoencoder for zero-shot learning
Zhang et al. Detecting densely distributed graph patterns for fine-grained image categorization
Ott et al. A deep learning approach to identifying source code in images and video
Wan et al. BlastNeuron for automated comparison, retrieval and clustering of 3D neuron morphologies
CA3066029A1 (en) Image feature acquisition
CN109829467A (en) Image labeling method, electronic device and non-transient computer-readable storage medium
Yu et al. A computational model for object-based visual saliency: Spreading attention along gestalt cues
WO2017151759A1 (en) Category discovery and image auto-annotation via looped pseudo-task optimization
CN113887661B (en) Image set classification method and system based on representation learning reconstruction residual analysis
CN110189305B (en) Automatic analysis method for multitasking tongue picture
CN109284668B (en) Pedestrian re-identification method based on distance regularization projection and dictionary learning
CN109255289A (en) A kind of across aging face identification method generating model based on unified formula
Sun et al. Global-local label correlation for partial multi-label learning
Wu et al. Plant leaf identification based on shape and convolutional features
CN111882000A (en) Network structure and method applied to small sample fine-grained learning
CN108805181B (en) Image classification device and method based on multi-classification model
Pierce et al. Reducing annotation times: Semantic segmentation of coral reef survey images
CN114610924A (en) Commodity picture similarity matching search method and system based on multi-layer classification recognition model
Yu et al. An image-based automatic recognition method for the flowering stage of maize
CN115392474A (en) Local perception map representation learning method based on iterative optimization
CN115376178A (en) Unknown domain pedestrian re-identification method and system based on domain style filtering
CN110941994B (en) Pedestrian re-identification integration method based on meta-class-based learner
Mao et al. A Transfer Learning Method with Multi-feature Calibration for Building Identification
Kumar et al. Image classification in python using Keras

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant