CN110516098A - Image labeling method based on convolutional neural networks and binary coding feature - Google Patents

Image labeling method based on convolutional neural networks and binary coding feature Download PDF

Info

Publication number
CN110516098A
CN110516098A CN201910791484.9A
Authority
CN
China
Prior art keywords
image
tag
network model
label
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910791484.9A
Other languages
Chinese (zh)
Inventor
薛越
王邦军
吴新建
张莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN201910791484.9A priority Critical patent/CN110516098A/en
Publication of CN110516098A publication Critical patent/CN110516098A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2193Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Library & Information Science (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image labeling method based on convolutional neural networks and binary coding features, comprising the following steps: constructing an Inception V3 base network model; truncating the Inception V3 base network model at its last pooling layer, removing its Logits and softmax functions, and using the sigmoid function as the activation function of the last layer, to obtain a modified first base network model; adding two fully connected layers on top of the first base network model, again with the sigmoid function as the activation function of the last layer, to obtain a multi-label classification network model; training the multi-label classification network model on a training set to optimize its weights; labeling the feature vector set of a target image with the trained multi-label classification network model to obtain the target image's multi-label probability output; and, combining the multi-label probability output, annotating the target image with the TagProp algorithm. The method achieves multi-label annotation of images at low cost and high efficiency.

Description

Image labeling method based on convolutional neural networks and binary coding feature
Technical field
The present invention relates to the field of visual image technology, and in particular to an image labeling method based on convolutional neural networks and binary coding features.
Background technique
Efficient image annotation is essential for the effective management and retrieval of large-scale image collections. The goal of image annotation is to assign a group of relevant descriptive labels to an image. Traditional annotation algorithms require a significant amount of time for manual feature extraction and often fail to achieve good results, which motivates applying deep learning to image annotation. Deep learning can extract higher-level semantic features from images, reducing the gap between low-level features and the high-level semantic concept of a label. Deep-learning-based automatic annotation algorithms need no manual feature extraction, so the annotation algorithm is no longer constrained by the choice of feature-extraction method, and their end-to-end training greatly improves annotation efficiency. Convolutional neural networks, a popular deep-learning model, are well suited to processing high-dimensional data and classify well, so using them for image annotation yields better annotation results.
Traditional image annotation work is based on single-label classification, which assumes that each image is associated with exactly one label. In practice, however, an image is often associated with multiple labels, and a single label cannot fully describe a whole image. Current convolutional neural network models are likewise built for the single-label classification task; their loss function is usually the softmax function, which can assign only the single most probable label to an image. Annotating multi-label images therefore requires a more suitable method. In addition, automatic annotation algorithms increasingly target large databases of hundreds of thousands to millions of images, so time cost must be considered and more compact, efficient feature representations explored.
Summary of the invention
The technical problem to be solved by the present invention is to provide an image labeling method based on convolutional neural networks and binary coding features that achieves multi-label annotation of images at low cost and high efficiency.
To solve the above technical problem, the present invention provides an image labeling method based on convolutional neural networks and binary coding features, comprising the following steps:
Construct an Inception V3 base network model;
Truncate the Inception V3 base network model at its last pooling layer, remove its Logits and softmax functions, and use the sigmoid function as the activation function of the last layer, obtaining a modified first base network model;
Add two fully connected layers on top of the first base network model, with the sigmoid function as the activation function of the last layer, obtaining a multi-label classification network model;
Train the multi-label classification network model on a training set, optimizing the weights of the multi-label classification network model;
Label the feature vector set of a target image with the trained multi-label classification network model, obtaining the multi-label probability output of the target image;
Combining the multi-label probability output, annotate the target image with the TagProp algorithm.
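The sigmoid head described in the steps above can be sketched numerically. The following is a minimal numpy illustration (the layer sizes, array names, and the function `multilabel_head` are all assumptions for illustration, not the patent's implementation): two fully connected layers ending in a sigmoid give each tag an independent probability, so one image can carry several tags, unlike a softmax head whose outputs must sum to 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_head(pooled, W1, b1, W2, b2):
    # Two fully connected layers on top of the truncated backbone's
    # pooled features; the final sigmoid yields one independent
    # probability per tag rather than a competing softmax distribution.
    h = sigmoid(pooled @ W1 + b1)
    return sigmoid(h @ W2 + b2)

rng = np.random.default_rng(0)
pooled = rng.normal(size=(4, 2048))            # 4 images, 2048-d pooled features
W1, b1 = 0.01 * rng.normal(size=(2048, 256)), np.zeros(256)
W2, b2 = 0.01 * rng.normal(size=(256, 260)), np.zeros(260)   # 260 tags, as in Corel5K
probs = multilabel_head(pooled, W1, b1, W2, b2)
# Every entry lies in (0, 1), and rows need not sum to 1.
```

With a real backbone, `pooled` would be the output of Inception V3's last pooling layer; here it is random only to show the shapes involved.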
Preferably, "training the multi-label classification network model on a training set, optimizing the weights of the multi-label classification network model" specifically includes:
training the multi-label classification network model on a training set, obtaining a loss function;
fine-tuning the multi-label classification network model according to the loss function; wherein the fine-tuning specifically includes: fixing the weights of the convolutional layers before the two fully connected layers, and optimizing the two fully connected layers via backpropagation training.
Preferably, "combining the multi-label probability output, annotating the target image with the TagProp algorithm" specifically includes:
For a target image x, the probability that it possesses the j-th label, i.e. y_j = +1, is:
p(y_j = +1 | x) = Σ_i π_i p(y_j = +1 | x_i)
where π_i is the weight of neighbor x_i in the prediction, and p(y_j = +1 | x_i) is the probability that the target picture has the j-th label given x_i, namely:
p(y_j = +1 | x_i) = 1 − ε if y_ij = +1, and ε otherwise,
where ε is a predetermined value;
Solving by maximizing the log-likelihood of the labels in the training set, the loss function of the model is:
L = Σ_j c_j log p(y_j)
where the parameter c_j measures the loss of image x belonging to label j.
Preferably, the weight π_i is computed with a distance-based method, i.e.
π_i = exp(−d_h(x, x_i)) / Σ_{i'} exp(−d_h(x, x_{i'}))
where d_h(x, x_i) = h · d(x, x_i), and d(x, x_i) is the base distance between x and x_i.
Preferably, the prediction p(y_j = +1 | x) = Σ_i π_i p(y_j = +1 | x_i) is improved with a sigmoid function, i.e. p(y_j = +1) = σ(α_j z_j + β_j), where α_j is a weight, β_j is a bias, and z_j is the weighted average of label j among the neighbors of the target image x, i.e. z_j = Σ_i π_i y_ij.
Preferably, "the parameter c_j measures the loss of image x belonging to label j" specifically includes:
when y_j = +1, c_j = 1/N+; when y_j = −1, c_j = 1/N−; where N+ is the number of pictures in the training set carrying label j, and N− is the number of pictures in the training set not carrying label j, i.e. N+ + N− equals the total number of training pictures.
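The class-balancing weights just defined can be sketched as follows (a toy illustration; the array names and label matrix are assumptions, not the patent's data): c_j takes the value 1/N+ on positive entries and 1/N− on negative ones, so rare positive labels are not drowned out by the many negatives.

```python
import numpy as np

# Toy +1/-1 label matrix: rows are training images, columns are tags.
y = np.array([[+1, -1, +1],
              [+1, +1, -1],
              [-1, +1, +1]])

n_pos = (y == +1).sum(axis=0)   # N+ for each tag
n_neg = (y == -1).sum(axis=0)   # N- for each tag

# c_j = 1/N+ on positive entries and 1/N- on negative entries.
c = np.where(y == +1, 1.0 / n_pos, 1.0 / n_neg)
```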
Beneficial effects of the present invention: the invention proposes a convolutional neural network image annotation model based on the sigmoid loss function. By building a new network model, extracting binary coding features of images, and then performing annotation with the TagProp algorithm, it achieves multi-label annotation of images at low cost, high speed, and high efficiency.
Detailed description of the invention
Fig. 1 is the schematic diagram of Inception network model;
Fig. 2 is InceptionV3 schematic network structure;
Fig. 3 is flow diagram of the invention;
Fig. 4 compares CNN-Sigmoid and CNN-Softmax, where (a) shows the experimental results on the Natural Scenes dataset and (b) shows the experimental results on the Corel5K dataset.
Specific embodiment
The present invention will be further explained below with reference to the attached drawings and specific examples, so that those skilled in the art can better understand and practice the invention; the illustrated embodiments, however, do not limit the invention.
The present invention uses Inception V3 as the base network topology of the model. The Inception network achieves very good classification performance while keeping both computation and parameter counts under control. Rather than simply increasing the number of layers, it introduces the Inception Module, whose structure is shown in Fig. 1, the schematic of the Inception network model. This modular design reduces the number of network parameters, shrinks the design space of the network, and at the same time increases the network's width. The Inception network model also introduces the idea of factorization, replacing one large convolution with two small convolutions, further reducing the parameter count while adding nonlinearity to the network.
Inception V3 makes two main modifications. First, it introduces factorization: a large convolution such as 7×7 is decomposed into two convolutions, 1×7 and 7×1, and 5×5 convolutions are decomposed similarly (1×5, 5×1). This both reduces the number of parameters and, because one convolution is split into two, deepens the network, enhancing its nonlinearity and the expressive power of the model. Second, Inception V3 optimizes the Inception module so that it comes in three different structures of sizes 35×35, 17×17, and 8×8. These modules appear only in the later part of the network; the earlier part uses ordinary convolutional layers. Fig. 2 shows the Inception V3 network structure.
Referring to Fig. 3, the invention discloses an image labeling method based on convolutional neural networks and binary coding features.
The present invention fine-tunes the Inception V3 network model. To better suit multi-label classification, the invention truncates Inception V3 at its last pooling layer, removes the Logits and softmax functions, adds two fully connected layers, and uses the sigmoid function as the activation function of the last layer. Both activation functions can be applied to multi-class settings; the difference is that with sigmoid the classes may overlap, while with softmax the classes are mutually exclusive, which makes the sigmoid function better suited to multi-label classification.
Assume a training set I = {I_1, I_2, ..., I_N}, I_i ∈ R^{m×n}, where m and n are the height and width of a picture. In multi-label learning, every picture has multiple labels. Let c be the total number of tags and y_i the label vector of the i-th image. Since the sigmoid function is used, the probability that the i-th image has the j-th label is p̂_ij = σ(f_j(I_i)) = 1 / (1 + exp(−f_j(I_i))) (formula 1), where f_j(I_i) is the j-th unit of the network's last-layer output. As for the labels, y_ij = 1 if the i-th image has the j-th label and y_ij = 0 otherwise, so the true probability is p_ij = y_ij / ‖y_i‖_1. The loss function can thus be defined as the cross-entropy L = −Σ_i Σ_j p_ij log p̂_ij (formula 2).
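The loss just defined can be written out in a few lines of numpy (a sketch under the definitions above; the function name `multilabel_loss` and the toy inputs are assumptions): the targets are the normalized label vectors p_ij = y_ij / ‖y_i‖_1, and the predictions are sigmoids of the last-layer outputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_loss(logits, y):
    # p_hat_ij = sigmoid(f_j(I_i)); p_ij = y_ij / ||y_i||_1
    p_hat = sigmoid(logits)
    p = y / y.sum(axis=1, keepdims=True)   # normalize each image's label vector
    return -np.sum(p * np.log(p_hat))      # cross-entropy over all images and tags

y = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])        # toy binary label vectors
logits = np.array([[2.0, 1.5, -2.0], [-1.0, 3.0, -2.0]])  # toy last-layer outputs
loss = multilabel_loss(logits, y)
```

As a sanity check, very confident correct logits drive the loss toward zero, while the loss is always non-negative.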
To accelerate training, the present invention fine-tunes the network model: the weights of the preceding convolutional layers are fixed, and the two newly added fully connected layers are optimized by backpropagation training.
The present invention takes the output vectors of the fully connected layer as the image features X = {x_1, x_2, ..., x_N}; the corresponding tag sets are Y = {y_1, y_2, ..., y_N}.
To make better use of the high-level semantic features extracted by the CNN, the present invention applies these features in the TagProp image annotation model. TagProp is a nearest-neighbor-based model that uses neighbor voting: pictures at different visual distances from the target image should carry different weights in the vote, so the probability of a tag appearing in the target image is a weighted sum.
TagProp learns the conditional distribution of a given image's labels by computing pairwise distances. For a target image x, the probability that it possesses the j-th label, i.e. y_j = +1, is:
p(y_j = +1 | x) = Σ_i π_i p(y_j = +1 | x_i) (formula 3)
where π_i is the weight of sample x_i in predicting x's labels, and p(y_j = +1 | x_i) is the probability that the target picture has the j-th label given x_i, defined as:
p(y_j = +1 | x_i) = 1 − ε if y_ij = +1, and ε otherwise (formula 4)
where ε is a very small value used to avoid zero probabilities. We compute the weights π_i with a distance-based method, i.e.
π_i = exp(−d_h(x, x_i)) / Σ_{i'} exp(−d_h(x, x_{i'}))
where d_h(x, x_i) = h · d(x, x_i), and d(x, x_i) is the base distance between x and x_i; the present invention uses the Euclidean distance to measure the distance between two images, and h ≥ 0 is optimized. The probability of every label appearing in the target image x is computed, and the k labels with the largest probability values are taken as the final annotation.
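The neighbor-weighted vote just described can be sketched as follows (an illustrative reading of the text under the assumption that h is a scalar; all names and the toy data are assumptions): distance-based weights π_i are a softmax over negated scaled Euclidean distances, each neighbor contributes 1 − ε or ε per tag, and the k highest-probability tags form the annotation.

```python
import numpy as np

def tagprop_predict(x, X_train, Y_train, h=1.0, eps=1e-5, k=5):
    # pi_i proportional to exp(-h * d(x, x_i)), normalized to sum to 1.
    d = np.linalg.norm(X_train - x, axis=1)            # Euclidean base distance
    w = np.exp(-h * d)
    pi = w / w.sum()
    # p(y_j = +1 | x_i) is 1-eps if neighbor i carries tag j, else eps.
    p_neighbor = np.where(Y_train == 1, 1 - eps, eps)
    p = pi @ p_neighbor                                # p(y_j = +1 | x)
    return np.argsort(-p)[:k], p                       # top-k tags and all probabilities

rng = np.random.default_rng(1)
X_train = rng.normal(size=(20, 8))                     # 20 neighbors, 8-d features
Y_train = (rng.random((20, 6)) < 0.3).astype(int)      # 6 tags, 0/1 membership
top_k, p = tagprop_predict(X_train[0], X_train, Y_train, k=3)
```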
Solving by maximizing the log-likelihood of the labels in the training set, the loss function of the model is:
L = Σ_j c_j log p(y_j) (formula 5)
The parameter c_j accounts for the differing numbers of pictures covered by each tag concept and measures the loss of image x belonging to label j. Specifically, when y_j = +1, c_j = 1/N+; when y_j = −1, c_j = 1/N−, where N+ and N− are the numbers of training images that do and do not carry label j, respectively, so N+ + N− equals the total number of training images. The elements of h are constrained to be non-negative, and the value of h is solved with a projected gradient algorithm.
It should be understood that semantic concepts with low frequencies of occurrence obtain only low prediction probabilities even when they appear several times among the neighbors, so the above method yields low recall on semantic concepts that occupy only rare pictures. Therefore the prediction of y_j = +1 above (formula 3) is improved and smoothed with a sigmoid function, i.e.
p(y_j = +1) = σ(α_j z_j + β_j) (formula 6)
where z_j = Σ_i π_i y_ij is the weighted average of label j among the neighbors of the target image x, α_j is a weight, and β_j is a bias. The sigmoid function here strengthens the weight of relatively rare labels while weakening the weight of frequently occurring ones.
Based on the above image labeling method, tests were carried out on different datasets; the results are as follows.
(1) Datasets
The present invention evaluates the validity of the proposed model on multiple datasets: Natural Scenes, Corel5K, ESP-Game, and IAPRTC-12. Information on each dataset is given in Table 1, and the datasets are briefly introduced below.
Table 1. Information on each dataset
Image set | Pictures | Labels | Avg. labels | Training images | Test images
Natural Scenes | 2000 | 5 | 2.3 | 1500 | 500
Corel5K | 5000 | 260 | 3.4 | 4000 | 1000
ESP-Game | 20000 | 268 | 4.7 | 15000 | 5000
IAPRTC-12 | 19627 | 291 | 5.7 | 15000 | 4627
The Natural Scenes dataset is small, with 2000 images divided into 5 classes: sunset, desert, forest, ocean, and mountain; every image has 1 to 2 labels.
The Corel-5K dataset is medium-sized, with 5000 images in total covering 260 kinds of labels such as weather, scenery, buildings, and vehicles. Every image has 1 to 5 labels, with an average of 3.5 labels per image. Because the label information of Corel-5K is quite accurate, it is often used in multi-label image classification experiments.
The ESP-Game image set is larger, with 20770 images in total and 268 kinds of labels covering a very wide range, including drawings, buildings, animals, and so on. Every image has 1 to 15 labels, with an average of 4.6 per image. Because the ESP-Game dataset is large and contains some erroneous labels, some processing was done here to remove images with inaccurate label information, leaving 20000 images for the experiments.
IAPRTC-12 is likewise a larger dataset, comprising 19627 images in total. Its labels are sentences with practical meaning described in various languages; natural language processing techniques are used to extract the main terms and convert them into a format similar to the other datasets. After this processing, it has 291 kinds of labels in total, with an average of 5.7 labels per image. Images of humans account for a large proportion of the IAPRTC-12 images.
(2) Experimental setup and evaluation metrics
The experiments were carried out on a GPU-equipped server running Ubuntu 16.04 with 2 NVIDIA GeForce TITAN graphics cards, using the TensorFlow deep learning framework and the Python programming language.
As noted above, the network used by this model is an Inception V3 network pre-trained on ImageNet. When fine-tuning the network, the learning rate is set to 0.0001 for the two smaller datasets, Natural Scenes and Corel-5K, and to 0.0005 for the two larger datasets, ESP-Game and IAPRTC-12. The exponential decay of the learning rate is set to 0.99995, the mini-batch size to 32, and dropout to 0.5 in all cases. In addition, when constructing the candidate set of k tags for an image, k = 2 for the Natural Scenes dataset, k = 5 for the Corel-5K dataset, k = 6 for the ESP-Game dataset, and k = 7 for the IAPRTC-12 dataset.
Image annotation belongs to multi-label learning, so the present invention introduces some multi-label classification metrics. Consider a test set S = {(x_1, Y_1), (x_2, Y_2), ..., (x_p, Y_p)}, where Y_i is the tag set of sample x_i.
1. Hamming loss (HL): HL = (1/p) Σ_{i=1}^{p} (1/Q) |h(x_i) Δ Y_i|, where Q is the number of labels in the sample set, h(x_i) is the predicted tag set of sample i, and Δ is the symmetric-difference (XOR) operation. HL assesses how many times a sample is misclassified: for example, a sample not carrying label A is wrongly assigned label A, or a sample carrying label A is not predicted to have it. Equivalently, the Hamming loss is the numerical distance between the classifier's predicted label sequence and the true label sequence. A smaller HL value means a better prediction.
2. One-error (OE): OE = (1/p) Σ_{i=1}^{p} [argmax_y f(x_i, y) ∉ Y_i], where f(x_i, y) is the prediction score of sample i for label y. OE is the fraction of samples whose top-scoring label in the output is not in the true tag set. A smaller OE value means a better prediction.
3. Coverage (C): C = (1/p) Σ_{i=1}^{p} max_{y∈Y_i} rank_f(x_i, y) − 1, where rank_f(x_i, y) ranks the labels by predicted probability, with the true labels ranked along with the rest, so max_{y∈Y_i} rank_f(x_i, y) is the position of the last true label in the sorted sequence. Coverage evaluates how far down the ranking one must go, on average, to cover all of a sample's true labels (ranks start at 1, hence the subtraction of 1). A smaller value means better performance.
4. Ranking loss (RL): RL = (1/p) Σ_{i=1}^{p} (1 / (|Y_i| |Ȳ_i|)) |{(y, ȳ) : f(x_i, y) ≤ f(x_i, ȳ), y ∈ Y_i, ȳ ∈ Ȳ_i}|, where f is the prediction function, Ȳ_i is the complement of Y_i, and |Y_i| is the number of true labels of sample i. The ranking loss is the average probability that, in the ranked results, a label outside the true label set is ranked above a label inside it. A smaller RL value means a better prediction.
5. Average precision (AP): AP = (1/p) Σ_{i=1}^{p} (1/|Y_i|) Σ_{y∈Y_i} |{y' ∈ Y_i : rank_f(x_i, y') ≤ rank_f(x_i, y)}| / rank_f(x_i, y). For each prediction, it measures the probability that predicted labels are correct and ranked near the top of the result set. A larger AP value means a better annotation.
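The first two metrics above can be made concrete with a small (assumed) toy example; the names `hamming_loss` and `one_error` and the toy data are illustrative, not from the experiments.

```python
import numpy as np

def hamming_loss(pred, true):
    # Fraction of label slots predicted wrongly, averaged over all slots.
    return np.mean(pred != true)

def one_error(scores, true):
    # Fraction of samples whose top-scoring label is not a true label.
    top = np.argmax(scores, axis=1)
    return np.mean([true[i, top[i]] == 0 for i in range(len(true))])

true = np.array([[1, 0, 1], [0, 1, 0]])
pred = np.array([[1, 0, 0], [0, 1, 0]])
scores = np.array([[0.9, 0.2, 0.4], [0.1, 0.3, 0.8]])
hl = hamming_loss(pred, true)   # one wrong slot out of six
oe = one_error(scores, true)    # second sample's top label (index 2) is not true
```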
In addition, the present invention uses the most common metrics for measuring the performance of image annotation methods: precision P, recall R, the F1 value, and N+. For a given label i, precision is the proportion of correctly annotated images among the images actually annotated with the label, and recall is the proportion of correctly annotated images among the images that should carry the label. Let N_c be the number of correctly annotated pictures, N_r the number of pictures the system annotated with the label, and N_g the number of pictures in the test set relevant to the keyword; then P = N_c / N_r and R = N_c / N_g.
The F1 value balances precision and recall: F1 = 2PR / (P + R). In addition, N+ denotes the number of labels correctly annotated at least once across all labels; this metric reflects the algorithm's coverage of the label set.
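As a hedged sketch of the per-label metrics just defined (function name and toy vectors are assumptions): precision divides correct annotations by all annotations made, recall divides them by all annotations that should have been made, and F1 is their harmonic mean.

```python
import numpy as np

def precision_recall_f1(pred, true):
    # pred: images the system tagged with the label; true: images truly carrying it.
    tp = np.sum(pred & true)                 # correctly annotated images
    p = tp / max(np.sum(pred), 1)            # precision over annotated images
    r = tp / max(np.sum(true), 1)            # recall over relevant images
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1

pred = np.array([1, 1, 0, 1, 0], dtype=bool)
true = np.array([1, 0, 0, 1, 1], dtype=bool)
p, r, f1 = precision_recall_f1(pred, true)   # 2 of 3 annotations correct, 2 of 3 relevant found
```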
(3) Experimental results
To verify the performance of the proposed image annotation model, this embodiment first evaluates the model on the HL, OE, C, RL, and AP metrics; it then analyzes model performance with and without the autoencoder and compares against other models to demonstrate the validity of the proposed model; finally, cross-dataset image retrieval and annotation are carried out between two datasets to demonstrate the generalization of the proposed model.
A. Model performance from the multi-label classification perspective
From the multi-label classification perspective, the model is evaluated on the Natural Scenes dataset with the multi-label classification metrics HL, OE, C, RL, and AP. The binary coding feature and the non-binary coding feature are verified separately. The compared methods are currently popular multi-label learning methods such as ML-KNN, ML-I2C, InsDif, and ML-LI2C. Table 2 shows the results: compared with the previous models, the proposed model improves on all 5 metrics, and the two kinds of features obtain similar results.
Table 2. Comparison of the proposed model with other methods on the Natural Scenes dataset
Method | HL↓ | OE↓ | C↓ | RL↓ | AP↑
ML-KNN | 0.169 | 0.300 | 0.93 | 0.168 | 0.80
ML-I2C | 0.159 | 0.3119 | 0.88 | 0.1563 | 0.80
InsDif | 0.152 | 0.2593 | 0.83 | 0.144 | 0.83
ML-LI2C | 0.129 | 0.194 | 0.62 | 0.091 | 0.88
InceptionV3 | 0.101 | 0.154 | 0.55 | 0.0761 | 0.90
Inception | 0.107 | 0.1573 | 0.56 | 0.088 | 0.90
B. Model performance on multiple datasets
This part verifies the validity of our method by comparing experiments with other image annotation methods on the Corel5K, ESP-Game, and IAPRTC-12 datasets, using the metrics P, R, F1, and N+.
Experiments are first carried out on the two smaller datasets, Natural Scenes and Corel5K, comparing against the CNN+Softmax method. For an accurate comparison, the features used are all extracted directly from the proposed model. As Fig. 4 shows, both of our variants annotate better than CNN+Softmax, and CNN-TagProp performs better than CNN-TagProp (256bit). Specifically, the F1 value of the CNN-Sigmoid method improves over the CNN-Softmax method by 6% and 8% on the two datasets, respectively. This shows that changing the last loss function to sigmoid is effective: compared with softmax, sigmoid is better suited to multi-label annotation. It also shows that the high-level semantic features extracted by the proposed model have good discrimination, which benefits image annotation.
The embodiments described above are only preferred embodiments given to fully illustrate the present invention, and the protection scope of the present invention is not limited thereto. Equivalent substitutions or transformations made by those skilled in the art on the basis of the present invention fall within the protection scope of the present invention, which is defined by the claims.

Claims (6)

1. An image labeling method based on convolutional neural networks and binary coding features, characterized by comprising the following steps:
constructing an Inception V3 base network model;
truncating the Inception V3 base network model at its last pooling layer, removing its Logits and softmax functions, and using the sigmoid function as the activation function of the last layer, to obtain a modified first base network model;
adding two fully connected layers on top of the first base network model, with the sigmoid function as the activation function of the last layer, to obtain a multi-label classification network model;
training the multi-label classification network model on a training set, optimizing the weights of the multi-label classification network model;
labeling the feature vector set of a target image with the trained multi-label classification network model, obtaining the multi-label probability output of the target image;
combining the multi-label probability output, annotating the target image with the TagProp algorithm.
2. The image labeling method of claim 1, characterized in that "training the multi-label classification network model on a training set, optimizing the weights of the multi-label classification network model" specifically includes:
training the multi-label classification network model on a training set, obtaining a loss function;
fine-tuning the multi-label classification network model according to the loss function; wherein the fine-tuning specifically includes: fixing the weights of the convolutional layers before the two fully connected layers, and optimizing the two fully connected layers via backpropagation training.
3. The image labeling method of claim 1, characterized in that "combining the multi-label probability output, annotating the target image with the TagProp algorithm" specifically includes:
for a target image x, the probability that it possesses the j-th label, i.e. y_j = +1, is:
p(y_j = +1 | x) = Σ_i π_i p(y_j = +1 | x_i),
where π_i is the weight of neighbor x_i in the prediction, and p(y_j = +1 | x_i) is the probability that the target picture has the j-th label given x_i, namely:
p(y_j = +1 | x_i) = 1 − ε if y_ij = +1, and ε otherwise,
where ε is a predetermined value;
solving by maximizing the log-likelihood of the labels in the training set, the loss function of the model is:
L = Σ_j c_j log p(y_j),
where the parameter c_j measures the loss of image x belonging to label j.
4. The image labeling method of claim 3, characterized in that the weight π_i is computed with a distance-based method, i.e.
π_i = exp(−d_h(x, x_i)) / Σ_{i'} exp(−d_h(x, x_{i'})),
where d_h(x, x_i) = h · d(x, x_i), and d(x, x_i) is the base distance between x and x_i.
5. The image labeling method of claim 3, characterized in that the prediction p(y_j = +1 | x) = Σ_i π_i p(y_j = +1 | x_i) is improved with a sigmoid function, i.e. p(y_j = +1) = σ(α_j z_j + β_j), where α_j is a weight, β_j is a bias, and z_j is the weighted average of label j among the neighbors of the target image x, i.e. z_j = Σ_i π_i y_ij.
6. The image annotation method according to claim 3, wherein said "the parameter cj measures the loss of image X belonging to label j" specifically comprises:
When yj = +1, cj = 1/N+; when yj = −1, cj = 1/N−; where N+ denotes the number of images in the training set that carry label j, and N− denotes the number of images in the training set that do not carry label j.
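The class-balance weighting of claim 6 is direct to state in code; the counts below are illustrative:

```python
def label_cost(y_j, n_pos, n_neg):
    """c_j = 1/N+ for a positive label and 1/N- for a negative one,
    so the typically rare positives are not drowned out by negatives."""
    return 1.0 / n_pos if y_j == +1 else 1.0 / n_neg

# With 10 positive and 90 negative training images for label j:
c_pos = label_cost(+1, n_pos=10, n_neg=90)  # 0.1
c_neg = label_cost(-1, n_pos=10, n_neg=90)
```

Each positive example thus contributes roughly as much total weight to the loss as all negatives combined.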
CN201910791484.9A 2019-08-26 2019-08-26 Image labeling method based on convolutional neural networks and binary coding feature Pending CN110516098A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910791484.9A CN110516098A (en) 2019-08-26 2019-08-26 Image labeling method based on convolutional neural networks and binary coding feature

Publications (1)

Publication Number Publication Date
CN110516098A true CN110516098A (en) 2019-11-29

Family

ID=68627926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910791484.9A Pending CN110516098A (en) 2019-08-26 2019-08-26 Image labeling method based on convolutional neural networks and binary coding feature

Country Status (1)

Country Link
CN (1) CN110516098A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416384A (en) * 2018-03-05 2018-08-17 苏州大学 A kind of image tag mask method, system, equipment and readable storage medium storing program for executing
CN110163234A (en) * 2018-10-10 2019-08-23 腾讯科技(深圳)有限公司 A kind of model training method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XINJIAN WU et al., "A Novel Model for Multi-label Image Annotation", 2018 24th International Conference on Pattern Recognition (ICPR) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382800A (en) * 2020-03-11 2020-07-07 上海爱数信息技术股份有限公司 Multi-label multi-classification method suitable for sample distribution imbalance
CN111382800B (en) * 2020-03-11 2022-11-25 上海爱数信息技术股份有限公司 Multi-label multi-classification method suitable for sample distribution imbalance
CN111639755A (en) * 2020-06-07 2020-09-08 电子科技大学中山学院 Network model training method and device, electronic equipment and storage medium
CN111639755B (en) * 2020-06-07 2023-04-25 电子科技大学中山学院 Network model training method and device, electronic equipment and storage medium
CN112766330A (en) * 2021-01-07 2021-05-07 济南浪潮高新科技投资发展有限公司 Image multi-label classification method and device
CN112732967A (en) * 2021-01-08 2021-04-30 武汉工程大学 Automatic image annotation method and system and electronic equipment
CN112732967B (en) * 2021-01-08 2022-04-29 武汉工程大学 Automatic image annotation method and system and electronic equipment
CN113096080A (en) * 2021-03-30 2021-07-09 四川大学华西第二医院 Image analysis method and system
CN113096080B (en) * 2021-03-30 2024-01-16 四川大学华西第二医院 Image analysis method and system
CN114139656A (en) * 2022-01-27 2022-03-04 成都橙视传媒科技股份公司 Image classification method based on deep convolution analysis and broadcast control platform
CN114550916A (en) * 2022-02-23 2022-05-27 天津大学 Device for classifying, identifying and positioning common lung diseases based on deep learning

Similar Documents

Publication Publication Date Title
CN110516098A (en) Image labeling method based on convolutional neural networks and binary coding feature
Yu et al. Spatial pyramid-enhanced NetVLAD with weighted triplet loss for place recognition
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN110059198B (en) Discrete hash retrieval method of cross-modal data based on similarity maintenance
CN106021364B (en) Foundation, image searching method and the device of picture searching dependency prediction model
CN1307579C (en) Methods and apparatus for classifying text and for building a text classifier
Zhao et al. Large-scale category structure aware image categorization
Perez-Martin et al. Improving video captioning with temporal composition of a visual-syntactic embedding
CN105393264A (en) Interactive segment extraction in computer-human interactive learning
CN107220373A (en) A kind of Lung neoplasm CT image Hash search methods based on medical science sign and convolutional neural networks
CN108897791B (en) Image retrieval method based on depth convolution characteristics and semantic similarity measurement
CN101561805A (en) Document classifier generation method and system
CN111931505A (en) Cross-language entity alignment method based on subgraph embedding
CN111488917A (en) Garbage image fine-grained classification method based on incremental learning
CN111461175B (en) Label recommendation model construction method and device of self-attention and cooperative attention mechanism
CN105930792A (en) Human action classification method based on video local feature dictionary
CN110598022B (en) Image retrieval system and method based on robust deep hash network
Wang et al. One-shot learning for long-tail visual relation detection
CN110765285A (en) Multimedia information content control method and system based on visual characteristics
CN113032613A (en) Three-dimensional model retrieval method based on interactive attention convolution neural network
CN113806580A (en) Cross-modal Hash retrieval method based on hierarchical semantic structure
Singh et al. Feature selection based classifier combination approach for handwritten Devanagari numeral recognition
CN111144453A (en) Method and equipment for constructing multi-model fusion calculation model and method and equipment for identifying website data
Cheng et al. Deep attentional fine-grained similarity network with adversarial learning for cross-modal retrieval
CN106033546A (en) Behavior classification method based on top-down learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191129