CN110516098A - Image labeling method based on convolutional neural networks and binary coding feature - Google Patents
- Publication number
- CN110516098A (application CN201910791484.9A)
- Authority
- CN
- China
- Prior art keywords
- image
- tag
- network model
- label
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
- G06F18/2193—Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Probability & Statistics with Applications (AREA)
- Library & Information Science (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an image labeling method based on convolutional neural networks and binary coding features, comprising the following steps: constructing an Inception V3 base network model; truncating the Inception V3 base network model at its last pooling layer, removing its Logits and softmax function, and using a sigmoid function as the activation function of the last layer to obtain a modified first base network model; adding two fully connected layers to the first base network model, again with a sigmoid function as the activation of the last layer, to obtain a multi-label classification network model; training the multi-label classification network model on the training set to optimize its weights; labeling the feature vector set of a target image with the trained multi-label classification network model to obtain the multi-label probability output of the target image; and, combining the multi-label probability output, annotating the target image with the TagProp algorithm. The method achieves multi-label annotation of images at low cost and with high efficiency.
Description
Technical field
The present invention relates to the field of visual image technology, and in particular to an image labeling method based on convolutional neural networks and binary coding features.
Background technique
Efficient image annotation is essential for the effective management and retrieval of large-scale image collections. The goal of image annotation is to assign a group of relevant descriptive labels to an image. Traditional image annotation algorithms require a significant amount of time for manual extraction of image features, often without achieving good results, which motivates applying deep learning to image annotation. Deep learning can capture higher-level semantic features of an image, reducing the gap to high-level semantic concepts such as labels. Deep-learning-based automatic image annotation needs no manual feature extraction, so the annotation algorithm is no longer limited by the choice of feature extraction method, and its end-to-end training greatly improves annotation efficiency. Convolutional neural networks, a popular deep learning model, are well suited to high-dimensional data and classify well; using them for image annotation yields better annotation results.
Traditional image annotation work is based on single-label classification, assuming each image is associated with a single label. In practice, however, an image is often associated with multiple labels, and a single label cannot fully describe a whole image. Current convolutional neural network models are likewise built for single-label image classification tasks: the loss function is usually a softmax, which can assign only the single most probable label to an image. Annotating multi-label images therefore requires a more suitable method. In addition, automatic image annotation algorithms increasingly target large databases of hundreds of thousands to millions of images, so time cost must be considered and a more compact and efficient feature representation explored.
Summary of the invention
The technical problem to be solved by the present invention is to provide an image labeling method based on convolutional neural networks and binary coding features that achieves multi-label annotation of images at low cost and with high efficiency.
To solve the above technical problem, the present invention provides an image labeling method based on convolutional neural networks and binary coding features, comprising the following steps:
constructing an Inception V3 base network model;
truncating the Inception V3 base network model at its last pooling layer, removing its Logits and softmax function, and using a sigmoid function as the activation function of the last layer, to obtain a modified first base network model;
adding two fully connected layers to the first base network model, using a sigmoid function as the activation function of the last layer, to obtain a multi-label classification network model;
training the multi-label classification network model on the training set to optimize the weights of the multi-label classification network model;
labeling the feature vector set of a target image with the trained multi-label classification network model to obtain the multi-label probability output of the target image;
combining the multi-label probability output, annotating the target image using the TagProp algorithm.
Preferably, "training the multi-label classification network model on the training set to optimize the weights of the multi-label classification network model" specifically includes:
training the multi-label classification network model on the training set to obtain a loss function;
fine-tuning the multi-label classification network model accordingly; wherein fine-tuning specifically includes: fixing the weights of the convolutional layers before the two fully connected layers, and optimizing the two fully connected layers by backpropagation training.
Preferably, "combining the multi-label probability output, annotating the target image using the TagProp algorithm" specifically includes:
for a target image x, the probability that it possesses the j-th label, i.e. y_j = +1, is:
p(y_j = +1) = Σ_i π_i p(y_j = +1 | x_i),
where π_i denotes the prediction weight, and p(y_j = +1 | x_i) denotes the probability that the target picture has the j-th label conditioned on x_i, namely:
p(y_j = +1 | x_i) = 1 − ε if y_ij = +1, and ε otherwise,
where ε is a predetermined value;
solving by maximizing the log-likelihood of the labels in the training set, the loss function of the model is:
L = Σ_j c_j log p(y_j)
where the parameter c_j measures the loss of image X belonging to label j.
Preferably, the weight π_i is calculated based on distance, i.e.
π_i = exp(−d_h(x, x_i)) / Σ_{i′} exp(−d_h(x, x_{i′})),
where d_h(x, x_i) = h_i d(x, x_i), and d(x, x_i) is the base distance between x and x_i.
Preferably, a sigmoid function is used to improve p(y_j = +1) = Σ_i π_i p(y_j = +1 | x_i), i.e. p(y_j = +1) = σ(α_j z_j + β_j), where α_j is a weight, β_j is a bias, and z_j is the weighted average of label j among the neighbours of the target image X, i.e. z_j = Σ_i π_i y_ij.
Preferably, "the parameter c_j measures the loss of image X belonging to label j" specifically includes:
when y_j = +1, c_j = 1/N⁺; when y_j = −1, c_j = 1/N⁻; where N⁺ denotes the number of pictures in the training set that belong to label j, and N⁻ denotes the number of pictures in the training set that do not belong to it.
Beneficial effects of the present invention: the invention proposes a convolutional neural network image annotation model based on a sigmoid loss function. By building a new network model, extracting binary coding features of images, and then performing image annotation with the TagProp algorithm, multi-label annotation of images is achieved at low cost, high speed and high efficiency.
Detailed description of the invention
Fig. 1 is a schematic diagram of the Inception network module;
Fig. 2 is a schematic diagram of the Inception V3 network structure;
Fig. 3 is a flow diagram of the invention;
Fig. 4 compares CNN-Sigmoid and CNN-Softmax, where (a) shows the experimental results on the Natural Scenes dataset and (b) shows the experimental results on the Corel5K dataset.
Specific embodiment
The present invention is further explained below with reference to the attached drawings and specific examples, so that those skilled in the art can better understand and practice it; the illustrated embodiments, however, do not limit the invention.
The present invention uses Inception V3 as the base network structure of the model. The Inception network achieves excellent classification performance while controlling both computation and parameter count. Rather than simply increasing the number of layers, it introduces the Inception Module, whose structure is shown in Fig. 1, a schematic diagram of the Inception network module. This modular design reduces the network's parameter count and its design space while increasing the network's width. The Inception network also introduces the idea of factorization, replacing one large convolution with two small convolutions, which further reduces the parameter count while adding non-linearity to the network.
Inception V3 makes two main modifications. On the one hand, it introduces factorization: a large convolution such as 7 × 7 is decomposed into two convolutions, 1 × 7 and 7 × 1, and 5 × 5 convolutions are decomposed similarly (1 × 5, 5 × 1). This both reduces the number of parameters and, because one convolution is split into two, deepens the network, enhancing the non-linearity of the network and the expressive power of the model. On the other hand, Inception V3 optimizes the Inception module so that it has three variants at the 35 × 35, 17 × 17 and 8 × 8 feature-map sizes. These modules appear only in the later part of the network, preceded by ordinary convolutional layers. Fig. 2 shows the Inception V3 network structure.
Referring to Fig. 3, the invention discloses an image labeling method based on convolutional neural networks and binary coding features.
The present invention fine-tunes the Inception V3 network model. To better suit multi-label classification, the invention truncates Inception V3 at its last pooling layer, removes the Logits and softmax function, adds two fully connected layers, and uses a sigmoid function as the activation function of the last layer. Both activation functions can be applied to multi-class situations; the difference is that with sigmoid the classes may overlap with each other, whereas with softmax the classes are mutually exclusive, which makes the sigmoid function better suited to multi-label classification.
Assume a training set I = {I_1, I_2, ···, I_N}, I_i ∈ R^{m×n}, where m and n are the height and width of a picture. In multi-label learning, every picture has multiple labels. If the total number of labels is c, the label vector of the i-th image is y_i. Since the sigmoid function is used, the probability that the i-th image has the j-th label is p_ij = σ(f_j(I_i)) = 1 / (1 + e^{−f_j(I_i)}), where f_j(I_i) is the j-th unit of the network's last-layer output. For labels, if the i-th image has the j-th label then y_ij = 1, otherwise y_ij = 0, and the ground-truth probability is p̂_ij = y_ij / ‖y_i‖_1. The loss function can then be defined as:
L = −Σ_i Σ_j p̂_ij log p_ij.
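A minimal NumPy sketch of this sigmoid-based multi-label loss (the small 1e-12 stabilizer and the array shapes are assumptions for illustration, not part of the patent):

```python
import numpy as np

def multilabel_loss(logits, labels):
    """Cross-entropy between normalized label targets and sigmoid outputs.

    logits: (N, c) last-layer outputs f_j(I_i); labels: (N, c) binary matrix y.
    """
    p = 1.0 / (1.0 + np.exp(-logits))                   # p_ij = sigmoid(f_j(I_i))
    p_hat = labels / labels.sum(axis=1, keepdims=True)  # p_hat_ij = y_ij / ||y_i||_1
    return float(-np.sum(p_hat * np.log(p + 1e-12)))    # L = -sum_ij p_hat_ij log p_ij

logits = np.array([[2.0, -1.0, 0.5]])   # one image, c = 3 labels
labels = np.array([[1.0, 0.0, 1.0]])    # the image carries labels 1 and 3
loss = multilabel_loss(logits, labels)  # ≈ 0.30
```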
To accelerate the training of the model, the present invention fine-tunes the network model: the weights of the preceding convolutional layers are fixed, and the two added fully connected layers are optimized by backpropagation training.
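The modified network can be sketched in Keras (an assumption about tooling: the patent names TensorFlow/Python but gives no code; the hidden-layer width, the optimizer, the placeholder binary cross-entropy loss, and `weights=None` instead of the ImageNet pre-trained weights the patent actually uses are all illustrative):

```python
import tensorflow as tf

NUM_LABELS = 260  # e.g. the Corel5K tag vocabulary; dataset-dependent

# Inception V3 truncated at its last pooling layer (no Logits/softmax head).
base = tf.keras.applications.InceptionV3(
    include_top=False, weights=None,
    input_shape=(299, 299, 3), pooling="avg")
base.trainable = False  # fix the convolutional weights for fine-tuning

# Two added fully connected layers; sigmoid on the last layer for multi-label output.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_LABELS, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy")
```

Only the two dense layers receive gradient updates under this setup, which is what makes the fine-tuning fast.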
The present invention extracts the output vector of the fully connected layer as the image feature X = {x_1, x_2, ···, x_N}; the corresponding tag set is Y = {y_1, y_2, ···, y_N}.
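The patent does not spell out how the binary codes of its title are formed from these features; one natural reading, sketched here as an assumption, is to threshold the sigmoid-activated outputs (the 0.5 threshold is illustrative):

```python
import numpy as np

def binarize_features(activations, threshold=0.5):
    # Sigmoid activations lie in (0, 1); thresholding yields a compact binary
    # code, which makes distance comparisons between images cheap to compute.
    return (activations >= threshold).astype(np.uint8)

feats = np.array([[0.91, 0.12, 0.55, 0.43]])
code = binarize_features(feats)  # -> [[1, 0, 1, 0]]
```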
To make better use of the high-level semantic features extracted by the CNN, the present invention applies these features to the TagProp image annotation model. TagProp is a nearest-neighbour model that uses neighbour voting: pictures at different visual distances from the target image should carry different weights in the vote, and the probability of each candidate label for the target image is a weighted sum.
TagProp learns the conditional distribution of a given image's labels by calculating pairwise distances. For a target image x, the probability that it possesses the j-th label, i.e. y_j = +1, is:
p(y_j = +1) = Σ_i π_i p(y_j = +1 | x_i)   (formula 2)
where π_i denotes the weight with which sample x_i predicts labels for x, and p(y_j = +1 | x_i) denotes the probability that the target picture has the j-th label conditioned on x_i, defined as:
p(y_j = +1 | x_i) = 1 − ε if y_ij = +1, and ε otherwise   (formula 3)
where ε is a very small value that keeps the probability from being 0. The weight π_i is calculated based on distance, i.e.
π_i = exp(−d_h(x, x_i)) / Σ_{i′} exp(−d_h(x, x_{i′}))   (formula 4)
where d_h(x, x_i) = h_i d(x, x_i), and d(x, x_i) is the base distance between x and x_i; the present invention uses the Euclidean distance to measure the distance between two images, and h_i ≥ 0 can be optimized. By calculating the probability of every label occurring in the target image x, the k labels with the largest probability values are taken as the final annotation.
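Formulas 2–4 can be sketched in NumPy as follows (a simplification: a single fixed scalar h rather than an optimized one, with the Euclidean base distance as in the description):

```python
import numpy as np

def tagprop_predict(x, X_train, Y_train, h=1.0, eps=1e-5, k=5):
    """Weighted nearest-neighbour tag prediction for one query image.

    x: (d,) query feature; X_train: (N, d) neighbour features;
    Y_train: (N, c) binary tag matrix. Returns tag probabilities and top-k tags.
    """
    d = np.linalg.norm(X_train - x, axis=1)          # base distances d(x, x_i)
    pi = np.exp(-h * d)
    pi = pi / pi.sum()                               # formula 4: weights sum to 1
    p_cond = np.where(Y_train == 1, 1.0 - eps, eps)  # formula 3
    p = pi @ p_cond                                  # formula 2
    return p, np.argsort(p)[::-1][:k]                # k most probable tags

X_train = np.array([[0.0, 0.0], [10.0, 10.0]])
Y_train = np.array([[1, 0], [0, 1]])
p, top = tagprop_predict(np.array([0.1, 0.0]), X_train, Y_train, k=1)
```

Here the query sits next to the first training image, so the first image's tag dominates the vote.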
Solving by maximizing the log-likelihood of the labels in the training set, the loss function of the model is:
L = Σ_j c_j log p(y_j)   (formula 5)
The parameter c_j accounts for the different numbers of pictures covered by each tag concept and measures the loss of image X belonging to label j. Specifically, when y_j = +1, c_j = 1/N⁺; when y_j = −1, c_j = 1/N⁻, where N⁺ and N⁻ respectively denote the number of pictures in the training set that belong and do not belong to label j. The elements of h are constrained to be non-negative, and the value of h is solved with a projected gradient algorithm.
It will be understood that semantic concepts with a lower frequency of occurrence obtain only a low prediction probability under the above method, even when they occur several times among the neighbours, so the method has low recall on semantic concepts that appear in few pictures. The prediction of y_j = +1 (formula 3) is therefore improved and smoothed using a sigmoid function, i.e.
p(y_j = +1) = σ(α_j z_j + β_j)   (formula 6)
where z_j = Σ_i π_i y_ij is the weighted average of label j among the neighbours of the target image X, α_j is a weight, and β_j is a bias. The sigmoid function here boosts the weight of relatively rare labels and weakens the weight of labels that occur more frequently.
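Formula 6 in NumPy, with illustrative values for α_j and β_j (in practice these are learned per label, so the numbers below are assumptions):

```python
import numpy as np

def calibrated_tag_prob(pi, y_col, alpha, beta):
    """p(y_j = +1) = sigmoid(alpha_j * z_j + beta_j), z_j = sum_i pi_i * y_ij."""
    z = float(np.dot(pi, y_col))  # weighted neighbour vote for tag j
    return 1.0 / (1.0 + np.exp(-(alpha * z + beta)))

pi = np.array([0.5, 0.3, 0.2])   # neighbour weights (sum to 1)
y_col = np.array([1, 1, 0])      # which neighbours carry tag j
prob = calibrated_tag_prob(pi, y_col, alpha=4.0, beta=-2.0)  # sigmoid(1.2)
```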
Based on the above image labeling method, test analysis was carried out on several datasets, as follows:
(1) Datasets
The present invention assesses the validity of the proposed model on multiple datasets: Natural Scenes, Corel5K, ESP-Game and IAPRTC-12. The information of each dataset is shown in Table 1. These datasets are briefly introduced below.
Table 1: information of each dataset
Dataset | Images | Labels | Avg. labels | Training images | Test images |
---|---|---|---|---|---|
Natural Scenes | 2000 | 5 | 2.3 | 1500 | 500 |
Corel5K | 5000 | 260 | 3.4 | 4000 | 1000 |
ESP-Game | 20000 | 268 | 4.7 | 15000 | 5000 |
IAPRTC-12 | 19627 | 291 | 5.7 | 15000 | 4627 |
The Natural Scenes dataset is small, with 2000 images divided into 5 classes: sunset, desert, forest, ocean and mountain; every image has 1–2 labels.
The Corel-5K dataset is of medium size, with 5000 images covering 260 kinds of labels such as weather, scenery, buildings and vehicles. Every image has 1–5 labels, and the average number of labels per image is 3.5. Because the label information of Corel-5K is relatively accurate, it is often used in multi-label image classification experiments.
The ESP-Game image set is larger, with 20770 images in total. It has 268 kinds of labels with very wide coverage, including drawings, buildings, animals and so on. Every image has 1–15 labels, with an average of 4.6 per image. Since the ESP-Game dataset is large and contains some erroneous labels, some processing was applied to remove images with inaccurate label information, leaving 20000 images for the experiments.
IAPRTC-12 is likewise a larger dataset, containing 19627 images in total. Its labels are sentences with practical meaning described in various languages; natural language processing techniques were used to extract the main terms and convert them into a format similar to the other datasets. After statistics, it has 291 kinds of labels, and the average number of labels per image is 5.7. Images containing people account for a large proportion of IAPRTC-12.
(2) Experimental setup and evaluation indexes
Experiments were carried out on a server equipped with GPUs; the server runs Ubuntu 16.04 and has two NVIDIA GeForce TITAN graphics cards. The experiments use the TensorFlow deep learning framework, with Python as the programming language.
As stated before, the network model used is the Inception V3 network pre-trained on ImageNet. When fine-tuning it, the learning rate is set to 0.0001 for the two smaller datasets, Natural Scenes and Corel-5K, and to 0.0005 for the two larger datasets, ESP-Game and IAPRTC-12. The exponential decay of the learning rate is set to 0.99995 in all cases, the mini-batch size is 32, and dropout is 0.5. In addition, when constructing the candidate set of k labels composing an image's annotation, k is set to 2 for Natural Scenes, 5 for Corel-5K, 6 for ESP-Game and 7 for IAPRTC-12.
Image annotation belongs to multi-label learning, so the present invention introduces some evaluation indexes for multi-label classification. Consider a test set S = {(x_1, Y_1), (x_2, Y_2), ···, (x_p, Y_p)}, where Y is the tag set.
1. Hamming loss (HL):
HL = (1/p) Σ_{i=1}^{p} (1/Q) |h(x_i) Δ Y_i|
where Q is the number of labels in the sample set, h(x_i) is the tag set predicted for sample i, and Δ is the XOR (symmetric difference) operation. HL assesses how many times a sample is misclassified: for example, a sample that does not belong to label A but is wrongly assigned it, or a sample that belongs to label A but is not predicted as such. Equivalently, the Hamming loss gives the numerical distance between the result sequence predicted by the classifier and the true result sequence. The smaller the HL value, the better the prediction.
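A minimal sketch of the Hamming loss over binary prediction and ground-truth matrices (the matrix shapes are an assumed encoding of the definition above):

```python
import numpy as np

def hamming_loss(pred, true):
    """Average fraction of label slots where prediction and truth disagree (XOR).

    pred, true: (p, Q) binary matrices over p samples and Q labels.
    """
    p, Q = true.shape
    return np.sum(pred != true) / (p * Q)

pred = np.array([[1, 0, 1], [0, 1, 0]])
true = np.array([[1, 1, 1], [0, 1, 1]])
hl = hamming_loss(pred, true)  # 2 disagreements over 6 slots -> 1/3
```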
2. One-error (OE):
OE = (1/p) Σ_{i=1}^{p} [argmax_y f(x_i, y) ∉ Y_i]
where f(x_i, y) is the prediction score of sample i for label y. OE indicates the probability that the top-scoring label in the output is not in the true tag set. The smaller the OE value, the better the prediction.
3. Coverage (C):
C = (1/p) Σ_{i=1}^{p} max_{y ∈ Y_i} rank_f(x_i, y) − 1
where rank_f(x_i, y) ranks the labels by predicted probability, with the true labels sorted along with them, and the max term gives the position in the sorted sequence of the last true label. Coverage evaluates how far down the ranking one must go, on average, to cover all of a sample's true labels (rank starts from 1, hence the subtracted 1). The smaller the value, the better the performance.
4. Ranking loss (RL):
RL = (1/p) Σ_{i=1}^{p} (1 / (|Y_i| |Ȳ_i|)) |{(y′, y″) : f(x_i, y′) ≤ f(x_i, y″), y′ ∈ Y_i, y″ ∈ Ȳ_i}|
where f is the prediction function, Ȳ_i is the complement of Y_i, and |Y_i| denotes the number of actual labels of sample i. Ranking loss expresses the average probability that, in the ranked result, a label not in the relevant label set is ranked above one that is. The smaller the RL value, the better the prediction.
5. Average precision (AP):
AP = (1/p) Σ_{i=1}^{p} (1/|Y_i|) Σ_{y ∈ Y_i} |{y′ ∈ Y_i : rank_f(x_i, y′) ≤ rank_f(x_i, y)}| / rank_f(x_i, y)
It expresses, for each prediction result, the probability that a predicted label is correct and ranked towards the front of the result set. The larger the AP value, the better the prediction.
In addition, the present invention uses the most common indexes to measure the performance of an image labeling method: precision P, recall R, the F1 value, and N+. For a certain label i, precision is the ratio of correctly annotated images among all images annotated with that label, and recall is the ratio of correctly annotated images among the images that should have been annotated with it. Let W_c be the number of correctly annotated pictures, W_a the number of all retrieved pictures, and W_g the number of pictures in the test set relevant to the keyword; then
P = W_c / W_a,  R = W_c / W_g.
The F1 value balances precision and recall: F1 = 2PR / (P + R). N+ denotes the number of labels that are correctly annotated at least once among all labels; this index reflects the algorithm's coverage of the labels.
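The per-label P, R, F1 and N+ computations can be sketched as follows (macro-averaging across labels is an assumption; the text does not specify how the per-label values are aggregated):

```python
import numpy as np

def per_label_prf(pred, true):
    """Per-label precision/recall/F1, macro-averaged, plus N+ (the count of
    labels correctly annotated at least once). pred, true: (images, labels)."""
    tp = ((pred == 1) & (true == 1)).sum(axis=0).astype(float)
    annotated = (pred == 1).sum(axis=0)  # images annotated with each label
    relevant = (true == 1).sum(axis=0)   # images truly carrying each label
    P = np.divide(tp, annotated, out=np.zeros_like(tp), where=annotated > 0)
    R = np.divide(tp, relevant, out=np.zeros_like(tp), where=relevant > 0)
    F1 = np.divide(2 * P * R, P + R, out=np.zeros_like(tp), where=(P + R) > 0)
    return P.mean(), R.mean(), F1.mean(), int((tp > 0).sum())

pred = np.array([[1, 0], [1, 1]])
true = np.array([[1, 1], [0, 1]])
P, R, F1, n_plus = per_label_prf(pred, true)
```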
(3) Experimental results
To verify the performance of the proposed image labeling model, this embodiment first evaluates the model on the HL, OE, C, RL and AP indexes; then the model performance with and without the autoencoder is analysed and compared with the effect of other models to show the validity of the proposed model; finally, cross-dataset image retrieval and annotation are carried out between two datasets to show the generalization of the proposed model.
A. Model performance from the multi-label classification angle
From the multi-label classification angle, the model is evaluated on the Natural Scenes dataset with the multi-label classification indexes HL, OE, C, RL and AP. The binary coding feature and the non-binary coding feature are verified separately. The compared methods are currently popular multi-label learning methods such as ML-KNN, ML-I2C, InsDif and ML-LI2C; Table 2 shows the results of the models. It can be seen that the proposed model improves on all 5 indexes compared to the previous models, and that the two kinds of features obtain similar effects.
Table 2: comparison of the proposed model with other methods on the Natural Scenes dataset
Method | HL↓ | OE↓ | C↓ | RL↓ | AP↑ |
---|---|---|---|---|---|
ML-KNN | 0.169 | 0.30 | 0.93 | 0.168 | 0.80 |
ML-I2C | 0.159 | 0.311 | 0.889 | 0.156 | 0.803 |
InsDif | 0.152 | 0.259 | 0.833 | 0.14 | 0.834 |
ML-LI2C | 0.129 | 0.19 | 0.624 | 0.091 | 0.88 |
InceptionV3 | 0.101 | 0.15 | 0.554 | 0.076 | 0.901 |
Inception | 0.107 | 0.157 | 0.563 | 0.08 | 0.908 |
B. Model performance on multiple datasets
This part verifies the validity of our method by comparing it experimentally with other image labeling methods on the Corel5K, ESP-Game and IAPRTC-12 datasets, using the indexes P, R, F1 and N+.
Tests were first carried out on the two smaller datasets, Natural Scenes and Corel5K, comparing against the CNN+Softmax method. For an accurate comparison, the features used are all extracted directly from the proposed model. As can be seen from Fig. 4, both of our variants annotate better than CNN+Softmax, and CNN-TagProp performs better than CNN-TagProp (256 bit). Specifically, the F1 value of the CNN-Sigmoid method improves on the CNN-Softmax method by 6% and 8% respectively on the two datasets. This shows that changing the last loss function to sigmoid is effective: compared to softmax, sigmoid is better suited to multi-label annotation. It also shows that the high-level semantic features extracted by the proposed model have good discrimination, which benefits image labeling.
The embodiments described above are only preferred examples given to fully illustrate the present invention, and the protection scope of the present invention is not limited to them. Equivalent substitutions or transformations made by those skilled in the art on the basis of the present invention fall within the protection scope of the present invention, which is defined by the claims.
Claims (6)
1. An image labeling method based on convolutional neural networks and binary coding features, characterized by comprising the following steps:
constructing an Inception V3 base network model;
truncating the Inception V3 base network model at its last pooling layer, removing its Logits and softmax function, and using a sigmoid function as the activation function of the last layer, to obtain a modified first base network model;
adding two fully connected layers to the first base network model, using a sigmoid function as the activation function of the last layer, to obtain a multi-label classification network model;
training the multi-label classification network model on the training set to optimize the weights of the multi-label classification network model;
labeling the feature vector set of a target image with the trained multi-label classification network model to obtain the multi-label probability output of the target image;
combining the multi-label probability output, annotating the target image using the TagProp algorithm.
2. The image labeling method of claim 1, characterized in that "training the multi-label classification network model on the training set to optimize the weights of the multi-label classification network model" specifically includes:
training the multi-label classification network model on the training set to obtain a loss function;
fine-tuning the multi-label classification network model accordingly; wherein fine-tuning specifically includes: fixing the weights of the convolutional layers before the two fully connected layers, and optimizing the two fully connected layers by backpropagation training.
3. The image labeling method of claim 1, characterized in that "combining the multi-label probability output, annotating the target image using the TagProp algorithm" specifically includes:
for a target image x, the probability that it possesses the j-th label, i.e. y_j = +1, is:
p(y_j = +1) = Σ_i π_i p(y_j = +1 | x_i),
where π_i denotes the prediction weight, and p(y_j = +1 | x_i) denotes the probability that the target picture has the j-th label conditioned on x_i, namely:
p(y_j = +1 | x_i) = 1 − ε if y_ij = +1, and ε otherwise,
where ε is a predetermined value;
solving by maximizing the log-likelihood of the labels in the training set, the loss function of the model is:
L = Σ_j c_j log p(y_j),
where the parameter c_j measures the loss of image X belonging to label j.
4. The image labeling method of claim 3, characterized in that the weight π_i is calculated based on distance, i.e.
π_i = exp(−d_h(x, x_i)) / Σ_{i′} exp(−d_h(x, x_{i′})),
where d_h(x, x_i) = h_i d(x, x_i), and d(x, x_i) is the base distance between x and x_i.
5. The image labeling method of claim 3, characterized in that a sigmoid function is used to improve p(y_j = +1) = Σ_i π_i p(y_j = +1 | x_i), i.e. p(y_j = +1) = σ(α_j z_j + β_j), where α_j is a weight, β_j is a bias, and z_j = Σ_i π_i y_ij is the weighted average of label j among the neighbours of the target image X.
6. The image labeling method of claim 3, characterized in that "the parameter c_j measures the loss of image X belonging to label j" specifically includes:
when y_j = +1, c_j = 1/N⁺; when y_j = −1, c_j = 1/N⁻; where N⁺ denotes the number of pictures in the training set that belong to label j, and N⁻ denotes the number of pictures in the training set that do not belong to it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910791484.9A CN110516098A (en) | 2019-08-26 | 2019-08-26 | Image labeling method based on convolutional neural networks and binary coding feature |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910791484.9A CN110516098A (en) | 2019-08-26 | 2019-08-26 | Image labeling method based on convolutional neural networks and binary coding feature |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110516098A true CN110516098A (en) | 2019-11-29 |
Family
ID=68627926
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910791484.9A Pending CN110516098A (en) | 2019-08-26 | 2019-08-26 | Image labeling method based on convolutional neural networks and binary coding feature |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110516098A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416384A (en) * | 2018-03-05 | 2018-08-17 | 苏州大学 | Image tag annotation method, system, device and computer-readable storage medium |
CN110163234A (en) * | 2018-10-10 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Model training method, device and storage medium |
Non-Patent Citations (1)
Title |
---|
XINJIAN WU et al.: "A Novel Model for Multi-label Image Annotation", 2018 24th International Conference on Pattern Recognition (ICPR) * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111382800A (en) * | 2020-03-11 | 2020-07-07 | 上海爱数信息技术股份有限公司 | Multi-label multi-classification method suitable for sample distribution imbalance |
CN111382800B (en) * | 2020-03-11 | 2022-11-25 | 上海爱数信息技术股份有限公司 | Multi-label multi-classification method suitable for sample distribution imbalance |
CN111639755A (en) * | 2020-06-07 | 2020-09-08 | 电子科技大学中山学院 | Network model training method and device, electronic equipment and storage medium |
CN111639755B (en) * | 2020-06-07 | 2023-04-25 | 电子科技大学中山学院 | Network model training method and device, electronic equipment and storage medium |
CN112766330A (en) * | 2021-01-07 | 2021-05-07 | 济南浪潮高新科技投资发展有限公司 | Image multi-label classification method and device |
CN112732967A (en) * | 2021-01-08 | 2021-04-30 | 武汉工程大学 | Automatic image annotation method and system and electronic equipment |
CN112732967B (en) * | 2021-01-08 | 2022-04-29 | 武汉工程大学 | Automatic image annotation method and system and electronic equipment |
CN113096080A (en) * | 2021-03-30 | 2021-07-09 | 四川大学华西第二医院 | Image analysis method and system |
CN113096080B (en) * | 2021-03-30 | 2024-01-16 | 四川大学华西第二医院 | Image analysis method and system |
CN114139656A (en) * | 2022-01-27 | 2022-03-04 | 成都橙视传媒科技股份公司 | Image classification method based on deep convolution analysis and broadcast control platform |
CN114550916A (en) * | 2022-02-23 | 2022-05-27 | 天津大学 | Device for classifying, identifying and positioning common lung diseases based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110516098A (en) | Image labeling method based on convolutional neural networks and binary coding feature | |
Yu et al. | Spatial pyramid-enhanced NetVLAD with weighted triplet loss for place recognition | |
CN110298037B (en) | Convolutional neural network matching text recognition method based on enhanced attention mechanism | |
CN110059198B (en) | Discrete hash retrieval method of cross-modal data based on similarity maintenance | |
CN106021364B | Construction of an image-search relevance prediction model, and image search method and apparatus | |
CN1307579C (en) | Methods and apparatus for classifying text and for building a text classifier | |
Zhao et al. | Large-scale category structure aware image categorization | |
Perez-Martin et al. | Improving video captioning with temporal composition of a visual-syntactic embedding | |
CN105393264A (en) | Interactive segment extraction in computer-human interactive learning | |
CN107220373A | Pulmonary nodule CT image hash retrieval method based on medical signs and convolutional neural networks | |
CN108897791B (en) | Image retrieval method based on depth convolution characteristics and semantic similarity measurement | |
CN101561805A (en) | Document classifier generation method and system | |
CN111931505A (en) | Cross-language entity alignment method based on subgraph embedding | |
CN111488917A (en) | Garbage image fine-grained classification method based on incremental learning | |
CN111461175B (en) | Label recommendation model construction method and device of self-attention and cooperative attention mechanism | |
CN105930792A (en) | Human action classification method based on video local feature dictionary | |
CN110598022B (en) | Image retrieval system and method based on robust deep hash network | |
Wang et al. | One-shot learning for long-tail visual relation detection | |
CN110765285A (en) | Multimedia information content control method and system based on visual characteristics | |
CN113032613A (en) | Three-dimensional model retrieval method based on interactive attention convolution neural network | |
CN113806580A (en) | Cross-modal Hash retrieval method based on hierarchical semantic structure | |
Singh et al. | Feature selection based classifier combination approach for handwritten Devanagari numeral recognition | |
CN111144453A (en) | Method and equipment for constructing multi-model fusion calculation model and method and equipment for identifying website data | |
Cheng et al. | Deep attentional fine-grained similarity network with adversarial learning for cross-modal retrieval | |
CN106033546A (en) | Behavior classification method based on top-down learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191129 |