CN109165666A - Multi-label image classification method, device, equipment and storage medium - Google Patents
- Publication number
- CN109165666A (application number CN201810735861.2A)
- Authority
- CN
- China
- Prior art keywords
- prediction result
- image
- classification
- dimension
- feature image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The multi-label image classification method, device, equipment, and storage medium provided by the invention belong to the technical field of image processing. The multi-label image classification method includes: extracting a first feature image of an image to be processed; performing first dimension-reduction processing and first classification processing on the first feature image to generate a first label classification prediction result; performing feature extraction on the first feature image to generate a second feature image; performing second dimension-reduction processing and second classification processing on the second feature image to generate a second label classification prediction result, where the second label classification prediction result indicates the second classification result of each class label; and determining the target prediction result of the image to be processed according to the first label classification prediction result and the second label classification prediction result, thereby improving the precision of multi-label image classification.
Description
Technical field
The present invention relates to the field of image processing, and in particular to a multi-label image classification method, device, equipment, and storage medium.
Background technique
Multi-label image classification (multi-label classification) is an important research topic in computer vision. Especially with the arrival of the big-data era and the development of deep learning technology, image classification has received more and more attention. However, ordinary image classification only needs to assign one label to each image, whereas multi-label classification must correctly classify every target contained in an image. The sizes of targets with different labels vary across images, and the number of labels per image is not fixed either, which makes multi-label classification very difficult. Existing research mostly uses traditional problem transformation and algorithm adaptation methods to solve the multi-label image classification problem, but these conventional classification methods are unsuitable for data with high diversity and a large number of classes, and cannot achieve accurate multi-label classification.
Summary of the invention
The multi-label image classification method, device, equipment, and storage medium provided by the embodiments of the present invention can solve the technical problem that the prior art cannot improve the precision of multi-label image classification.
To achieve the above goals, the technical solutions adopted in the embodiments of the present invention are as follows:
In a first aspect, an embodiment of the present invention provides a multi-label image classification method, comprising: extracting a first feature image of an image to be processed; performing first dimension-reduction processing and first classification processing on the first feature image to generate a first label classification prediction result, where the first label classification prediction result indicates the first classification result of each class label; performing feature extraction on the first feature image to generate a second feature image; performing second dimension-reduction processing and second classification processing on the second feature image to generate a second label classification prediction result, where the second label classification prediction result indicates the second classification result of each class label; and determining the target prediction result of the image to be processed according to the first label classification prediction result and the second label classification prediction result.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which the dimension of the first feature image is a first dimension, and performing the first dimension-reduction processing and the first classification processing on the first feature image to generate the first label classification prediction result comprises: performing pooling processing on the first feature image to obtain a feature vector of a second dimension, the second dimension being smaller than the first dimension; and performing classification processing by inputting the feature vector of the second dimension into a first fully connected layer to generate the first label classification prediction result.
With reference to the first possible implementation of the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which performing pooling processing on the first feature image to obtain the feature vector of the second dimension comprises: determining a preset pooling function according to a maximum pooling function and an average pooling function; and pooling the first feature image with the preset pooling function to obtain the feature vector of the second dimension.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which performing feature extraction on the first feature image to generate the second feature image comprises: performing feature extraction on the first feature image based on the number of class labels of a preset classification to generate a second feature image of a third dimension, the third dimension being equal to the product of the number of class labels and a preset constant, and the third dimension being smaller than the first dimension of the first feature image.
With reference to the third possible implementation of the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, in which performing the second dimension-reduction processing and the second classification processing on the second feature image to generate the second label classification prediction result comprises: performing pooling processing on the second feature image to obtain a feature vector whose dimension equals the number of class labels; and performing classification processing by inputting that feature vector into a second fully connected layer to generate the second label classification prediction result.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, in which determining the target prediction result according to the first label classification prediction result and the second label classification prediction result comprises: taking the average of the first label classification prediction result and the second label classification prediction result as the target prediction result.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, in which the method further comprises: determining the accuracy of the target prediction result based on preset rules.
With reference to the sixth possible implementation of the first aspect, an embodiment of the present invention provides a seventh possible implementation of the first aspect, in which determining the accuracy of the target prediction result based on preset rules comprises: determining the loss value corresponding to the target prediction result based on a sigmoid function and a cross-entropy loss function; and determining the accuracy according to the loss value.
With reference to the seventh possible implementation of the first aspect, an embodiment of the present invention provides an eighth possible implementation of the first aspect, in which determining the loss value corresponding to the target prediction result based on the sigmoid function and the cross-entropy loss function comprises: calculating the first classification value corresponding to the target prediction result according to the sigmoid function; and calculating the loss value corresponding to the first classification value according to the cross-entropy loss function.
In a second aspect, an embodiment of the present invention provides a multi-label image classification device, comprising: a first extraction module for extracting a first feature image of an image to be processed; a first processing module for performing first dimension-reduction processing and first classification processing on the first feature image to generate a first label classification prediction result, where the first label classification prediction result indicates the first classification result of each class label; a second extraction module for performing feature extraction on the first feature image to generate a second feature image; a second processing module for performing second dimension-reduction processing and second classification processing on the second feature image to generate a second label classification prediction result, where the second label classification prediction result indicates the second classification result of each class label; and a third processing module for determining the target prediction result of the image to be processed according to the first label classification prediction result and the second label classification prediction result.
In a third aspect, an embodiment of the present invention provides a terminal device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the multi-label image classification method according to any one of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a storage medium on which instructions are stored; when the instructions run on a computer, the computer executes the multi-label image classification method according to any one of the first aspect.
Compared with the prior art, the embodiments of the present invention bring the following beneficial effects:
The multi-label image classification method, device, equipment, and storage medium provided by the embodiments of the present invention extract a first feature image of an image to be processed, perform first dimension-reduction processing and first classification processing on the first feature image to generate a first label classification prediction result, perform feature extraction on the first feature image to generate a second feature image, perform second dimension-reduction processing and second classification processing on the second feature image to generate a second label classification prediction result, and determine the target prediction result of the image to be processed according to the first and second label classification prediction results. In other words, on the one hand, the multi-label image classification method in the embodiments of the present invention classifies the first feature image to obtain the first label classification prediction result as one part of the target prediction result, further extracts a second feature image from the first feature image, and classifies based on the second feature image to obtain the second label classification prediction result as the other part of the target prediction result, so that two classification results are obtained through two parallel classification branches and then jointly considered to obtain the target classification result. On the other hand, because the second feature image is further extracted from the first feature image and classification is based on the second feature image, the further extraction of image features solves the multi-label classification problem of failing to notice multiple different targets in an image. Both aspects improve the precision of multi-label image classification.
Other features and advantages of the present disclosure will be described in the following specification; alternatively, some features and advantages can be inferred or unambiguously determined from the specification, or learned by implementing the above techniques of the present disclosure.
To make the above objects, features, and advantages of the present invention clearer and more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only certain embodiments of the present invention and therefore should not be construed as limiting its scope. For those of ordinary skill in the art, other relevant drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow chart of the multi-label image classification method provided by the first embodiment of the present invention;
Fig. 2 is a network structure flow chart of the multi-label image classification method shown in Fig. 1;
Fig. 3 is a functional block diagram of the multi-label image classification device provided by the second embodiment of the present invention;
Fig. 4 is a schematic diagram of a terminal device provided by the third embodiment of the present invention.
Specific embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention. Therefore, the following detailed description of the embodiments of the present invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention.
Some embodiments of the present invention are elaborated below with reference to the accompanying drawings. In the absence of conflict, the features in the following embodiments can be combined with each other.
First embodiment
Since existing multi-label image classification methods are only suitable for classifying images with low label diversity and few classes, and to improve the classification precision for data with high diversity and many classes, this embodiment first provides a multi-label image classification method. It should be noted that the steps shown in the flowchart of the accompanying drawings may be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one herein. This embodiment is described in detail below.
Referring to Fig. 1, which is a flow chart of the multi-label image classification method provided by an embodiment of the present invention, the detailed process shown in Fig. 1 is described below.
Step S101: extract the first feature image of the image to be processed.
In the embodiments of the present invention, the image to be processed can be an image uploaded by a user in a picture format such as bmp, jpg, or png; a picture captured by an image acquisition device (such as a camera); or an image in a picture format downloaded by the user through a network.
For ease of description, the feature dimension of the first feature image is hereinafter called the first dimension. The first dimension is, for example, n, where n is a positive integer. In general, the value of n is related to the size of the input image to be processed, the number of class labels the user needs, and the size of the selected convolution kernel (in the case where step S101 realizes feature extraction through a convolutional layer).
As shown in Fig. 2, in one implementation, the first feature image is the convolution feature obtained by passing the image through the deep convolutional network ResNet. For example, an image to be processed of size 448*448 passes through the ResNet network to obtain a first feature image of dimension 2048*14*14 (the first dimension).
In practice, the features of the image to be processed can also be extracted in other ways to obtain the first feature image, for example by a VGG network or an Inception network.
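As a rough shape walk-through (a sketch, not the patent's implementation): a ResNet-style backbone whose final feature map has a total stride of 32 and 2048 channels maps a 448*448 input to the 2048*14*14 first dimension mentioned above. The function name and the stride value are illustrative assumptions.

```python
# Hypothetical shape walk-through for the first feature image, assuming a
# ResNet-style backbone with total stride 32 and 2048 output channels.
def backbone_output_shape(height, width, stride=32, channels=2048):
    """Return (channels, H/stride, W/stride) for the first feature image."""
    return (channels, height // stride, width // stride)

print(backbone_output_shape(448, 448))  # (2048, 14, 14)
```

The same arithmetic explains why n (the first dimension) depends on the input size and the backbone architecture.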
Step S102: perform first dimension-reduction processing and first classification processing on the first feature image to generate a first label classification prediction result, where the first label classification prediction result indicates the first classification result of each class label.
As a possible implementation, step S102 includes: performing pooling processing on the first feature image to obtain a feature vector of a second dimension, where the second dimension is smaller than the first dimension; and performing classification processing by inputting the feature vector of the second dimension into the first fully connected layer to generate the first label classification prediction result.
Optionally, performing pooling processing on the first feature image to obtain the feature vector of the second dimension includes: determining a preset pooling function according to the maximum pooling function and the average pooling function, and pooling the first feature image with the preset pooling function to obtain the feature vector of the second dimension. For example, a first sub-function is determined according to a first preset constant and the maximum pooling function; a second sub-function is determined according to a second preset constant and the average pooling function; and the preset pooling function is determined according to the first sub-function and the second sub-function.
The preset pooling function can be expressed as F = α·F_M + β·F_A, where α and β are learnable parameters, updated by back-propagating the loss value (loss) of the final loss function and satisfying α + β = 1; F denotes the preset pooling function; F_M denotes the maximum pooling function, which takes the maximum of the feature points in a neighborhood; F_A denotes the average pooling function, which averages the feature points in a neighborhood; α·F_M denotes the first sub-function; and β·F_A denotes the second sub-function.
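A minimal NumPy sketch of this preset pooling function applied globally over the spatial dimensions (the function name is invented, and α is passed in directly here, whereas in the patent α and β are learned by back-propagation):

```python
import numpy as np

def mixed_global_pool(x, alpha):
    """F = alpha*F_M + beta*F_A with beta = 1 - alpha.

    x: feature map of shape (channels, H, W), e.g. (2048, 14, 14).
    Returns a (channels,) vector, the second-dimension feature vector.
    """
    f_max = x.max(axis=(1, 2))   # maximum pooling F_M over the spatial neighborhood
    f_avg = x.mean(axis=(1, 2))  # average pooling F_A over the spatial neighborhood
    return alpha * f_max + (1.0 - alpha) * f_avg
```

With alpha = 1 this degenerates to pure max pooling and with alpha = 0 to pure average pooling; intermediate values blend both kinds of information, which is the point of the α + β = 1 constraint.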
The number of class labels of the preset classification can be set by the user according to actual needs and is not specifically limited here. For example, the user may need to obtain the class labels of three categories in the image to be processed: the image may contain categories such as person, puppy, kitten, and sun, but the user only needs to determine the three categories puppy, kitten, and sun.
Of course, the number of class labels of the preset classification can also be a default, for example two default categories, person and puppy; or all categories in the image can be taken as the preset classification. For example, if an image to be processed contains only the four categories person, puppy, kitten, and sun, those four categories determine the class label number of the preset classification.
It should be noted that the number of class labels of the preset classification in the embodiments of the present invention can be 1 or greater than 1. When it is greater than 1, the effect of the multi-label image classification method in the embodiments of the present invention is more obvious.
In addition, in this embodiment, the feature vector obtained after pooling with the preset pooling function contains both the information of maximum pooling and the information of average pooling, so the advantages of both can be fully utilized and the disadvantages of using maximum pooling or average pooling alone can be compensated. This effectively solves the problems that average pooling loses information and maximum pooling retains irrelevant information, making the obtained first label classification prediction result more accurate.
Continuing the example in step S101, as shown in Fig. 2, assume the image to be processed is a 448*448 image and a first feature image of dimension 2048*14*14 is obtained through the ResNet network; 2048*14*14 then serves as the above first dimension. After processing by the preset pooling function (such as Max-average pooling in Fig. 2), a feature vector of 2048 dimensions is obtained; 2048 is then the second dimension. Assume the number of class labels of the preset classification is C, with C smaller than 2048. The input and output layers of the first fully connected layer are configured according to the number of class labels of the preset classification and the second dimension, so the first fully connected layer is 2048 × C. Finally, the 2048-dimensional feature vector passes through the 2048 × C first fully connected layer to compute the first label classification prediction result, which is a set of C vectors, each vector representing the prediction result of one category.
Assume C is 3; then the first label classification prediction result is a set of 3 vectors. If the set is A {a1, a2, a3}, then a1 represents the prediction result of the first category, a2 the prediction result of the second category, and a3 the prediction result of the third category.
Optionally, the first label classification prediction result is passed through the sigmoid function to obtain classification values. When a classification value is greater than 0.5, the corresponding label is classified as positive, where positive indicates that the prediction result contains this label; when it is less than 0.5, the corresponding label is classified as negative, where negative indicates that the prediction result does not contain this label. Positive and negative thus indicate whether a label is present, without needing to consider the number of labels in the image.
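The thresholding described above can be sketched as follows (a toy illustration; the function name is an invention of this sketch):

```python
import numpy as np

def labels_from_logits(logits):
    """Apply sigmoid per label and threshold at 0.5.

    A classification value > 0.5 means the label is predicted present
    (positive); otherwise the label is predicted absent (negative).
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return probs > 0.5
```

Because each label is thresholded independently, any number of labels can come out positive, so the unknown label count per image is not a problem (unlike softMax, discussed under step S105).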
Step S103: perform feature extraction on the first feature image to generate a second feature image.
As a possible implementation, the second feature image is the feature image obtained by further extracting the first feature image with a convolutional layer of a convolutional neural network. The feature dimension of the second feature image (hereinafter called the third dimension) is smaller than the feature dimension of the first feature image.
Of course, in practice, feature extraction can also be performed on the first feature image in other ways to obtain the second feature image, for example by VGG or Inception.
Optionally, regardless of the way in which feature extraction is performed on the first feature image, step S103 includes: performing feature extraction on the first feature image based on the number of class labels of the preset classification to generate a second feature image of the third dimension, where the third dimension is equal to the product of the number of class labels and a preset constant and is smaller than the first dimension of the first feature image, and the preset constant is the product of the length and width of the output image.
Continuing the above example, after the first feature image of dimension 2048*14*14 is obtained through the ResNet network in Fig. 2, network features are further extracted according to the number C of class labels of the preset classification to obtain a second feature image of dimension C*14*14, where C*14*14 is the above third dimension and 14*14 is the preset constant.
Step S104: perform second dimension-reduction processing and second classification processing on the second feature image to generate a second label classification prediction result, where the second label classification prediction result indicates the second classification result of each class label.
As a possible implementation, step S104 includes: performing pooling processing on the second feature image to obtain a feature vector with the same dimension as the number of class labels; and performing classification processing by inputting that feature vector into the second fully connected layer to generate the second label classification prediction result. The detailed process can be: pool the second feature image with the maximum pooling function to obtain a feature vector with the same dimension as the number of class labels; then pass that feature vector through the second fully connected layer and take the result as the second label classification prediction result.
Continuing the above example, after the second feature image of dimension C*14*14 is obtained in Fig. 2, a C-dimensional feature vector is obtained through maximum-pooling dimension reduction (such as Maxpooling in Fig. 2); the C-dimensional feature vector then passes through the C × C second fully connected layer to compute the second label classification prediction result, which is a set of C vectors, each vector representing the prediction result of one category.
Optionally, to ensure that the second label classification prediction result can be correctly updated, a second loss value is calculated for the second label classification prediction result using the sigmoid binary cross-entropy loss function, where the size of the second loss value measures the quality of the ResNet network training used in this application. The second loss value satisfies:
loss = -Σ_{i=1..C} [ y_i · log(x_i) + (1 - y_i) · log(1 - x_i) ]
where x denotes the result obtained after performing the sigmoid calculation on the prediction, y denotes the C-dimensional ground-truth vector whose entries are 0 or 1, 0 indicating that the image does not contain this label and 1 indicating that the image contains this label.
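A sketch of this sigmoid binary cross-entropy loss in NumPy (assuming the standard per-label formulation; averaging over the C labels rather than summing is a convention chosen here):

```python
import numpy as np

def sigmoid_bce_loss(logits, y):
    """Binary cross-entropy over C labels.

    logits: raw prediction scores; x = sigmoid(logits).
    y: ground truth in {0, 1} per label (1 = image contains the label).
    """
    x = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    y = np.asarray(y, dtype=float)
    # loss = -mean_i [ y_i*log(x_i) + (1 - y_i)*log(1 - x_i) ]
    return float(-np.mean(y * np.log(x) + (1.0 - y) * np.log(1.0 - x)))
```

In production code one would use a numerically stable fused form (e.g. computed from the logits directly) rather than chaining sigmoid and log as done here for clarity.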
In this embodiment, a new convolutional layer with a larger convolution kernel is used to further extract features from the first feature image, so as to learn, from the first feature image, an 'attention' map of the targets in the image. The learned attention map is reduced in dimension by maximum pooling, and the final branch prediction result is learned through the second fully connected layer. Through the binary cross-entropy loss function, stronger image attention features can be learned, which greatly helps solve the multi-label classification problem of failing to notice multiple different targets in an image and effectively improves the precision of multi-label image classification.
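The second branch can be sketched end to end as follows (NumPy, with a 1*1 convolution standing in for the patent's larger-kernel convolutional layer; all names and sizes are illustrative assumptions):

```python
import numpy as np

def attention_branch(first_feat, w_conv, w_fc):
    """Second classification branch: conv -> max pool -> fully connected.

    first_feat: first feature image, shape (channels, H, W), e.g. (2048, 14, 14).
    w_conv: 1x1-conv weights, shape (C, channels) -- reduces the channels
            to C, yielding the C*H*W second feature image ('attention' maps).
    w_fc:   second fully connected layer, shape (C, C).
    Returns the C-dimensional second label classification prediction result.
    """
    # second feature image of dimension C*H*W
    second_feat = np.einsum('kc,chw->khw', w_conv, first_feat)
    # maximum-pooling dimension reduction to a C-dimensional feature vector
    pooled = second_feat.max(axis=(1, 2))
    # C x C second fully connected layer
    return w_fc @ pooled
```

One per-category attention map per label is what lets this branch notice several different targets in the same image.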
Step S105: determine the target prediction result of the image to be processed according to the first label classification prediction result and the second label classification prediction result.
Here, the target prediction result is obtained by first determining the average of the first label classification prediction result and the second label classification prediction result, that is, adding the first label classification prediction result to the second label classification prediction result and averaging, and then taking that average as the target prediction result.
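The fusion in step S105 is a simple element-wise average of the two branch outputs (sketch; function name assumed):

```python
import numpy as np

def fuse_predictions(first_pred, second_pred):
    """Target prediction result = average of the two branch predictions."""
    return (np.asarray(first_pred, dtype=float)
            + np.asarray(second_pred, dtype=float)) / 2.0
```

Averaging weights both branches equally; the final labels are then read off by the same sigmoid-and-threshold rule as above.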
In an optional implementation, the accuracy of the target prediction result can also be determined based on preset rules. The specific process can be: first determine the loss value corresponding to the target prediction result based on the sigmoid function and the cross-entropy loss function, for example calculate the first classification value corresponding to the target prediction result according to the sigmoid function and then calculate the loss value corresponding to the first classification value according to the cross-entropy loss function; finally determine the accuracy according to the loss value. For example, the smaller the loss value, the higher the accuracy, indicating that the target prediction result is more accurate, that is, the precision of multi-label image classification is higher.
In the embodiment of the present invention, the sigmoid function is used instead of the softmax used in the prior art. Softmax normalizes the final prediction result of the network; for a single-label classification task, it suffices to select the label with the maximum normalized value as the final prediction result. For multi-label classification, however, the number of labels contained in each image is unknown, so the labels of an image cannot be predicted accurately with softmax. In addition, the softmax normalization causes the results of different labels to influence one another, which interferes with the back-propagation of the loss of each label. In contrast, by computing the first label classification prediction result and the second label classification prediction result with the sigmoid function, a classification value is obtained for each label: when the classification value is greater than 0.5, the corresponding label is classified as positive, and when it is less than 0.5, the corresponding label is classified as negative. Whether the image contains a label is thus indicated by the positive or negative classification, without needing to consider the number of labels in the image, and the problem of the results of different labels influencing one another is effectively avoided. The resulting target prediction result is therefore more accurate, and the precision of the multi-label classification is higher.
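The fusion and thresholding step can be sketched as follows, assuming the two branches have already produced per-label sigmoid classification values (all names and values are hypothetical):

```python
def fuse_and_threshold(first_pred, second_pred, threshold=0.5):
    """Average the two branch predictions per label, then mark a label
    positive when its averaged classification value exceeds the threshold."""
    target = [(a + b) / 2.0 for a, b in zip(first_pred, second_pred)]
    return [value > threshold for value in target]

# Per-label classification values from the two classification branches:
first_pred = [0.9, 0.2, 0.6, 0.1]
second_pred = [0.7, 0.4, 0.8, 0.3]
labels = fuse_and_threshold(first_pred, second_pred)
# labels -> [True, False, True, False]: two labels are positive, and the
# number of labels per image never has to be known in advance.
```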
The multi-label image classification method provided by the embodiment of the present invention extracts a first feature image of an image to be processed; performs a first dimension-reduction processing and a first classification processing on the first feature image to generate a first label classification prediction result; performs feature extraction on the first feature image to generate a second feature image; performs a second dimension-reduction processing and a second classification processing on the second feature image to generate a second label classification prediction result; and determines the target prediction result of the image to be processed according to the first and second label classification prediction results. In other words, the method on the one hand classifies the first feature image to obtain the first label classification prediction result as one part of the target prediction result, and on the other hand further extracts a second feature image from the first feature image and classifies it to obtain the second label classification prediction result as the other part, so that two classification results are obtained through two parallel classification branches and are jointly considered to obtain the target classification result. Moreover, because the second feature image is further extracted from the first feature image and then classified, the further extraction of image features solves the problem in multi-label classification of overlooking multiple different targets in an image. Through these two aspects, the precision of multi-label image classification is improved.
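The two parallel branches described above can be outlined end to end as follows. This is an illustrative sketch under assumed shapes: the pooling and fully connected operations are simple stand-ins for the patented network, and all dimensions and names are hypothetical:

```python
import random

NUM_CLASSES = 4   # number of class labels (assumed)
FIRST_DIM = 16    # first dimension of the first feature image (assumed)
CONSTANT = 2      # preset constant; third dimension = NUM_CLASSES * CONSTANT

def global_pool(feature_map, out_dim):
    """Stand-in dimension reduction: pool a flat feature map to out_dim values."""
    step = max(1, len(feature_map) // out_dim)
    return [max(feature_map[i:i + step]) for i in range(0, step * out_dim, step)]

def fully_connected(vec, num_classes):
    """Stand-in classifier: one output per class label (a strided sum,
    standing in for a learned fully connected layer)."""
    return [sum(vec[j::num_classes]) for j in range(num_classes)]

# First feature image of the image to be processed (random stand-in):
first_feature = [random.random() for _ in range(FIRST_DIM)]

# Branch 1: first dimension reduction + first classification.
first_pred = fully_connected(global_pool(first_feature, 8), NUM_CLASSES)

# Branch 2: further feature extraction down to the third dimension, then
# second dimension reduction + second classification.
second_feature = global_pool(first_feature, NUM_CLASSES * CONSTANT)
second_pred = fully_connected(global_pool(second_feature, NUM_CLASSES), NUM_CLASSES)

# Fusion: the target prediction is the average of the two branch results.
target_pred = [(a + b) / 2 for a, b in zip(first_pred, second_pred)]
```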
In order to demonstrate the beneficial effect of the multi-label image classification method in the embodiment of the present invention more intuitively, the multi-label classification precision of the method was experimentally compared with existing methods on MS-COCO, a current large-scale authoritative image dataset in the industry. The results are shown in Table 1:
Table 1
To better measure the validity of the methods, seven indices are provided in Table 1 as evaluation criteria: OP (overall precision), OR (overall recall), OF1 (overall F1), CP (per-class precision), CR (per-class recall), CF1 (per-class F1), and mAP (mean average precision). For each of these indices, larger is better. With C denoting the number of classes to be predicted, i denoting the class index, N_i^c the number of images correctly predicted for the i-th class, N_i^p the number of images predicted as the i-th class, and N_i^g the total number of ground-truth images of the i-th class, the indices in Table 1 are computed as:
OP = sum_i(N_i^c) / sum_i(N_i^p), OR = sum_i(N_i^c) / sum_i(N_i^g), OF1 = 2 * OP * OR / (OP + OR);
CP = (1/C) * sum_i(N_i^c / N_i^p), CR = (1/C) * sum_i(N_i^c / N_i^g), CF1 = 2 * CP * CR / (CP + CR).
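The per-class and overall metrics can be sketched as follows. This is a minimal illustration in which the inputs are the per-class counts N_i^c, N_i^p and N_i^g defined above; the example counts are hypothetical:

```python
def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0

def multilabel_metrics(correct, predicted, ground_truth):
    """correct[i], predicted[i], ground_truth[i] are the per-class counts
    N_i^c, N_i^p, N_i^g. Returns (OP, OR, OF1, CP, CR, CF1)."""
    C = len(correct)
    op = sum(correct) / sum(predicted)           # overall precision
    orr = sum(correct) / sum(ground_truth)       # overall recall
    cp = sum(c / p for c, p in zip(correct, predicted)) / C      # per-class precision
    cr = sum(c / g for c, g in zip(correct, ground_truth)) / C   # per-class recall
    return op, orr, f1(op, orr), cp, cr, f1(cp, cr)

# Two classes: e.g. 8 of 10 predictions of class 0 are correct,
# out of 12 ground-truth images of class 0.
metrics = multilabel_metrics(correct=[8, 3], predicted=[10, 5], ground_truth=[12, 4])
```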
Here, WARP comes from the paper "Deep Convolutional Ranking for Multilabel Image Annotation"; CNN-RNN (Convolutional Neural Network - Recurrent Neural Network) comes from the paper "CNN-RNN: A Unified Framework for Multi-label Image Classification"; RLSD comes from the paper "Multi-label Image Classification with Regional Latent Semantic Dependencies"; RDAR comes from the paper "Multi-label Image Recognition by Recurrently Discovering Attentional Regions"; ResNet101 and ResNet107 are both results obtained with the ResNet network; ResNet101-semantic, ResNet-SRN-att and ResNet-SRN are the three methods proposed in the paper "Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification".
Among these, OF1 and CF1 are the more important indices, and mAP is the most important. It can be seen intuitively from Table 1 that the OF1, CF1 and mAP values obtained by the multi-label image classification method provided by the embodiment of the present invention are the largest relative to the results obtained by the prior-art methods. Therefore, compared with the prior art, the multi-label image classification method in the embodiment of the present invention can effectively improve the precision of multi-label image classification.
Second embodiment
Corresponding to the multi-label image classification method in the first embodiment, Fig. 3 shows a multi-label image classification apparatus in one-to-one correspondence with the multi-label image classification method of the first embodiment. As shown in Fig. 3, the multi-label image classification apparatus 400 includes a first extraction module 410, a first processing module 420, a second extraction module 430, a second processing module 440 and a third processing module 450. The functions implemented by the first extraction module 410, the first processing module 420, the second extraction module 430, the second processing module 440 and the third processing module 450 correspond one-to-one to the corresponding steps in the first embodiment; to avoid repetition, they are not described in detail one by one in this embodiment.
The first extraction module 410 is configured to extract a first feature image of an image to be processed.
The first processing module 420 is configured to perform a first dimension-reduction processing and a first classification processing on the first feature image to generate a first label classification prediction result, the first label classification prediction result being used to indicate a first classification result of each class label.
Optionally, the dimension of the first feature image is a first dimension, and the first processing module 420 is further configured to: perform pooling processing on the first feature image to obtain a feature vector of a second dimension, the second dimension being smaller than the first dimension; and input the feature vector of the second dimension into a first fully connected layer for classification processing to generate the first label classification prediction result.
Here, performing pooling processing on the first feature image to obtain the feature vector of the second dimension includes: determining a default pooling function according to a max pooling function and an average pooling function; and performing pooling on the first feature image with the default pooling function to obtain the feature vector of the second dimension.
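One plausible reading of "determining a default pooling function according to a max pooling function and an average pooling function" is a pooling operator that mixes both; the weighted sum below is an assumption for illustration, not the patented formula, and the mixing weight alpha is hypothetical:

```python
def max_pool(values):
    return max(values)

def avg_pool(values):
    return sum(values) / len(values)

def default_pool(values, alpha=0.5):
    """Combined pooling: a weighted mix of max pooling and average pooling
    (alpha is a hypothetical mixing parameter)."""
    return alpha * max_pool(values) + (1 - alpha) * avg_pool(values)

# Pool each channel of a first feature image down to a single value,
# producing a feature vector whose dimension equals the channel count.
feature_image = [[0.2, 0.9, 0.4], [0.1, 0.3, 0.5]]  # 2 channels
feature_vector = [default_pool(channel) for channel in feature_image]
# feature_vector is approximately [0.7, 0.4]
```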
The second extraction module 430 is configured to perform feature extraction on the first feature image to generate a second feature image.
Optionally, the second extraction module 430 is further configured to perform feature extraction on the first feature image based on a class label number of a preset classification, generating a second feature image of a third dimension, the third dimension being equal to the product of the class label number and a preset constant, and the third dimension being smaller than the first dimension of the first feature image.
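The third-dimension constraint can be illustrated with a small shape check; all the concrete values below are hypothetical (e.g. 80 class labels, as on MS-COCO, and a preset constant of 4):

```python
class_label_count = 80  # number of class labels of the preset classification (assumed)
preset_constant = 4     # hypothetical preset constant
first_dim = 2048        # first dimension of the first feature image (assumed)

# Third dimension of the second feature image:
third_dim = class_label_count * preset_constant

# The third dimension must be the product above and stay below the first dimension.
assert third_dim == 320
assert third_dim < first_dim
```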
The second processing module 440 is configured to perform a second dimension-reduction processing and a second classification processing on the second feature image to generate a second label classification prediction result, the second label classification prediction result being used to indicate a second classification result of each class label.
Optionally, the second processing module 440 is further configured to: perform pooling processing on the second feature image to obtain a feature vector of the same dimension as the class label number; and input the feature vector of the same dimension as the class label number into a second fully connected layer for classification processing to generate the second label classification prediction result.
The third processing module 450 is configured to determine the target prediction result of the image to be processed according to the first label classification prediction result and the second label classification prediction result.
Optionally, the third processing module 450 is further configured to determine the average of the first label classification prediction result and the second label classification prediction result as the target prediction result.
Further, the multi-label image classification apparatus further includes an accuracy computation module configured to determine the accuracy of the target prediction result based on a preset rule.
Optionally, the accuracy computation module may be further configured to determine a loss value corresponding to the target prediction result based on the sigmoid function and the cross-entropy loss function, and to determine the accuracy according to the loss value.
Here, determining the loss value corresponding to the target prediction result based on the sigmoid function and the cross-entropy loss function includes: computing a first classification value corresponding to the target prediction result according to the sigmoid function; and computing the loss value corresponding to the first classification value according to the cross-entropy loss function.
Further, the multi-label image classification apparatus further includes a fourth processing module configured to determine, based on the cross-entropy loss function, a second loss value corresponding to the second label classification prediction result.
Third embodiment
As shown in Fig. 4, which is a schematic diagram of a terminal device 300, the terminal device 300 includes a memory 302, a processor 304, and a computer program 303 stored in the memory 302 and runnable on the processor 304. When the computer program 303 is executed by the processor 304, the multi-label image classification method in the first embodiment is implemented; to avoid repetition, details are not described here again. Alternatively, when the computer program 303 is executed by the processor 304, the function of each module/unit in the multi-label image classification apparatus described in the second embodiment is implemented; to avoid repetition, details are not described here again.
Illustratively, the computer program 303 may be divided into one or more modules/units, which are stored in the memory 302 and executed by the processor 304 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, the instruction segments being used to describe the execution process of the computer program 303 in the terminal device 300. For example, the computer program 303 may be divided into the first extraction module 410, first processing module 420, second extraction module 430, second processing module 440 and third processing module 450 of the second embodiment; the concrete function of each module is as described in the first or second embodiment and is not repeated here.
The terminal device 300 may be a computing device such as a desktop PC, a notebook, a palmtop computer, or a cloud server.
The memory 302 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like. The memory 302 is configured to store a program, and the processor 304 executes the program after receiving an execution instruction. The method defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 304, or implemented by the processor 304.
The processor 304 may be an integrated circuit chip having signal processing capability. The processor 304 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps and logic diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It can be understood that the structure shown in Fig. 4 is only a structural schematic diagram of the terminal device 300; the terminal device 300 may also include more or fewer components than shown in Fig. 4. Each component shown in Fig. 4 may be implemented in hardware, software, or a combination thereof.
Fourth embodiment
The embodiment of the present invention also provides a storage medium on which instructions are stored. When the instructions are run on a computer, the multi-label image classification method in the first embodiment is implemented; to avoid repetition, details are not described here again. Alternatively, when the computer program is executed by a processor, the function of each module/unit in the multi-label image classification apparatus of the second embodiment is implemented; to avoid repetition, details are not described here again.
Through the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented in hardware, or in software plus a necessary general hardware platform. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash disk, a removable hard disk, or the like) and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method of each implementation scenario of the present invention.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention; for those skilled in the art, the invention may be variously modified and varied. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention. It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined and explained in subsequent drawings.
Claims (12)
1. A multi-label image classification method, characterized by comprising:
extracting a first feature image of an image to be processed;
performing a first dimension-reduction processing and a first classification processing on the first feature image to generate a first label classification prediction result, the first label classification prediction result being used to indicate a first classification result of each class label;
performing feature extraction on the first feature image to generate a second feature image;
performing a second dimension-reduction processing and a second classification processing on the second feature image to generate a second label classification prediction result, the second label classification prediction result being used to indicate a second classification result of each class label; and
determining a target prediction result of the image to be processed according to the first label classification prediction result and the second label classification prediction result.
2. The method according to claim 1, wherein the feature dimension of the first feature image is a first dimension, and performing the first dimension-reduction processing and the first classification processing on the first feature image to generate the first label classification prediction result comprises:
performing pooling processing on the first feature image to obtain a feature vector of a second dimension, the second dimension being smaller than the first dimension; and
inputting the feature vector of the second dimension into a first fully connected layer for classification processing to generate the first label classification prediction result.
3. The method according to claim 2, wherein performing pooling processing on the first feature image to obtain the feature vector of the second dimension comprises:
determining a default pooling function according to a max pooling function and an average pooling function; and
performing pooling on the first feature image with the default pooling function to obtain the feature vector of the second dimension.
4. The method according to claim 1, wherein performing feature extraction on the first feature image to generate the second feature image comprises:
performing feature extraction on the first feature image based on a class label number of a preset classification to generate the second feature image of a third dimension, the third dimension being equal to the product of the class label number and a preset constant, and the third dimension being smaller than the first dimension of the first feature image.
5. The method according to claim 4, wherein performing the second dimension-reduction processing and the second classification processing on the second feature image to generate the second label classification prediction result comprises:
performing pooling processing on the second feature image to obtain a feature vector of the same dimension as the class label number; and
inputting the feature vector of the same dimension as the class label number into a second fully connected layer for classification processing to generate the second label classification prediction result.
6. The method according to claim 1, wherein determining the target prediction result according to the first label classification prediction result and the second label classification prediction result comprises:
determining the average of the first label classification prediction result and the second label classification prediction result as the target prediction result.
7. The method according to claim 1, further comprising:
determining an accuracy of the target prediction result based on a preset rule.
8. The method according to claim 7, wherein determining the accuracy of the target prediction result based on the preset rule comprises:
determining a loss value corresponding to the target prediction result based on a sigmoid function and a cross-entropy loss function; and
determining the accuracy according to the loss value.
9. The method according to claim 8, wherein determining the loss value corresponding to the target prediction result based on the sigmoid function and the cross-entropy loss function comprises:
computing a first classification value corresponding to the target prediction result according to the sigmoid function; and
computing the loss value corresponding to the first classification value according to the cross-entropy loss function.
10. A multi-label image classification apparatus, characterized by comprising:
a first extraction module, configured to extract a first feature image of an image to be processed;
a first processing module, configured to perform a first dimension-reduction processing and a first classification processing on the first feature image to generate a first label classification prediction result, the first label classification prediction result being used to indicate a first classification result of each class label;
a second extraction module, configured to perform feature extraction on the first feature image to generate a second feature image;
a second processing module, configured to perform a second dimension-reduction processing and a second classification processing on the second feature image to generate a second label classification prediction result, the second label classification prediction result being used to indicate a second classification result of each class label; and
a third processing module, configured to determine a target prediction result of the image to be processed according to the first label classification prediction result and the second label classification prediction result.
11. A terminal device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the multi-label image classification method according to any one of claims 1 to 9.
12. A storage medium, characterized in that instructions are stored on the storage medium, and when the instructions are run on a computer, the computer is caused to execute the multi-label image classification method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810735861.2A CN109165666A (en) | 2018-07-05 | 2018-07-05 | Multi-tag image classification method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109165666A true CN109165666A (en) | 2019-01-08 |
Family
ID=64897423
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810735861.2A Pending CN109165666A (en) | 2018-07-05 | 2018-07-05 | Multi-tag image classification method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109165666A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109886346A (en) * | 2019-02-26 | 2019-06-14 | 四川大学华西医院 | A kind of cardiac muscle MRI image categorizing system |
CN110738258A (en) * | 2019-10-16 | 2020-01-31 | Oppo广东移动通信有限公司 | Image classification method and device and terminal equipment |
CN111797876A (en) * | 2019-04-09 | 2020-10-20 | Oppo广东移动通信有限公司 | Data classification method and device, storage medium and electronic equipment |
WO2020224403A1 (en) * | 2019-05-07 | 2020-11-12 | 腾讯科技(深圳)有限公司 | Classification task model training method, apparatus and device and storage medium |
WO2021179483A1 (en) * | 2020-03-09 | 2021-09-16 | 平安科技(深圳)有限公司 | Intention identification method, apparatus and device based on loss function, and storage medium |
CN116594627A (en) * | 2023-05-18 | 2023-08-15 | 湖北大学 | Multi-label learning-based service matching method in group software development |
CN111797876B (en) * | 2019-04-09 | 2024-06-04 | Oppo广东移动通信有限公司 | Data classification method and device, storage medium and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106156807A (en) * | 2015-04-02 | 2016-11-23 | 华中科技大学 | The training method of convolutional neural networks model and device |
CN106373109A (en) * | 2016-08-31 | 2017-02-01 | 南方医科大学 | Medical image modal synthesis method |
CN107004142A (en) * | 2014-12-10 | 2017-08-01 | 北京市商汤科技开发有限公司 | method and system for image classification |
CN108171254A (en) * | 2017-11-22 | 2018-06-15 | 北京达佳互联信息技术有限公司 | Image tag determines method, apparatus and terminal |
CN108229296A (en) * | 2017-09-30 | 2018-06-29 | 深圳市商汤科技有限公司 | The recognition methods of face skin attribute and device, electronic equipment, storage medium |
Non-Patent Citations (3)
Title |
---|
BOLEI ZHOU等: "Learning Deep Features for Discriminative Localization", 《2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION(CVPR)》 * |
FENG ZHU等: "Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
XIONG Youlun et al.: "Robotics: Modeling, Control and Vision", 31 March 2018 * |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190108 |