CN109165666A - Multi-label image classification method, device, equipment and storage medium - Google Patents
- Publication number
- CN109165666A (application number CN201810735861.2A)
- Authority
- CN
- China
- Prior art keywords
- prediction result
- image
- classification
- dimension
- feature image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The multi-label image classification method, device, equipment, and storage medium provided by the invention belong to the technical field of image processing. The multi-label image classification method includes: extracting a first feature image of an image to be processed; performing first dimension-reduction processing and first classification processing on the first feature image to generate a first label classification prediction result; performing feature extraction on the first feature image to generate a second feature image; performing second dimension-reduction processing and second classification processing on the second feature image to generate a second label classification prediction result, where the second label classification prediction result indicates the second classification result of each class label; and determining the target prediction result of the image to be processed according to the first label classification prediction result and the second label classification prediction result, thereby improving the precision of multi-label image classification.
Description
Technical field
The present invention relates to the field of image processing, and in particular to a multi-label image classification method, device, equipment, and storage medium.
Background technique
Multi-label image classification (multi-label classification) is an important research topic in computer vision. Especially with the arrival of the big-data era and the development of deep learning technology, image classification has received more and more attention. However, ordinary image classification only needs to assign one label to each image, whereas multi-label classification must correctly classify every target contained in an image. The sizes of targets with different labels vary across images, and the number of labels per image is not fixed either, which makes multi-label classification very difficult. Existing research mostly uses traditional problem transformation and algorithm adaptation methods to solve the multi-label image classification problem, but these conventional classification methods are unsuitable for data with high diversity and a large number of classes, and cannot achieve accurate multi-label classification.
Summary of the invention
The multi-label image classification method, device, equipment, and storage medium provided by the embodiments of the present invention can solve the technical problem that the prior art cannot improve the precision of multi-label image classification.
To achieve the above goals, the technical solutions adopted in the embodiments of the present invention are as follows:
In a first aspect, an embodiment of the present invention provides a multi-label image classification method, comprising: extracting a first feature image of an image to be processed; performing first dimension-reduction processing and first classification processing on the first feature image to generate a first label classification prediction result, where the first label classification prediction result indicates the first classification result of each class label; performing feature extraction on the first feature image to generate a second feature image; performing second dimension-reduction processing and second classification processing on the second feature image to generate a second label classification prediction result, where the second label classification prediction result indicates the second classification result of each class label; and determining the target prediction result of the image to be processed according to the first label classification prediction result and the second label classification prediction result.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which the dimension of the first feature image is a first dimension, and performing the first dimension-reduction processing and the first classification processing on the first feature image to generate the first label classification prediction result comprises: performing pooling processing on the first feature image to obtain a feature vector of a second dimension, the second dimension being smaller than the first dimension; and performing classification processing by inputting the feature vector of the second dimension into a first fully connected layer to generate the first label classification prediction result.
With reference to the first possible implementation of the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which performing pooling processing on the first feature image to obtain the feature vector of the second dimension comprises: determining a preset pooling function according to a maximum pooling function and an average pooling function; and pooling the first feature image with the preset pooling function to obtain the feature vector of the second dimension.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which performing feature extraction on the first feature image to generate the second feature image comprises: performing feature extraction on the first feature image based on the number of class labels of a preset classification to generate a second feature image of a third dimension, the third dimension being equal to the product of the number of class labels and a preset constant, and the third dimension being smaller than the first dimension of the first feature image.
With reference to the third possible implementation of the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, in which performing the second dimension-reduction processing and the second classification processing on the second feature image to generate the second label classification prediction result comprises: performing pooling processing on the second feature image to obtain a feature vector whose dimension equals the number of class labels; and performing classification processing by inputting that feature vector into a second fully connected layer to generate the second label classification prediction result.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, in which determining the target prediction result according to the first label classification prediction result and the second label classification prediction result comprises: taking the average of the first label classification prediction result and the second label classification prediction result as the target prediction result.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, in which the method further comprises: determining the accuracy of the target prediction result based on preset rules.
With reference to the sixth possible implementation of the first aspect, an embodiment of the present invention provides a seventh possible implementation of the first aspect, in which determining the accuracy of the target prediction result based on preset rules comprises: determining the loss value corresponding to the target prediction result based on a sigmoid function and a cross-entropy loss function; and determining the accuracy according to the loss value.
With reference to the seventh possible implementation of the first aspect, an embodiment of the present invention provides an eighth possible implementation of the first aspect, in which determining the loss value corresponding to the target prediction result based on the sigmoid function and the cross-entropy loss function comprises: calculating the first classification value corresponding to the target prediction result according to the sigmoid function; and calculating the loss value corresponding to the first classification value according to the cross-entropy loss function.
In a second aspect, an embodiment of the present invention provides a multi-label image classification device, comprising: a first extraction module for extracting a first feature image of an image to be processed; a first processing module for performing first dimension-reduction processing and first classification processing on the first feature image to generate a first label classification prediction result, where the first label classification prediction result indicates the first classification result of each class label; a second extraction module for performing feature extraction on the first feature image to generate a second feature image; a second processing module for performing second dimension-reduction processing and second classification processing on the second feature image to generate a second label classification prediction result, where the second label classification prediction result indicates the second classification result of each class label; and a third processing module for determining the target prediction result of the image to be processed according to the first label classification prediction result and the second label classification prediction result.
In a third aspect, an embodiment of the present invention provides a terminal device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the multi-label image classification method according to any one of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a storage medium on which instructions are stored; when the instructions run on a computer, the computer executes the multi-label image classification method according to any one of the first aspect.
Compared with the prior art, the embodiments of the present invention bring the following beneficial effects:
The multi-label image classification method, device, equipment, and storage medium provided by the embodiments of the present invention extract a first feature image of an image to be processed, perform first dimension-reduction processing and first classification processing on the first feature image to generate a first label classification prediction result, perform feature extraction on the first feature image to generate a second feature image, perform second dimension-reduction processing and second classification processing on the second feature image to generate a second label classification prediction result, and determine the target prediction result of the image to be processed according to the first and second label classification prediction results. In other words, on the one hand, the multi-label image classification method in the embodiments of the present invention classifies the first feature image to obtain the first label classification prediction result as one part of the target prediction result, further extracts a second feature image from the first feature image, and classifies based on the second feature image to obtain the second label classification prediction result as the other part of the target prediction result, so that two classification results are obtained through two parallel classification branches and then jointly considered to obtain the target classification result. On the other hand, because the second feature image is further extracted from the first feature image and classification is based on the second feature image, the further extraction of image features solves the multi-label classification problem of failing to notice multiple different targets in an image. Both aspects improve the precision of multi-label image classification.
Other features and advantages of the present disclosure will be described in the following specification; alternatively, some features and advantages can be inferred or unambiguously determined from the specification, or learned by implementing the above techniques of the present disclosure.
To make the above objects, features, and advantages of the present invention clearer and more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only certain embodiments of the present invention and therefore should not be construed as limiting its scope. For those of ordinary skill in the art, other relevant drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow chart of the multi-label image classification method provided by the first embodiment of the present invention;
Fig. 2 is a network structure flow chart of the multi-label image classification method shown in Fig. 1;
Fig. 3 is a functional block diagram of the multi-label image classification device provided by the second embodiment of the present invention;
Fig. 4 is a schematic diagram of a terminal device provided by the third embodiment of the present invention.
Specific embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention. Therefore, the following detailed description of the embodiments of the present invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention.
Some embodiments of the present invention are elaborated below with reference to the accompanying drawings. In the absence of conflict, the features in the following embodiments can be combined with each other.
First embodiment
Since existing multi-label image classification methods are only suitable for classifying images with low label diversity and few classes, and to improve the classification precision for data with high diversity and many classes, this embodiment first provides a multi-label image classification method. It should be noted that the steps shown in the flowchart of the accompanying drawings may be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one herein. This embodiment is described in detail below.
Referring to Fig. 1, which is a flow chart of the multi-label image classification method provided by an embodiment of the present invention, the detailed process shown in Fig. 1 is described below.
Step S101: extract the first feature image of the image to be processed.
In the embodiments of the present invention, the image to be processed can be an image uploaded by a user in a picture format such as bmp, jpg, or png; a picture captured by an image acquisition device (such as a camera); or an image in a picture format downloaded by the user through a network.
For ease of description, the feature dimension of the first feature image is hereinafter called the first dimension. The first dimension is, for example, n, where n is a positive integer. In general, the value of n is related to the size of the input image to be processed, the number of class labels the user needs, and the size of the selected convolution kernel (in the case where step S101 realizes feature extraction through a convolutional layer).
As shown in Fig. 2, in one implementation, the first feature image is the convolution feature obtained by passing the image through the deep convolutional network ResNet. For example, an image to be processed of size 448*448 passes through the ResNet network to obtain a first feature image of dimension 2048*14*14 (the first dimension).
In practice, the features of the image to be processed can also be extracted in other ways to obtain the first feature image, for example by a VGG network or an Inception network.
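As a rough shape walk-through (a sketch, not the patent's implementation): a ResNet-style backbone whose final feature map has a total stride of 32 and 2048 channels maps a 448*448 input to the 2048*14*14 first dimension mentioned above. The function name and the stride value are illustrative assumptions.

```python
# Hypothetical shape walk-through for the first feature image, assuming a
# ResNet-style backbone with total stride 32 and 2048 output channels.
def backbone_output_shape(height, width, stride=32, channels=2048):
    """Return (channels, H/stride, W/stride) for the first feature image."""
    return (channels, height // stride, width // stride)

print(backbone_output_shape(448, 448))  # (2048, 14, 14)
```

The same arithmetic explains why n (the first dimension) depends on the input size and the backbone architecture.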
Step S102: perform first dimension-reduction processing and first classification processing on the first feature image to generate a first label classification prediction result, where the first label classification prediction result indicates the first classification result of each class label.
As a possible implementation, step S102 includes: performing pooling processing on the first feature image to obtain a feature vector of a second dimension, where the second dimension is smaller than the first dimension; and performing classification processing by inputting the feature vector of the second dimension into the first fully connected layer to generate the first label classification prediction result.
Optionally, performing pooling processing on the first feature image to obtain the feature vector of the second dimension includes: determining a preset pooling function according to the maximum pooling function and the average pooling function, and pooling the first feature image with the preset pooling function to obtain the feature vector of the second dimension. For example, a first sub-function is determined according to a first preset constant and the maximum pooling function; a second sub-function is determined according to a second preset constant and the average pooling function; and the preset pooling function is determined according to the first sub-function and the second sub-function.
The preset pooling function can be expressed as F = α·F_M + β·F_A, where α and β are learnable parameters, updated by back-propagating the loss value (loss) of the final loss function and satisfying α + β = 1; F denotes the preset pooling function; F_M denotes the maximum pooling function, which takes the maximum of the feature points in a neighborhood; F_A denotes the average pooling function, which averages the feature points in a neighborhood; α·F_M denotes the first sub-function; and β·F_A denotes the second sub-function.
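A minimal NumPy sketch of this preset pooling function applied globally over the spatial dimensions (the function name is invented, and α is passed in directly here, whereas in the patent α and β are learned by back-propagation):

```python
import numpy as np

def mixed_global_pool(x, alpha):
    """F = alpha*F_M + beta*F_A with beta = 1 - alpha.

    x: feature map of shape (channels, H, W), e.g. (2048, 14, 14).
    Returns a (channels,) vector, the second-dimension feature vector.
    """
    f_max = x.max(axis=(1, 2))   # maximum pooling F_M over the spatial neighborhood
    f_avg = x.mean(axis=(1, 2))  # average pooling F_A over the spatial neighborhood
    return alpha * f_max + (1.0 - alpha) * f_avg
```

With alpha = 1 this degenerates to pure max pooling and with alpha = 0 to pure average pooling; intermediate values blend both kinds of information, which is the point of the α + β = 1 constraint.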
The number of class labels of the preset classification can be set by the user according to actual needs and is not specifically limited here. For example, the user may need to obtain the class labels of three categories in the image to be processed: the image may contain categories such as person, puppy, kitten, and sun, but the user only needs to determine the three categories puppy, kitten, and sun.
Of course, the number of class labels of the preset classification can also be a default, for example two default categories, person and puppy; or all categories in the image can be taken as the preset classification. For example, if an image to be processed contains only the four categories person, puppy, kitten, and sun, those four categories determine the class label number of the preset classification.
It should be noted that the number of class labels of the preset classification in the embodiments of the present invention can be 1 or greater than 1. When it is greater than 1, the effect of the multi-label image classification method in the embodiments of the present invention is more obvious.
In addition, in this embodiment, the feature vector obtained after pooling with the preset pooling function contains both the information of maximum pooling and the information of average pooling, so the advantages of both can be fully utilized and the disadvantages of using maximum pooling or average pooling alone can be compensated. This effectively solves the problems that average pooling loses information and maximum pooling retains irrelevant information, making the obtained first label classification prediction result more accurate.
Continuing the example in step S101, as shown in Fig. 2, assume the image to be processed is a 448*448 image and a first feature image of dimension 2048*14*14 is obtained through the ResNet network; 2048*14*14 then serves as the above first dimension. After processing by the preset pooling function (such as Max-average pooling in Fig. 2), a feature vector of 2048 dimensions is obtained; 2048 is then the second dimension. Assume the number of class labels of the preset classification is C, with C smaller than 2048. The input and output layers of the first fully connected layer are configured according to the number of class labels of the preset classification and the second dimension, so the first fully connected layer is 2048 × C. Finally, the 2048-dimensional feature vector passes through the 2048 × C first fully connected layer to compute the first label classification prediction result, which is a set of C vectors, each vector representing the prediction result of one category.
Assume C is 3; then the first label classification prediction result is a set of 3 vectors. If the set is A {a1, a2, a3}, then a1 represents the prediction result of the first category, a2 the prediction result of the second category, and a3 the prediction result of the third category.
Optionally, the first label classification prediction result is passed through the sigmoid function to obtain classification values. When a classification value is greater than 0.5, the corresponding label is classified as positive, where positive indicates that the prediction result contains this label; when it is less than 0.5, the corresponding label is classified as negative, where negative indicates that the prediction result does not contain this label. Positive and negative thus indicate whether a label is present, without needing to consider the number of labels in the image.
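The thresholding described above can be sketched as follows (a toy illustration; the function name is an invention of this sketch):

```python
import numpy as np

def labels_from_logits(logits):
    """Apply sigmoid per label and threshold at 0.5.

    A classification value > 0.5 means the label is predicted present
    (positive); otherwise the label is predicted absent (negative).
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return probs > 0.5
```

Because each label is thresholded independently, any number of labels can come out positive, so the unknown label count per image is not a problem (unlike softMax, discussed under step S105).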
Step S103: perform feature extraction on the first feature image to generate a second feature image.
As a possible implementation, the second feature image is the feature image obtained by further extracting the first feature image with a convolutional layer of a convolutional neural network. The feature dimension of the second feature image (hereinafter called the third dimension) is smaller than the feature dimension of the first feature image.
Of course, in practice, feature extraction can also be performed on the first feature image in other ways to obtain the second feature image, for example by VGG or Inception.
Optionally, regardless of the way in which feature extraction is performed on the first feature image, step S103 includes: performing feature extraction on the first feature image based on the number of class labels of the preset classification to generate a second feature image of the third dimension, where the third dimension is equal to the product of the number of class labels and a preset constant and is smaller than the first dimension of the first feature image, and the preset constant is the product of the length and width of the output image.
Continuing the above example, after the first feature image of dimension 2048*14*14 is obtained through the ResNet network in Fig. 2, network features are further extracted according to the number C of class labels of the preset classification to obtain a second feature image of dimension C*14*14, where C*14*14 is the above third dimension and 14*14 is the preset constant.
Step S104: perform second dimension-reduction processing and second classification processing on the second feature image to generate a second label classification prediction result, where the second label classification prediction result indicates the second classification result of each class label.
As a possible implementation, step S104 includes: performing pooling processing on the second feature image to obtain a feature vector with the same dimension as the number of class labels; and performing classification processing by inputting that feature vector into the second fully connected layer to generate the second label classification prediction result. The detailed process can be: pool the second feature image with the maximum pooling function to obtain a feature vector with the same dimension as the number of class labels; then pass that feature vector through the second fully connected layer and take the result as the second label classification prediction result.
Continuing the above example, after the second feature image of dimension C*14*14 is obtained in Fig. 2, a C-dimensional feature vector is obtained through maximum-pooling dimension reduction (such as Maxpooling in Fig. 2); the C-dimensional feature vector then passes through the C × C second fully connected layer to compute the second label classification prediction result, which is a set of C vectors, each vector representing the prediction result of one category.
Optionally, to ensure that the second label classification prediction result can be correctly updated, a second loss value is calculated for the second label classification prediction result using the sigmoid binary cross-entropy loss function, where the size of the second loss value measures the quality of the ResNet network training used in this application. The second loss value satisfies:
loss = -Σ_{i=1..C} [ y_i · log(x_i) + (1 - y_i) · log(1 - x_i) ]
where x denotes the result obtained after performing the sigmoid calculation on the prediction, y denotes the C-dimensional ground-truth vector whose entries are 0 or 1, 0 indicating that the image does not contain this label and 1 indicating that the image contains this label.
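A sketch of this sigmoid binary cross-entropy loss in NumPy (assuming the standard per-label formulation; averaging over the C labels rather than summing is a convention chosen here):

```python
import numpy as np

def sigmoid_bce_loss(logits, y):
    """Binary cross-entropy over C labels.

    logits: raw prediction scores; x = sigmoid(logits).
    y: ground truth in {0, 1} per label (1 = image contains the label).
    """
    x = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    y = np.asarray(y, dtype=float)
    # loss = -mean_i [ y_i*log(x_i) + (1 - y_i)*log(1 - x_i) ]
    return float(-np.mean(y * np.log(x) + (1.0 - y) * np.log(1.0 - x)))
```

In production code one would use a numerically stable fused form (e.g. computed from the logits directly) rather than chaining sigmoid and log as done here for clarity.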
In this embodiment, a new convolutional layer with a larger convolution kernel is used to further extract features from the first feature image, so as to learn, from the first feature image, an 'attention' map of the targets in the image. The learned attention map is reduced in dimension by maximum pooling, and the final branch prediction result is learned through the second fully connected layer. Through the binary cross-entropy loss function, stronger image attention features can be learned, which greatly helps solve the multi-label classification problem of failing to notice multiple different targets in an image and effectively improves the precision of multi-label image classification.
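The second branch can be sketched end to end as follows (NumPy, with a 1*1 convolution standing in for the patent's larger-kernel convolutional layer; all names and sizes are illustrative assumptions):

```python
import numpy as np

def attention_branch(first_feat, w_conv, w_fc):
    """Second classification branch: conv -> max pool -> fully connected.

    first_feat: first feature image, shape (channels, H, W), e.g. (2048, 14, 14).
    w_conv: 1x1-conv weights, shape (C, channels) -- reduces the channels
            to C, yielding the C*H*W second feature image ('attention' maps).
    w_fc:   second fully connected layer, shape (C, C).
    Returns the C-dimensional second label classification prediction result.
    """
    # second feature image of dimension C*H*W
    second_feat = np.einsum('kc,chw->khw', w_conv, first_feat)
    # maximum-pooling dimension reduction to a C-dimensional feature vector
    pooled = second_feat.max(axis=(1, 2))
    # C x C second fully connected layer
    return w_fc @ pooled
```

One per-category attention map per label is what lets this branch notice several different targets in the same image.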
Step S105: determine the target prediction result of the image to be processed according to the first label classification prediction result and the second label classification prediction result.
Here, the target prediction result is obtained by first determining the average of the first label classification prediction result and the second label classification prediction result, that is, adding the first label classification prediction result to the second label classification prediction result and averaging, and then taking that average as the target prediction result.
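The fusion in step S105 is a simple element-wise average of the two branch outputs (sketch; function name assumed):

```python
import numpy as np

def fuse_predictions(first_pred, second_pred):
    """Target prediction result = average of the two branch predictions."""
    return (np.asarray(first_pred, dtype=float)
            + np.asarray(second_pred, dtype=float)) / 2.0
```

Averaging weights both branches equally; the final labels are then read off by the same sigmoid-and-threshold rule as above.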
In an optional implementation, the accuracy of the target prediction result can also be determined based on preset rules. The specific process can be: first determine the loss value corresponding to the target prediction result based on the sigmoid function and the cross-entropy loss function, for example calculate the first classification value corresponding to the target prediction result according to the sigmoid function and then calculate the loss value corresponding to the first classification value according to the cross-entropy loss function; finally determine the accuracy according to the loss value. For example, the smaller the loss value, the higher the accuracy, indicating that the target prediction result is more accurate, that is, the precision of multi-label image classification is higher.
In the embodiment of the present invention, the sigmoid function is used instead of the softmax used in the prior art. Softmax normalizes the final prediction result of the network; for a single-label classification task, it suffices to select the label with the maximum normalized value as the final prediction result. For multi-label classification, however, the number of labels contained in each image is unknown, so the labels of an image cannot be predicted accurately with softmax. In addition, the softmax normalization causes the results of different labels to influence one another, which interferes with the back-propagation of the loss of each label. In contrast, by computing the first label classification prediction result and the second label classification prediction result with the sigmoid function, a classification value is obtained for each label: when the classification value is greater than 0.5, the corresponding label is classified as positive, and when it is less than 0.5, the corresponding label is classified as negative. Whether the image contains a label is thus indicated by the positive or negative classification, without needing to consider the number of labels in the image, and the problem of the results of different labels influencing one another is effectively avoided. The resulting target prediction result is therefore more accurate, and the precision of the multi-label classification is higher.
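The fusion and thresholding step can be sketched as follows, assuming the two branches have already produced per-label sigmoid classification values (all names and values are hypothetical):

```python
def fuse_and_threshold(first_pred, second_pred, threshold=0.5):
    """Average the two branch predictions per label, then mark a label
    positive when its averaged classification value exceeds the threshold."""
    target = [(a + b) / 2.0 for a, b in zip(first_pred, second_pred)]
    return [value > threshold for value in target]

# Per-label classification values from the two classification branches:
first_pred = [0.9, 0.2, 0.6, 0.1]
second_pred = [0.7, 0.4, 0.8, 0.3]
labels = fuse_and_threshold(first_pred, second_pred)
# labels -> [True, False, True, False]: two labels are positive, and the
# number of labels per image never has to be known in advance.
```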
The multi-label image classification method provided by the embodiment of the present invention extracts a first feature image of an image to be processed; performs a first dimension-reduction processing and a first classification processing on the first feature image to generate a first label classification prediction result; performs feature extraction on the first feature image to generate a second feature image; performs a second dimension-reduction processing and a second classification processing on the second feature image to generate a second label classification prediction result; and determines the target prediction result of the image to be processed according to the first and second label classification prediction results. In other words, the method on the one hand classifies the first feature image to obtain the first label classification prediction result as one part of the target prediction result, and on the other hand further extracts a second feature image from the first feature image and classifies it to obtain the second label classification prediction result as the other part, so that two classification results are obtained through two parallel classification branches and are jointly considered to obtain the target classification result. Moreover, because the second feature image is further extracted from the first feature image and then classified, the further extraction of image features solves the problem in multi-label classification of overlooking multiple different targets in an image. Through these two aspects, the precision of multi-label image classification is improved.
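The two parallel branches described above can be outlined end to end as follows. This is an illustrative sketch under assumed shapes: the pooling and fully connected operations are simple stand-ins for the patented network, and all dimensions and names are hypothetical:

```python
import random

NUM_CLASSES = 4   # number of class labels (assumed)
FIRST_DIM = 16    # first dimension of the first feature image (assumed)
CONSTANT = 2      # preset constant; third dimension = NUM_CLASSES * CONSTANT

def global_pool(feature_map, out_dim):
    """Stand-in dimension reduction: pool a flat feature map to out_dim values."""
    step = max(1, len(feature_map) // out_dim)
    return [max(feature_map[i:i + step]) for i in range(0, step * out_dim, step)]

def fully_connected(vec, num_classes):
    """Stand-in classifier: one output per class label (a strided sum,
    standing in for a learned fully connected layer)."""
    return [sum(vec[j::num_classes]) for j in range(num_classes)]

# First feature image of the image to be processed (random stand-in):
first_feature = [random.random() for _ in range(FIRST_DIM)]

# Branch 1: first dimension reduction + first classification.
first_pred = fully_connected(global_pool(first_feature, 8), NUM_CLASSES)

# Branch 2: further feature extraction down to the third dimension, then
# second dimension reduction + second classification.
second_feature = global_pool(first_feature, NUM_CLASSES * CONSTANT)
second_pred = fully_connected(global_pool(second_feature, NUM_CLASSES), NUM_CLASSES)

# Fusion: the target prediction is the average of the two branch results.
target_pred = [(a + b) / 2 for a, b in zip(first_pred, second_pred)]
```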
In order to demonstrate the beneficial effect of the multi-label image classification method in the embodiment of the present invention more intuitively, the multi-label classification precision of the method was experimentally compared with existing methods on MS-COCO, a current large-scale authoritative image dataset in the industry. The results are shown in Table 1:
Table 1
To better measure the validity of the methods, seven indices are provided in Table 1 as evaluation criteria: OP (overall precision), OR (overall recall), OF1 (overall F1), CP (per-class precision), CR (per-class recall), CF1 (per-class F1), and mAP (mean average precision). For each of these indices, larger is better. With C denoting the number of classes to be predicted, i denoting the class index, N_i^c the number of images correctly predicted for the i-th class, N_i^p the number of images predicted as the i-th class, and N_i^g the total number of ground-truth images of the i-th class, the indices in Table 1 are computed as:
OP = sum_i(N_i^c) / sum_i(N_i^p), OR = sum_i(N_i^c) / sum_i(N_i^g), OF1 = 2 * OP * OR / (OP + OR);
CP = (1/C) * sum_i(N_i^c / N_i^p), CR = (1/C) * sum_i(N_i^c / N_i^g), CF1 = 2 * CP * CR / (CP + CR).
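The per-class and overall metrics can be sketched as follows. This is a minimal illustration in which the inputs are the per-class counts N_i^c, N_i^p and N_i^g defined above; the example counts are hypothetical:

```python
def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0

def multilabel_metrics(correct, predicted, ground_truth):
    """correct[i], predicted[i], ground_truth[i] are the per-class counts
    N_i^c, N_i^p, N_i^g. Returns (OP, OR, OF1, CP, CR, CF1)."""
    C = len(correct)
    op = sum(correct) / sum(predicted)           # overall precision
    orr = sum(correct) / sum(ground_truth)       # overall recall
    cp = sum(c / p for c, p in zip(correct, predicted)) / C      # per-class precision
    cr = sum(c / g for c, g in zip(correct, ground_truth)) / C   # per-class recall
    return op, orr, f1(op, orr), cp, cr, f1(cp, cr)

# Two classes: e.g. 8 of 10 predictions of class 0 are correct,
# out of 12 ground-truth images of class 0.
metrics = multilabel_metrics(correct=[8, 3], predicted=[10, 5], ground_truth=[12, 4])
```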
Here, WARP comes from the paper "Deep Convolutional Ranking for Multilabel Image Annotation"; CNN-RNN (Convolutional Neural Network - Recurrent Neural Network) comes from the paper "CNN-RNN: A Unified Framework for Multi-label Image Classification"; RLSD comes from the paper "Multi-label Image Classification with Regional Latent Semantic Dependencies"; RDAR comes from the paper "Multi-label Image Recognition by Recurrently Discovering Attentional Regions"; ResNet101 and ResNet107 are both results obtained with the ResNet network; ResNet101-semantic, ResNet-SRN-att and ResNet-SRN are the three methods proposed in the paper "Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification".
Among these, OF1 and CF1 are the more important indices, and mAP is the most important. It can be seen intuitively from Table 1 that the OF1, CF1 and mAP values obtained by the multi-label image classification method provided by the embodiment of the present invention are the largest relative to the results obtained by the prior-art methods. Therefore, compared with the prior art, the multi-label image classification method in the embodiment of the present invention can effectively improve the precision of multi-label image classification.
Second embodiment
Corresponding to the multi-label image classification method in the first embodiment, Fig. 3 shows a multi-label image classification apparatus in one-to-one correspondence with the multi-label image classification method of the first embodiment. As shown in Fig. 3, the multi-label image classification apparatus 400 includes a first extraction module 410, a first processing module 420, a second extraction module 430, a second processing module 440 and a third processing module 450. The functions implemented by the first extraction module 410, the first processing module 420, the second extraction module 430, the second processing module 440 and the third processing module 450 correspond one-to-one to the corresponding steps in the first embodiment; to avoid repetition, they are not described in detail one by one in this embodiment.
The first extraction module 410 is configured to extract a first feature image of an image to be processed.
The first processing module 420 is configured to perform a first dimension-reduction processing and a first classification processing on the first feature image to generate a first label classification prediction result, the first label classification prediction result being used to indicate a first classification result of each class label.
Optionally, the dimension of the first feature image is a first dimension, and the first processing module 420 is further configured to: perform pooling processing on the first feature image to obtain a feature vector of a second dimension, the second dimension being smaller than the first dimension; and input the feature vector of the second dimension into a first fully connected layer for classification processing to generate the first label classification prediction result.
Here, performing pooling processing on the first feature image to obtain the feature vector of the second dimension includes: determining a default pooling function according to a max pooling function and an average pooling function; and performing pooling on the first feature image with the default pooling function to obtain the feature vector of the second dimension.
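One plausible reading of "determining a default pooling function according to a max pooling function and an average pooling function" is a pooling operator that mixes both; the weighted sum below is an assumption for illustration, not the patented formula, and the mixing weight alpha is hypothetical:

```python
def max_pool(values):
    return max(values)

def avg_pool(values):
    return sum(values) / len(values)

def default_pool(values, alpha=0.5):
    """Combined pooling: a weighted mix of max pooling and average pooling
    (alpha is a hypothetical mixing parameter)."""
    return alpha * max_pool(values) + (1 - alpha) * avg_pool(values)

# Pool each channel of a first feature image down to a single value,
# producing a feature vector whose dimension equals the channel count.
feature_image = [[0.2, 0.9, 0.4], [0.1, 0.3, 0.5]]  # 2 channels
feature_vector = [default_pool(channel) for channel in feature_image]
# feature_vector is approximately [0.7, 0.4]
```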
The second extraction module 430 is configured to perform feature extraction on the first feature image to generate a second feature image.
Optionally, the second extraction module 430 is further configured to perform feature extraction on the first feature image based on a class label number of a preset classification, generating a second feature image of a third dimension, the third dimension being equal to the product of the class label number and a preset constant, and the third dimension being smaller than the first dimension of the first feature image.
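The third-dimension constraint can be illustrated with a small shape check; all the concrete values below are hypothetical (e.g. 80 class labels, as on MS-COCO, and a preset constant of 4):

```python
class_label_count = 80  # number of class labels of the preset classification (assumed)
preset_constant = 4     # hypothetical preset constant
first_dim = 2048        # first dimension of the first feature image (assumed)

# Third dimension of the second feature image:
third_dim = class_label_count * preset_constant

# The third dimension must be the product above and stay below the first dimension.
assert third_dim == 320
assert third_dim < first_dim
```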
The second processing module 440 is configured to perform a second dimension-reduction processing and a second classification processing on the second feature image to generate a second label classification prediction result, the second label classification prediction result being used to indicate a second classification result of each class label.
Optionally, the second processing module 440 is further configured to: perform pooling processing on the second feature image to obtain a feature vector of the same dimension as the class label number; and input the feature vector of the same dimension as the class label number into a second fully connected layer for classification processing to generate the second label classification prediction result.
The third processing module 450 is configured to determine the target prediction result of the image to be processed according to the first label classification prediction result and the second label classification prediction result.
Optionally, the third processing module 450 is further configured to determine the average of the first label classification prediction result and the second label classification prediction result as the target prediction result.
Further, the multi-label image classification apparatus further includes an accuracy computation module configured to determine the accuracy of the target prediction result based on a preset rule.
Optionally, the accuracy computation module may be further configured to determine a loss value corresponding to the target prediction result based on the sigmoid function and the cross-entropy loss function, and to determine the accuracy according to the loss value.
Here, determining the loss value corresponding to the target prediction result based on the sigmoid function and the cross-entropy loss function includes: computing a first classification value corresponding to the target prediction result according to the sigmoid function; and computing the loss value corresponding to the first classification value according to the cross-entropy loss function.
Further, the multi-label image classification apparatus further includes a fourth processing module configured to determine, based on the cross-entropy loss function, a second loss value corresponding to the second label classification prediction result.
Third embodiment
As shown in Fig. 4, which is a schematic diagram of a terminal device 300, the terminal device 300 includes a memory 302, a processor 304, and a computer program 303 stored in the memory 302 and runnable on the processor 304. When the computer program 303 is executed by the processor 304, the multi-label image classification method in the first embodiment is implemented; to avoid repetition, details are not described here again. Alternatively, when the computer program 303 is executed by the processor 304, the function of each module/unit in the multi-label image classification apparatus described in the second embodiment is implemented; to avoid repetition, details are not described here again.
Illustratively, the computer program 303 may be divided into one or more modules/units, which are stored in the memory 302 and executed by the processor 304 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, the instruction segments being used to describe the execution process of the computer program 303 in the terminal device 300. For example, the computer program 303 may be divided into the first extraction module 410, first processing module 420, second extraction module 430, second processing module 440 and third processing module 450 of the second embodiment; the concrete function of each module is as described in the first or second embodiment and is not repeated here.
The terminal device 300 may be a computing device such as a desktop PC, a notebook, a palmtop computer, or a cloud server.
The memory 302 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like. The memory 302 is configured to store a program, and the processor 304 executes the program after receiving an execution instruction. The method defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 304, or implemented by the processor 304.
The processor 304 may be an integrated circuit chip having signal processing capability. The processor 304 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps and logic diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It can be understood that the structure shown in Fig. 4 is only a structural schematic diagram of the terminal device 300; the terminal device 300 may also include more or fewer components than shown in Fig. 4. Each component shown in Fig. 4 may be implemented in hardware, software, or a combination thereof.
Fourth embodiment
The embodiment of the present invention also provides a storage medium on which instructions are stored. When the instructions are run on a computer, the multi-label image classification method in the first embodiment is implemented; to avoid repetition, details are not described here again. Alternatively, when the computer program is executed by a processor, the function of each module/unit in the multi-label image classification apparatus of the second embodiment is implemented; to avoid repetition, details are not described here again.
Through the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented in hardware, or in software plus a necessary general hardware platform. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash disk, a removable hard disk, or the like) and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method of each implementation scenario of the present invention.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention; for those skilled in the art, the invention may be variously modified and varied. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention. It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined and explained in subsequent drawings.
Claims (12)
1. A multi-label image classification method, characterized by comprising:
extracting a first feature image of an image to be processed;
performing a first dimension-reduction processing and a first classification processing on the first feature image to generate a first label classification prediction result, the first label classification prediction result being used to indicate a first classification result of each class label;
performing feature extraction on the first feature image to generate a second feature image;
performing a second dimension-reduction processing and a second classification processing on the second feature image to generate a second label classification prediction result, the second label classification prediction result being used to indicate a second classification result of each class label; and
determining a target prediction result of the image to be processed according to the first label classification prediction result and the second label classification prediction result.
2. The method according to claim 1, wherein the feature dimension of the first feature image is a first dimension, and performing the first dimension-reduction processing and the first classification processing on the first feature image to generate the first label classification prediction result comprises:
performing pooling processing on the first feature image to obtain a feature vector of a second dimension, the second dimension being smaller than the first dimension; and
inputting the feature vector of the second dimension into a first fully connected layer for classification processing to generate the first label classification prediction result.
3. The method according to claim 2, wherein performing pooling processing on the first feature image to obtain the feature vector of the second dimension comprises:
determining a default pooling function according to a max pooling function and an average pooling function; and
performing pooling on the first feature image with the default pooling function to obtain the feature vector of the second dimension.
4. The method according to claim 1, wherein performing feature extraction on the first feature image to generate the second feature image comprises:
performing feature extraction on the first feature image based on a class label number of a preset classification to generate the second feature image of a third dimension, the third dimension being equal to the product of the class label number and a preset constant, and the third dimension being smaller than the first dimension of the first feature image.
5. The method according to claim 4, wherein performing the second dimension-reduction processing and the second classification processing on the second feature image to generate the second label classification prediction result comprises:
performing pooling processing on the second feature image to obtain a feature vector of the same dimension as the class label number; and
inputting the feature vector of the same dimension as the class label number into a second fully connected layer for classification processing to generate the second label classification prediction result.
6. The method according to claim 1, wherein determining the target prediction result according to the first label classification prediction result and the second label classification prediction result comprises:
determining the average of the first label classification prediction result and the second label classification prediction result as the target prediction result.
7. The method according to claim 1, further comprising:
determining an accuracy of the target prediction result based on a preset rule.
8. The method according to claim 7, wherein determining the accuracy of the target prediction result based on the preset rule comprises:
determining a loss value corresponding to the target prediction result based on a sigmoid function and a cross-entropy loss function; and
determining the accuracy according to the loss value.
9. The method according to claim 8, wherein determining the loss value corresponding to the target prediction result based on the sigmoid function and the cross-entropy loss function comprises:
computing a first classification value corresponding to the target prediction result according to the sigmoid function; and
computing the loss value corresponding to the first classification value according to the cross-entropy loss function.
10. A multi-label image classification apparatus, characterized by comprising:
a first extraction module, configured to extract a first feature image of an image to be processed;
a first processing module, configured to perform a first dimension-reduction processing and a first classification processing on the first feature image to generate a first label classification prediction result, the first label classification prediction result being used to indicate a first classification result of each class label;
a second extraction module, configured to perform feature extraction on the first feature image to generate a second feature image;
a second processing module, configured to perform a second dimension-reduction processing and a second classification processing on the second feature image to generate a second label classification prediction result, the second label classification prediction result being used to indicate a second classification result of each class label; and
a third processing module, configured to determine a target prediction result of the image to be processed according to the first label classification prediction result and the second label classification prediction result.
11. A terminal device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the multi-label image classification method according to any one of claims 1 to 9.
12. A storage medium, characterized in that instructions are stored on the storage medium, and when the instructions are run on a computer, the computer is caused to execute the multi-label image classification method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810735861.2A CN109165666A (en) | 2018-07-05 | 2018-07-05 | Multi-tag image classification method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109165666A true CN109165666A (en) | 2019-01-08 |
Family
ID=64897423
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810735861.2A Pending CN109165666A (en) | 2018-07-05 | 2018-07-05 | Multi-tag image classification method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109165666A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109886346A (en) * | 2019-02-26 | 2019-06-14 | 四川大学华西医院 | A kind of cardiac muscle MRI image categorizing system |
CN110738258A (en) * | 2019-10-16 | 2020-01-31 | Oppo广东移动通信有限公司 | Image classification method and device and terminal equipment |
CN111797876A (en) * | 2019-04-09 | 2020-10-20 | Oppo广东移动通信有限公司 | Data classification method and device, storage medium and electronic equipment |
WO2020224403A1 (en) * | 2019-05-07 | 2020-11-12 | 腾讯科技(深圳)有限公司 | Classification task model training method, apparatus and device and storage medium |
WO2021179483A1 (en) * | 2020-03-09 | 2021-09-16 | 平安科技(深圳)有限公司 | Intention identification method, apparatus and device based on loss function, and storage medium |
CN116594627A (en) * | 2023-05-18 | 2023-08-15 | 湖北大学 | Multi-label learning-based service matching method in group software development |
CN111797876B (en) * | 2019-04-09 | 2024-06-04 | Oppo广东移动通信有限公司 | Data classification method and device, storage medium and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106156807A (en) * | 2015-04-02 | 2016-11-23 | 华中科技大学 | The training method of convolutional neural networks model and device |
CN106373109A (en) * | 2016-08-31 | 2017-02-01 | 南方医科大学 | Medical image modal synthesis method |
CN107004142A (en) * | 2014-12-10 | 2017-08-01 | 北京市商汤科技开发有限公司 | method and system for image classification |
CN108171254A (en) * | 2017-11-22 | 2018-06-15 | 北京达佳互联信息技术有限公司 | Image tag determines method, apparatus and terminal |
CN108229296A (en) * | 2017-09-30 | 2018-06-29 | 深圳市商汤科技有限公司 | The recognition methods of face skin attribute and device, electronic equipment, storage medium |
Non-Patent Citations (3)
Title |
---|
BOLEI ZHOU等: "Learning Deep Features for Discriminative Localization", 《2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION(CVPR)》 * |
FENG ZHU等: "Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
XIONG Youlun et al.: "Robotics: Modeling, Control and Vision", 31 March 2018 * |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190108 |