CN107563418A

CN107563418A - Image attribute detection method based on region-sensitive score maps and multi-instance learning

Info

Publication number: CN107563418A
Application number: CN201710714926.0A
Authority: CN
Inventors: 何小海; 陈祥; 张�杰; 卿粼波; 苏婕; 王正勇; 滕奇志
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2017-08-19
Filing date: 2017-08-19
Publication date: 2018-01-09

Abstract

The invention discloses an image attribute detection method based on region-sensitive score maps (RSSM) and multi-instance learning (MIL). The method comprises the following steps: an input image is converted into RSSM feature maps by a convolutional neural network; the RSSM feature maps are converted into 1000 × 10 × 10 MIL feature maps by an RSSM combination layer; finally, the MIL feature maps are fed into the multi-instance learning (MIL) network layer to obtain a 1000 × 1 attribute probability vector. Compared with earlier methods, the attribute detection method of the invention markedly improves detection accuracy: under the same conditions, the CNN-based and FCN-MIL-based models reach accuracies of only 30.8% and 34.0%, whereas the proposed method reaches 42.1%. In addition, the method can detect 1000 image attributes, which is considerably more comprehensive than typical attribute detection methods and largely meets the needs of general image and video description and scene understanding.

Description

An image attribute detection method based on region-sensitive score maps and multi-instance learning

Technical field

The invention provides an image attribute detection method based on region-sensitive score maps and multi-instance learning, and relates to the technical fields of deep learning and computer vision.

Background art

With the rapid development of deep learning, problems such as object recognition and image classification are now solved with good results. Both kinds of problem can be classed as single-label or multi-label classification: each target image has a certain number of labels, and the goal is to obtain those labels correctly. These labels have one thing in common: they are nouns or similar words that denote concrete entities. In image semantic information acquisition, however, there is also a need to obtain the objects contained in an image together with related words such as verbs, adjectives, and nouns. Strictly speaking this is also a multi-label classification problem, but to distinguish it from other single-label and multi-label classification problems it is called image attribute detection. Besides the above characteristics, image attribute detection has several others: it is weakly supervised, the number of labels is not fixed, and the vocabulary of attribute categories is broad.

Because of these characteristics, and in particular because the attribute vocabulary under study is very broad, image attribute detection remains difficult and challenging even though image classification and object recognition are now well solved. Moreover, when image semantic attributes are incorporated into image description models, their correctness and validity must be ensured.

Existing image attribute detection methods mainly fall into the following groups: (1) image attribute detection based on a convolutional neural network (CNN), in which the image is fed into a convolutional neural network to obtain feature maps, and the feature maps are then passed through fully connected layers and a sigmoid mapping to output an attribute probability vector; (2) attribute detection based on fully convolutional networks and multi-instance learning (FCN-MIL), which resembles CNN-based attribute detection: a fully convolutional CNN first produces feature maps, which are then fed into a multi-instance learning network to obtain the final attribute probability vector; (3) traditional image attribute detection methods based on nearest neighbors (KNN) and ranking.

Of these, the KNN-based and ranking-based methods are traditional methods whose accuracy is not high. The CNN-based method uses deep learning but considers only the global features of the image. Because an attribute often occupies only a very small part of the image, and the same part of the image may contain several attributes, this reduces the effectiveness of CNN models for image attribute detection to some extent. The FCN-MIL-based method recognizes the image block by block, so it can attend to local features while still attending to the image as a whole and make full use of the information the image contains; its accuracy is therefore much higher than that of the CNN-based method. However, because the algorithm works block by block, the size of the original-image region associated with each value on the feature map is directly related to the size of the MIL feature map: if the feature map is too small, the region associated with each value is too large, which impairs the extraction of features from local regions of the image. The attribute detection performance of all of these methods is therefore unsatisfactory.

Summary of the invention

To solve the above problems, the invention provides a more accurate image attribute detection method based on region-sensitive score maps and multi-instance learning (Region-sensitive Score Maps-Multi Instance Learning, RSSM-MIL). With the feature map size unchanged, the invention processes local parts of the image at a finer granularity, which offsets to some extent the adverse effect of an undersized feature map and yields better image attribute detection performance.

The invention achieves the above object through the following technical solution:

A more accurate image attribute detection method based on region-sensitive score maps and multi-instance learning, comprising the following steps:

Step (1): The original image is fed into a convolutional neural network to obtain k² × 1000 feature maps of size 10 × 10, which the invention calls RSSM feature maps.

Step (2): The RSSM feature maps are fed into the RSSM combination layer and combined into new feature maps of size 1000 × 10 × 10, which the invention calls MIL feature maps.

Step (3): The MIL feature maps are fed into the MIL algorithm layer to obtain an attribute probability vector of dimension 1000 × 1.

Step (4): The 1000-dimensional attribute probability vector is processed with a threshold; every attribute whose probability exceeds the threshold is regarded as an attribute of the image.

The above steps describe the structure of the neural network. In practice the workflow is the same as for other supervised learning methods: the training data is first preprocessed, then the original images and their labels are fed into the network together and the model is trained by gradient descent or a similar method. At test time the model is loaded first, the test image is fed into the deep neural network with the loaded parameters, and the output attribute values are mapped to the actual label values, giving the attributes of the image.

The convolutional neural network in step (1) does not mean convolutional layers alone but a convolutional neural network in the general sense, that is, an integrated network comprising convolutional layers, pooling layers, and activation layers. Common networks such as AlexNet, VGG16, VGG19, and CaffeNet can be used with their final fully connected layers removed and with small adjustments to the preceding convolutional and pooling layers, so that the original image yields feature maps of size 10 × 10 after passing through the convolutional layers.

In the multi-instance learning (MIL) network of step (3), the probability p_i^w that attribute w appears in image i is given by the following formula:

where j denotes the j-th local region after the image has been divided into blocks, and p_ij^w is the probability that the j-th region of image i contains attribute w.
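The equation referred to above does not appear in this text. Assuming the widely used noisy-OR form of multi-instance learning (an assumption, not confirmed by the source), it would read as follows, where b_i is a hypothetical symbol for the set of local regions of image i:

```latex
p_i^w = 1 - \prod_{j \in b_i} \left( 1 - p_{ij}^w \right)
```

Under this form, an image is considered to contain attribute w as soon as at least one of its regions does, which fits the weakly supervised setting in which labels are given per image rather than per region.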

The threshold in step (4) is set empirically and is usually 0.5. Tests show that this threshold gives relatively good attribute detection performance.
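The thresholding in step (4) and the mapping from output indices back to labels can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `detect_attributes`, the four-word vocabulary, and the probability values are all hypothetical, and a real model would use a 1000-entry vocabulary.

```python
import numpy as np

def detect_attributes(probs, vocab, threshold=0.5):
    """Return (label, probability) pairs whose probability exceeds the threshold.

    probs: one probability per attribute; vocab: the label for each index.
    Both names are hypothetical and used only for illustration.
    """
    probs = np.asarray(probs, dtype=float).reshape(-1)
    if probs.shape[0] != len(vocab):
        raise ValueError("probs and vocab must have the same length")
    # Strictly greater than the threshold, following "exceeds the threshold".
    keep = np.flatnonzero(probs > threshold)
    return [(vocab[i], float(probs[i])) for i in keep]

# Toy example with a made-up 4-word vocabulary.
vocab = ["dog", "running", "grass", "red"]
probs = [0.91, 0.62, 0.50, 0.08]
print(detect_attributes(probs, vocab))
```

Because the number of attributes per image is not fixed, a threshold is a natural fit here: it lets each image return as many or as few labels as its probabilities support.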

The main contribution of the invention is the concept of the region-sensitive score map. The input feature maps of MIL-based attribute detection methods are small, usually 10 × 10. Obtaining them directly from an ordinary convolutional neural network loses much local information, whereas feature maps combined by means of region-sensitive score maps retain more local information when fed to the MIL algorithm layer.

Under the same conditions (same convolutional neural network structure, same test set), the average precision of the proposed RSSM-MIL-based image attribute detection method is clearly higher than that of the CNN-based and FCN-MIL-based image attribute detection methods.

Brief description of the drawings

Fig. 1 is a schematic diagram of image attribute detection.

Fig. 2 is a flow chart of the image attribute detection method of the invention based on region-sensitive score maps and multi-instance learning.

Fig. 3 is a detailed flow chart for the case where VGG19 is chosen as the convolutional neural network and k = 2.

Fig. 4 shows how the region-sensitive feature maps are combined when k = 2.

Embodiment

The invention is further described below with reference to the accompanying drawings:

Fig. 1 is a schematic diagram of image attribute detection. It shows that the attributes of an image include not only nouns but also many other parts of speech, such as verbs, adjectives, and measure words.

As shown in Fig. 2, an image attribute detection method based on region-sensitive score maps and multi-instance learning comprises the following steps:

Step (1): The original image is fed into a convolutional neural network to obtain the RSSM feature maps, which are k² × 1000 feature maps of size 10 × 10. k is a parameter of the RSSM combination layer and is an integer greater than 1.

Step (2): The RSSM feature maps are passed through the RSSM combination layer to obtain the MIL feature maps. Every k² RSSM feature maps are combined into one MIL feature map according to a fixed rule. The resulting MIL feature maps have size 1000 × 10 × 10.

Step (3): The MIL feature maps are fed into the MIL algorithm layer to obtain the attribute probability vector, which is a 1000 × 1 probability vector.
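The tensor shapes in the three steps above can be traced with a few lines of arithmetic. The sketch below only restates the sizes given in the text; the function name `pipeline_shapes` and its parameter names are hypothetical.

```python
def pipeline_shapes(k, num_attributes=1000, grid=10):
    """Return the tensor shape at each stage for a given RSSM parameter k.

    The defaults (1000 attributes, 10 x 10 maps) are the sizes stated in the text.
    """
    if not isinstance(k, int) or k <= 1:
        raise ValueError("k must be an integer greater than 1")
    return {
        # Step (1): k^2 score maps per attribute from the CNN.
        "rssm": (k * k * num_attributes, grid, grid),
        # Step (2): every k^2 RSSM maps are merged into one MIL map.
        "mil": (num_attributes, grid, grid),
        # Step (3): one probability per attribute.
        "probs": (num_attributes, 1),
    }

for k in (2, 3):
    print(k, pipeline_shapes(k))
```

For k = 2 the CNN must output 4000 channels and for k = 3 it must output 9000, while the MIL feature maps and the probability vector keep the same size in both cases.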

Fig. 3 is the detailed flow chart for the case where VGG19 is used as the fully convolutional network and k = 2. The specific steps are:

(1) The original image is fed into the VGG19 network to obtain the RSSM feature maps. The input image has size 3 × 565 × 565, and the RSSM feature maps have size 4000 × 10 × 10.

(2) The RSSM feature maps are passed through the RSSM combination layer to obtain the MIL feature maps. Every 4 RSSM feature maps are merged into one MIL feature map, giving MIL feature maps of size 1000 × 10 × 10.

(3) The MIL feature maps are fed into the MIL algorithm layer to obtain the attribute probability vector. Multi-instance learning computes a 1000 × 1 attribute probability vector from the MIL feature maps.
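Step (3) can be illustrated with a small NumPy sketch. It rests on two assumptions that the text does not confirm: that the MIL feature maps hold raw scores which are turned into per-region probabilities by a sigmoid, and that the regions are pooled with the common noisy-OR rule. The names `sigmoid` and `noisy_or_pool` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def noisy_or_pool(mil_scores):
    """Pool per-region scores of shape (A, H, W) into an (A, 1) probability vector.

    Assumption: noisy-OR pooling, p = 1 - prod_j (1 - p_j), over all H * W regions.
    """
    num_attributes = mil_scores.shape[0]
    # Per-region probability that the region contains each attribute.
    region_probs = sigmoid(mil_scores).reshape(num_attributes, -1)
    # An attribute is present if at least one region contains it.
    image_probs = 1.0 - np.prod(1.0 - region_probs, axis=1)
    return image_probs[:, None]

# Random scores stand in for real MIL feature maps of size 1000 x 10 x 10.
scores = np.random.default_rng(0).normal(loc=-6.0, size=(1000, 10, 10))
probs = noisy_or_pool(scores)
print(probs.shape)
```

With this pooling, a single confident region is enough to push the image-level probability close to 1, which is why the quality of the per-region scores matters so much.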

Under the same conditions (VGG19 as the convolutional neural network structure, same test set), the average precision (AP) of the CNN-based image attribute detection method is 30.8%, that of the FCN-MIL-based method is 34.0%, and that of the RSSM-MIL-based method proposed by the invention is 42.1%, a clear improvement over the two earlier methods.

Fig. 4 is a schematic diagram of how the RSSM combination layer combines the RSSM feature maps, taking k = 2 as an example. When k = 2, every 4 RSSM feature maps are combined into one MIL feature map. As shown in the figure, a 2 × 2 sliding window moves over the feature map with a stride of 2, from left to right and from top to bottom. In the generated MIL feature map, the value at the upper-left position of the window is taken from the upper-left position of the 1st RSSM feature map, the value at the upper-right position from the upper-right position of the 2nd RSSM feature map, the value at the lower-left position from the lower-left position of the 3rd RSSM feature map, and the value at the lower-right position from the lower-right position of the 4th RSSM feature map. The combination for k = 3, or in general k = n, follows in the same way. With the feature map size unchanged, RSSM processes local parts of the image at a finer granularity and offsets to some extent the adverse effect of an undersized feature map.
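One way to read the combination rule described for Fig. 4 is sketched below in NumPy. It assumes that each group of k² consecutive channels belongs to one attribute, and that the window position (row r, column c) takes its value from the same spatial location in map r·k + c of that group. Neither detail is stated explicitly in the text, and `rssm_combine` is a hypothetical name.

```python
import numpy as np

def rssm_combine(rssm, k):
    """Combine RSSM maps of shape (k*k*A, H, W) into MIL maps of shape (A, H, W).

    Assumption: channels are grouped consecutively, k*k maps per attribute.
    """
    channels, height, width = rssm.shape
    if channels % (k * k) != 0:
        raise ValueError("channel count must be a multiple of k*k")
    groups = rssm.reshape(channels // (k * k), k * k, height, width)
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    # Position inside the k x k window decides which map of the group is used:
    # upper-left -> map 0, upper-right -> map 1, lower-left -> map k, and so on.
    selector = (rows % k) * k + (cols % k)
    # For every attribute a: out[a, y, x] = groups[a, selector[y, x], y, x].
    return groups[:, selector, rows, cols]

# Random values stand in for the 4000 x 10 x 10 CNN output with k = 2.
rssm = np.random.default_rng(0).normal(size=(4000, 10, 10))
mil = rssm_combine(rssm, k=2)
print(mil.shape)
```

Using the position modulo k also covers the case where the map size is not a multiple of k, such as k = 3 on a 10 × 10 map, where the last window is only partly filled.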

Claims

1. An image attribute detection method based on region-sensitive score maps and multi-instance learning, characterized by comprising the following steps:

Step 1: The original image is fed into a convolutional neural network to obtain RSSM feature maps; the resulting RSSM feature maps are k² × 1000 feature maps of size 10 × 10;

Step 2: Every k² RSSM feature maps are combined into one MIL feature map by the region-sensitive score map combination; the MIL feature maps have size 1000 × 10 × 10;

Step 3: The MIL feature maps are fed into a multi-instance learning network to obtain the attribute probability vector of the image; the attribute probability vector has size 1000 × 1;

Step 4: The 1000-dimensional attribute probability vector is processed with a threshold; every attribute whose probability exceeds the threshold is regarded as an attribute of the image.

2. The convolutional neural network of claim 1 includes but is not limited to common convolutional neural network structures such as AlexNet, CaffeNet, GoogleNet, VGG16, and VGG19; only the front part, consisting of the convolutional layers, pooling layers, and the like, is used, and the fully connected layers that follow are excluded; the network parameters must be modified accordingly so that the output feature maps have size k² × 1000 × 10 × 10.

3. The k of claims 1 and 2 is an integer greater than 1; k is usually 2 or 3.

4. The RSSM of claims 1 and 2 is the region-sensitive score map (Region-sensitive Score Maps), whose essence is to combine k² feature maps into one new feature map by a specific combination rule; with the feature map size unchanged, RSSM processes local parts of the image at a finer granularity, offsets to some extent the adverse effect of an undersized feature map, and achieves better attribute detection performance when used as the input of the multi-instance learning network.

5. The threshold of claim 4 is usually set to the empirical value 0.5; tests show that this threshold gives relatively good image attribute detection performance.