CN110119688A

CN110119688A - An image emotion classification method using a visual attention collaborative network

Info

Publication number: CN110119688A
Application number: CN201910311521.1A
Authority: CN
Inventors: 杨巨峰; 折栋宇; 姚星旭
Original assignee: Nankai University
Current assignee: Nankai University
Priority date: 2019-04-18
Filing date: 2019-04-18
Publication date: 2019-08-13

Abstract

The invention discloses an image emotion classification method using a visual attention collaborative network, belonging to the technical field of computer vision. The method uses weakly supervised learning to detect the local regions of an image that evoke emotion, extracts the local deep features corresponding to those emotion regions, and fuses them with the global deep features to form the final feature vector used to classify emotion images. The visual attention collaborative convolutional neural network consists mainly of shared shallow convolutional layers followed by two branches that carry out two tasks simultaneously: one generates the emotion region distribution map, and the other generates a vector with richer semantic information, which is then sent to a classifier for recognition. The technique integrates image emotion region detection and image emotion classification into a single unified deep network that is trained end to end. It requires only image-level emotion labels rather than pixel-level bounding-box annotations, which greatly reduces the annotation burden.

Description

An image emotion classification method using a visual attention collaborative network

Technical field

The invention belongs to the technical field of computer vision and relates to a visual attention collaborative network that can use global information and local information simultaneously for image emotion classification.

Background technique

Image emotion classification is attracting growing attention in computer vision, and automatic emotion assessment of images has important applications in fields such as education, the environment, and business. With the development of deep learning, deep networks have been applied to predicting image emotion. However, many challenges still trouble this rather abstract task: 1) because humans are highly subjective when recognizing emotion, image emotion classification is more challenging than traditional visual recognition tasks; 2) annotations more detailed than image labels would yield better recognition performance, but accurately annotating image regions is more laborious than image-level labeling, and the emotions evoked in viewers by different regions also differ, so detailed and accurate annotation is difficult to obtain.

Many methods have been proposed for image emotion prediction. Early methods, inspired by psychology and art theory, designed various combinations of hand-crafted features. In 2010, Machajdik et al. (document 1) defined combinations of low-level features such as color, texture, and composition to represent emotional content. In 2016, Zhao et al. (document 2) proposed a multi-task hypergraph model for building personalized emotion prediction systems; the system jointly considers personal background, social environment, location information, and past mood when predicting emotion, and the authors released the IESN dataset, which is pioneering work on the problem of emotion subjectivity. To address the shortage of training data, You et al. (document 3, 2015) added a large amount of weakly labeled web data, iteratively trained the network on samples selected for high confidence according to the prediction results, and finally fine-tuned the model parameters on the target dataset, improving model capability. To address emotion ambiguity, Yang et al. (document 4, 2017) proposed extracting deep features with a multi-task convolutional neural network that performs label classification learning and label distribution learning jointly. Most existing methods based on convolutional neural networks extract deep features from the whole image and seldom consider the role of local features in emotion prediction. In 2016, Sun et al. (document 5) used an object proposal algorithm to find emotion regions and combined them with deep features for classification. This approach is suboptimal, however, because the object proposal algorithm and the prediction task are separate, and regions that do not resemble objects are likely to be discarded at the outset.

Following the success of deep learning in object recognition, many weakly supervised convolutional neural network methods have been applied to object detection. In 2016, Zhou et al. (document 6) used a global average pooling layer after the last convolutional layer to process class-specific response information, while Durand et al. (document 7) proposed the WILDCAT method, which learns multiple localized features related to different categories. Taking object information into account, Zhu et al. (document 8) proposed the "soft proposal network", which generates proposal regions and couples them with the feature maps to fuse image features.

These developments inspired the present work. We therefore propose a visual attention collaborative network that performs emotion detection and emotion classification simultaneously, fuses global and local information for classification, and is trained end to end.

Document:

1. Affective image classification using features inspired by psychology and art theory. In ACM MM, 2010.

2. Predicting personalized emotion perceptions of social images. In ACM MM, 2016.

3. Robust image sentiment analysis using progressively trained and domain transferred deep networks. In AAAI, 2015.

4. Joint image emotion classification and distribution learning via deep convolutional neural network. In IJCAI, 2017.

5. Discovering affective regions in deep convolutional neural networks for visual sentiment prediction. In ICME, 2016.

6. Learning deep features for discriminative localization. In CVPR, 2016.

7. Wildcat: Weakly supervised learning of deep convnets for image classification, pointwise localization and segmentation. In CVPR, 2017.

8. Soft proposal networks for weakly supervised object localization. In ICCV, 2017.

Summary of the invention

The technical problem to be solved by the invention is, given only image-level labels, to use weakly supervised learning to detect the local regions of an image that evoke emotion, extract the local deep features corresponding to those emotion regions, and fuse them with the global deep features to form the final feature vector used to classify emotion images.

To achieve the object of the invention, the following technical solution is adopted:

a. After data-augmentation preprocessing such as flipping and cropping, the emotion image data are input into a fully convolutional network, which generates convolutional feature response maps;

b. The feature response maps generated in step a, which have a suitable spatial resolution, are sent into two network branches. The detection branch uses a cross-spatial pooling strategy to compute a weighted sum of the information for each emotion category in the feature maps, generating the final emotion distribution map;

c. In the classification branch, the deep features generated in step a and the emotion region distribution map generated in step b are coupled element-wise, weighting the features so that the main regions evoking emotion are highlighted;

d. Global information and local information are fused to form a feature vector rich in semantic information;

e. The feature vector is sent to a classifier to classify the emotion of the image.

In step b, each emotion label class has dedicated detectors that examine the feature response maps generated in step a, and regions related to that class show a high response. The detection results of all categories are then weighted and summed, and the model parameters are updated according to the loss function of this branch until convergence.

Further, the invention integrates the network model used for region detection and the network model used for classification into a single architecture, enabling end-to-end training.

Further, the emotion region distribution map generated by the region detection branch fully takes into account the feature information corresponding to every emotion category: the emotion distribution maps of the individual categories are weighted and summed to obtain the final emotion distribution map, and the weights are adjusted continually according to the classification results, so that the regions corresponding to the main emotion are effectively highlighted in the emotion distribution map of the image.

Further, training requires only the class label of each image, not annotations of the emotion regions, which greatly reduces the data annotation burden.

Further, the invention adds the global information and the local information to obtain the final feature, which highlights the important information while avoiding information loss.
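The addition of global and local information described above can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function name `fuse_global_local`, the `(n, h, w)` axis order, and the choice to pool each term with global average pooling before adding are assumptions made only for illustration.

```python
import numpy as np

def fuse_global_local(F, M):
    """Add a global descriptor and an attention-weighted local descriptor.

    F: feature response maps, shape (n, h, w)   (axis order is an assumption)
    M: emotion distribution map, shape (h, w)
    Returns a vector of shape (n,).
    """
    # Global information: average every channel over all spatial positions.
    global_feat = F.mean(axis=(1, 2))
    # Local information: weight each position by M (broadcast over channels),
    # then average, so high-response regions dominate.
    local_feat = (F * M[None, :, :]).mean(axis=(1, 2))
    # Fusion by addition: the global term survives even where M is near zero.
    return global_feat + local_feat

rng = np.random.default_rng(0)
F = rng.random((8, 4, 4))
M = rng.random((4, 4))
fused = fuse_global_local(F, M)
```

Because the global term is kept as a separate summand, a distribution map that suppresses most positions still leaves the whole-image information in the fused vector, which is one way to read the statement that information loss is avoided.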

The beneficial effects of the invention are as follows. Given only image-level labels, the method can detect the regions of an image that evoke emotion, extract the local features corresponding to those emotion regions, and fuse them with the global deep features, yielding a feature vector that combines global and local information. This improves the final classification performance, surpassing the existing state-of-the-art methods on mainstream datasets. The method can easily be transferred to a variety of convolutional neural networks to perform emotion classification: it is only necessary to replace the fully connected layers with fully convolutional layers, and the learning parameters and training batch size need only meet the requirements of the selected convolutional neural network architecture. Overall, the method offers a completely new solution for image emotion classification, and we believe it can also perform well on other convolutional neural networks and emotion datasets.
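One common way to read "replacing fully connected layers with fully convolutional layers" is that the same weights are applied at every spatial position as a 1 × 1 convolution, so the spatial layout is preserved for the detection branch. The sketch below illustrates that equivalence in NumPy; the function name `fc_as_conv1x1` and all shapes are hypothetical, and the patent does not state that the conversion is done this way.

```python
import numpy as np

def fc_as_conv1x1(F, W, b):
    """Apply fully connected weights at every spatial position.

    F: feature maps, shape (n, h, w)
    W: weights of a fully connected layer, shape (out, n)
    b: bias, shape (out,)
    Returns maps of shape (out, h, w), i.e. the result of a 1x1 convolution.
    """
    # Contract the channel axis at each (h, w) position independently.
    return np.einsum("on,nhw->ohw", W, F) + b[:, None, None]

rng = np.random.default_rng(0)
F = rng.standard_normal((8, 5, 7))
W = rng.standard_normal((3, 8))
b = rng.standard_normal(3)
out = fc_as_conv1x1(F, W, b)
```

At any single position, the output equals what the fully connected layer would produce for that position's channel vector.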

Brief description of the drawings

The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments:

Fig. 1 is the flowchart of the image emotion classification method using a visual attention collaborative network.

Fig. 2 is a schematic diagram of the image emotion classification method using a visual attention collaborative network.

Where: k is the number of feature maps per emotion class; C is the number of emotion classes; the feature maps have size n*w*h; M⊙F denotes the coupling of the emotion distribution map M and the feature F.

Fig. 3 is a schematic diagram of how the emotion region distribution map is generated.

Specific embodiment

Referring to Fig. 1, which shows the flowchart of the image emotion classification method using a visual attention collaborative network, the steps shown in the figure are:

a. After operations such as resizing and data augmentation, the image data are fed into the model to generate the feature response map F. The original model obtains its initial parameters by pre-training on the large-scale dataset ImageNet. The last pooling layer and the fully connected layers of the model are replaced by two branches (a detection branch and a classification branch);

b. To detect the emotion regions in an image, a 1 × 1 convolutional layer is first used to obtain information specific to each emotion category. With C emotion classes and k detectors per class, the resulting feature maps have kC channels. The k feature maps within each class are then averaged to obtain that class's emotion region distribution, and finally the C emotion distribution maps are weighted by v_c and summed to obtain the final emotion distribution map M. The weights are obtained by a max pooling operation. The formula is:

M = \sum_{c=1}^{C} v_c \cdot \frac{1}{k} \sum_{i=1}^{k} f_{c,i}

where f_{c,i} denotes the emotion distribution map obtained by the i-th detector of the c-th emotion class.

c. The emotion distribution map M and the feature response map F are coupled using the Hadamard product, encoding a feature with richer semantic information. After global average pooling the feature is sent to the classifier, and the model is trained and its parameters updated with the Softmax cross-entropy loss function.
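Step c can be sketched as follows, assuming a single linear layer as the classifier and an `(n, h, w)` axis order for F. The names `classify`, `softmax_cross_entropy`, `W_cls`, and `b_cls` are hypothetical; the source specifies only the Hadamard product, global average pooling, and a Softmax-based loss.

```python
import numpy as np

def classify(F, M, W_cls, b_cls):
    """Couple F with M, pool, and compute class scores.

    F: feature response maps, shape (n, h, w)
    M: emotion distribution map, shape (h, w)
    W_cls, b_cls: weights (C, n) and bias (C,) of an assumed linear classifier
    """
    # Hadamard product: M is broadcast across all n channels of F.
    coupled = F * M[None, :, :]
    # Global average pooling turns each channel into one number.
    feat = coupled.mean(axis=(1, 2))
    return W_cls @ feat + b_cls

def softmax_cross_entropy(logits, label):
    """Negative log-probability of the true class under a softmax."""
    shifted = logits - logits.max()              # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

rng = np.random.default_rng(0)
F = rng.random((8, 4, 4))
M = rng.random((4, 4))
W_cls = rng.standard_normal((4, 8))
b_cls = np.zeros(4)
loss = softmax_cross_entropy(classify(F, M, W_cls, b_cls), label=2)
```

In an end-to-end setting the gradient of this loss would flow back through both the coupled feature and M, so the classification result also shapes the distribution map.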

Fig. 2 shows the schematic diagram of the method, depicting the core problem of the algorithm at each stage, the training process, and the inputs and outputs of the system. Fig. 2 and Fig. 1 express the same meaning at different levels of abstraction; Fig. 2 is mainly intended to help in understanding the individual parts of Fig. 1.

Fig. 3 shows how the emotion region distribution map is generated. Each class has k feature response maps; these k maps are average-pooled to produce the feature map for that class, and finally the feature maps of all classes are weighted and summed to obtain the final emotion region distribution map.
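The generation process of Fig. 3 can be sketched in NumPy under several stated assumptions: the kC detector channels are ordered class by class, each weight v_c is taken as the global maximum of the averaged class map, and no further normalization of the weights is applied. The function name `sentiment_map` and the parameter `W_det` are hypothetical.

```python
import numpy as np

def sentiment_map(F, W_det, C, k):
    """Build the emotion region distribution map M from feature maps F.

    F: feature response maps, shape (n, h, w)
    W_det: 1x1 convolution weights of the detectors, shape (k*C, n);
           channels are assumed to be grouped class by class
    Returns M of shape (h, w) and the class weights v of shape (C,).
    """
    n, h, w = F.shape
    # 1x1 convolution: one response map per detector, k*C maps in total.
    maps = np.einsum("dn,nhw->dhw", W_det, F)
    # Group the k detectors of each class and average them.
    class_maps = maps.reshape(C, k, h, w).mean(axis=1)    # (C, h, w)
    # Assumed weighting: global max pooling over each class map.
    v = class_maps.max(axis=(1, 2))                       # (C,)
    # Weighted sum over the C class maps.
    M = np.tensordot(v, class_maps, axes=1)               # (h, w)
    return M, v

rng = np.random.default_rng(0)
F = rng.random((8, 4, 4))
W_det = rng.standard_normal((2 * 3, 8))
M, v = sentiment_map(F, W_det, C=3, k=2)
```

A class whose detectors respond strongly somewhere in the image receives a large weight, so its map contributes most to M.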

Claims

1. An image emotion classification method using a visual attention collaborative network, characterized in that the method comprises the following steps:

a. After data-augmentation preprocessing such as flipping and cropping, the emotion image data are input into a fully convolutional network, which generates convolutional feature response maps;

b. The feature response maps generated in step a, which have a suitable spatial resolution, are sent into two network branches, wherein the detection branch uses a cross-spatial pooling strategy to compute a weighted sum of the information for each emotion category in the feature maps, generating the final emotion distribution map;

c. In the classification branch, the deep features generated in step a and the emotion region distribution map generated in step b are coupled element-wise, weighting the features so that the main regions evoking emotion are highlighted;

2. The image emotion classification method using a visual attention collaborative network according to claim 1, characterized in that: the network model used for region detection and the network model used for classification are integrated into a single architecture, enabling end-to-end training.

3. The image emotion classification method using a visual attention collaborative network according to claim 1, characterized in that: the emotion region distribution map generated by the region detection branch fully takes into account the feature information corresponding to every emotion category; the emotion distribution maps of the individual categories are weighted and summed to obtain the final emotion distribution map, and the weights are adjusted continually according to the classification results, so that the regions corresponding to the main emotion are effectively highlighted in the emotion distribution map of the image.

4. The image emotion classification method using a visual attention collaborative network according to claim 1, characterized in that: training requires only the class label of each image, not annotations of the emotion regions, which greatly reduces the data annotation burden.

5. The image emotion classification method using a visual attention collaborative network according to claim 1, characterized in that: the global information and the local information are added to obtain the final feature, which highlights the important information while avoiding information loss.

6. The image emotion classification method using a visual attention collaborative network according to claim 1, characterized in that: in step b, each emotion label class has dedicated detectors that examine the feature response maps generated in step a, and regions related to that class show a high response; the detection results of all categories are then weighted and summed, and the model parameters are updated according to the loss function of this branch until convergence.