CN110119688A - Image emotion classification method using a visual attention coupled network - Google Patents

Image emotion classification method using a visual attention coupled network

Info

Publication number
CN110119688A
CN110119688A
Authority
CN
China
Prior art keywords
emotion
emotion classification
visual attention
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910311521.1A
Other languages
Chinese (zh)
Inventor
杨巨峰
折栋宇
姚星旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN201910311521.1A priority Critical patent/CN110119688A/en
Publication of CN110119688A publication Critical patent/CN110119688A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image emotion classification method using a visual attention coupled network, belonging to the technical field of computer vision. The method uses weakly supervised learning to detect the local regions of a picture that evoke emotion, extracts the corresponding local deep features of those regions, and fuses them with global deep features to form the final feature vector used to classify the emotion of the picture. The visual attention coupled convolutional neural network consists of shared shallow convolutional layers followed by two branches that perform two tasks simultaneously: one generates the sentiment region distribution map (sentiment map), and the other generates a feature vector richer in semantic information, which is then fed into a classifier for recognition. The technique integrates sentiment region detection and image emotion classification into a single unified deep network, enabling end-to-end training, and requires only image-level emotion labels rather than pixel-level bounding-box annotations, thereby greatly reducing the annotation burden.

Description

Image emotion classification method using a visual attention coupled network
Technical field
The invention belongs to the technical field of computer vision and relates to a visual attention coupled network for image emotion classification that exploits global and local information simultaneously.
Background technique
Image emotion classification is attracting growing attention in computer vision; automatically assessing the emotion conveyed by a picture has important applications in fields such as education, environment, and business. With the development of deep learning, deep networks have been applied to image emotion prediction; however, this rather abstract task still poses several challenges: 1) because human emotion recognition is highly subjective, image emotion classification is more challenging than traditional visual recognition tasks; 2) annotation richer than image-level tags would yield better recognition performance, but precisely annotating image regions is far more laborious than image-level labeling, and different viewers respond to the same region with different emotions, so detailed and accurate region annotation is difficult to obtain.
Many methods have been proposed for image emotion prediction. Early work, inspired by psychological theory and art principles, designed combinations of hand-crafted features. Machajdik et al. (document 1, 2010) defined combinations of low-level features, such as color, texture, and composition, to represent affective content. Zhao et al. (document 2, 2016) proposed a multi-task hypergraph model for building personalized emotion prediction systems that jointly consider personal background, social context, location, and past mood, and released the IESN dataset, pioneering work on the emotion-subjectivity problem. To address the shortage of training data, You et al. (document 3, 2015) collected a large amount of weakly labeled web data, iteratively selected high-confidence samples according to the predictions to train the network, and finally fine-tuned the model on the target dataset, improving its capability. To address emotion ambiguity, Yang et al. (document 4, 2017) proposed a multi-task convolutional neural network that jointly performs single-label learning and label-distribution learning to extract deep features. Most existing CNN-based methods extract deep features from the whole image and rarely consider the contribution of local features to emotion prediction. Sun et al. (document 5, 2016) located emotion regions with an object-proposal algorithm and combined them with deep features for classification. However, that approach is suboptimal, because the proposal algorithm is separate from the prediction task, and regions that do not resemble objects are likely to be discarded from the very beginning.
Following the success of deep learning in object recognition, many weakly supervised convolutional neural network methods have been applied to object detection. In 2016, Zhou et al. (document 6) used a global average pooling layer after the last convolutional layer to obtain class-specific response maps, while Durand et al. (document 7) proposed the WILDCAT method, which learns multiple local features related to different classes. Considering target information, Zhu et al. (document 8) proposed Soft Proposal Networks, which generate proposal regions and merge them with the feature maps to fuse the image features.
Inspired by the developments in these fields, we propose a visual attention coupled network for emotion classification that performs emotion detection and emotion classification simultaneously, fuses global and local information, and can be trained end to end.
Document:
1. Affective image classification using features inspired by psychology and art theory. In ACM MM, 2010.
2. Predicting personalized emotion perceptions of social images. In ACM MM, 2016.
3. Robust image sentiment analysis using progressively trained and domain transferred deep networks. In AAAI, 2015.
4. Joint image emotion classification and distribution learning via deep convolutional neural network. In IJCAI, 2017.
5. Discovering affective regions in deep convolutional neural networks for visual sentiment prediction. In ICME, 2016.
6. Learning deep features for discriminative localization. In CVPR, 2016.
7. WILDCAT: Weakly supervised learning of deep ConvNets for image classification, pointwise localization and segmentation. In CVPR, 2017.
8. Soft proposal networks for weakly supervised object localization. In ICCV, 2017.
Summary of the invention
The technical problem to be solved by the invention is to use weakly supervised learning, with only image-level labels, to detect the local regions of a picture that evoke emotion, extract the corresponding local deep features of those regions, and fuse them with global deep features to form the final feature vector used to classify the emotion of the picture.
The object of the invention is achieved through the following technical scheme:
A. Emotion picture data are preprocessed with data augmentation such as flipping and cropping, then input into a fully convolutional network to generate convolutional feature response maps;
B. The feature response maps with suitable spatial resolution generated in A are fed into two network branches; the detection branch applies a cross-spatial pooling strategy to weight and sum the information of each emotion class in the feature maps, generating the final sentiment map;
C. In the classification branch, the deep features generated in A and the sentiment map generated in B are coupled element-wise, highlighting the main emotion-evoking regions by assigning them larger weights;
D. Global information and local information are fused to form a feature vector rich in semantic information;
E. The feature vector is fed into a classifier to classify the emotion of the picture.
In step B, each emotion class has its own detectors that scan the feature response maps generated in step A; the relevant regions produce high responses. The detection results of all classes are then combined by a weighted sum, and the model parameters are updated according to the loss function of this branch until convergence.
Further, the network model for region detection and the network model for classification are integrated into one framework in the invention, enabling end-to-end training.
Further, the sentiment map generated by the region detection branch fully considers the feature information corresponding to each emotion class: the per-class sentiment maps are combined by a weighted sum to obtain the final sentiment map, and the weights are continuously adjusted according to the classification results, so that the regions corresponding to the dominant emotion are effectively highlighted in the map.
Further, only image-level class labels are needed during training; no annotation of the emotion regions is required, greatly reducing the annotation burden.
Further, in the invention the global information is combined with the local information to obtain the final feature, which highlights the important information while avoiding information loss.
The invention has the following beneficial effects: with only image-level labels, the method detects the emotion-evoking regions in a picture, extracts the corresponding local features, and fuses them with the global deep features; the resulting feature vector, which combines global and local information, improves the final classification accuracy and outperforms existing state-of-the-art methods on mainstream datasets. The method transfers easily to emotion classification on a variety of convolutional neural networks: only the fully connected layer needs to be replaced by a fully convolutional layer, and the learned parameters and training batch size must match the chosen network architecture. Overall, the method offers a completely new solution for the image emotion classification task, and we believe it can also perform well on other convolutional neural networks and affective datasets.
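As a small illustration of the note above about replacing the fully connected layer with a fully convolutional one: by linearity, a 1×1 convolution followed by global average pooling yields the same logits as global average pooling followed by a fully connected layer with the same weights. The following NumPy sketch checks this equivalence; all names and shapes are illustrative, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, w, C = 8, 5, 5, 4            # feature channels, spatial size, emotion classes
F = rng.standard_normal((n, h, w))  # a convolutional feature response map
W = rng.standard_normal((C, n))     # shared weights: FC layer == 1x1 convolution

# Path 1: global average pooling, then a fully connected layer.
logits_fc = W @ F.mean(axis=(1, 2))

# Path 2: 1x1 convolution (a per-location matrix product), then global average pooling.
conv = np.einsum('cn,nhw->chw', W, F)
logits_conv = conv.mean(axis=(1, 2))

assert np.allclose(logits_fc, logits_conv)
```

This is why swapping the classifier head for a fully convolutional layer changes where the pooling happens without changing the classification function, while additionally exposing the spatial response maps the detection branch needs.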
Detailed description of the invention
The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments:
Fig. 1 is a flow chart of the image emotion classification method using the visual attention coupled network.
Fig. 2 is a schematic diagram of the image emotion classification method using the visual attention coupled network.
Here k is the number of feature maps per emotion class; C is the number of emotion classes; the feature maps have size n*w*h; and MOF denotes the fusion of the sentiment map M with the features F.
Fig. 3 is a schematic diagram of the sentiment map generation process.
Specific embodiment
Referring to Fig. 1, which shows the flow chart of the image emotion classification method using the visual attention coupled network, the steps indicated in the figure are:
A. The image data are resized, augmented, and fed into the model to generate the feature response maps F. The initial model obtains its initialization parameters by pre-training on the large-scale dataset ImageNet; the last pooling layer and fully connected layer of the model are replaced by two branches (a detection branch and a classification branch);
B. To detect the emotion regions in the picture, a 1×1 convolutional layer first extracts information for each emotion class: with C emotion classes and k detectors per class, the processed feature maps have kC channels. The k maps within each class are then averaged to obtain that class's emotion distribution map, and the C class maps are combined by a weighted sum with weights v_c to obtain the final sentiment map M, each weight being obtained by a max pooling operation. Writing f_{c,i} for the emotion distribution map produced by the i-th detector of class c, this is expressed as:
M_c = (1/k) * sum_{i=1..k} f_{c,i},   v_c = max over spatial locations of M_c,   M = sum_{c=1..C} v_c * M_c.
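The sentiment map generation of step B can be sketched as follows. This is a minimal NumPy illustration under the assumptions stated above (k detector maps per class grouped contiguously along the channel axis, spatial max pooling for the weights); it is a sketch, not the patented implementation itself.

```python
import numpy as np

def sentiment_map(responses, k):
    """responses: array of shape (k*C, h, w) from the 1x1 conv detection branch."""
    kC, h, w = responses.shape
    C = kC // k
    # Average the k detector maps of each class -> one map M_c per emotion class.
    per_class = responses.reshape(C, k, h, w).mean(axis=1)
    # Weight v_c of each class map, obtained here by spatial max pooling.
    v = per_class.max(axis=(1, 2))
    # Weighted sum of the class maps gives the final sentiment map M.
    M = (v[:, None, None] * per_class).sum(axis=0)
    return M, per_class, v

rng = np.random.default_rng(1)
resp = rng.random((2 * 3, 4, 4))        # C=2 classes, k=3 detectors, 4x4 maps
M, per_class, v = sentiment_map(resp, k=3)
assert M.shape == (4, 4) and per_class.shape == (2, 4, 4) and v.shape == (2,)
```

Because the weights come from the maps themselves, classes with stronger responses dominate M, which is how the dominant emotion's regions are highlighted.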
C. The sentiment map M and the feature response maps F are fused by the Hadamard (element-wise) product, encoding features richer in semantic information; after global average pooling, these are fed into the classifier, and the model parameters are updated by training with the softmax cross-entropy loss.
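A hedged NumPy sketch of this coupling step, with the fusion of global and attended (local) information from steps C and D above done by concatenating the two pooled streams; the exact fusion details of the patented network may differ, and all parameter names here are illustrative.

```python
import numpy as np

def couple_and_classify(F, M, W, b):
    """F: deep features (n, h, w); M: sentiment map (h, w); W, b: linear classifier."""
    attended = F * M[None, :, :]                  # Hadamard coupling with the sentiment map
    # Fuse global and local information: global average pooling of both streams.
    fused = np.concatenate([F.mean(axis=(1, 2)), attended.mean(axis=(1, 2))])
    logits = W @ fused + b
    probs = np.exp(logits - logits.max())         # numerically stable softmax
    return probs / probs.sum()

rng = np.random.default_rng(2)
F = rng.random((8, 4, 4))
M = rng.random((4, 4))
C = 3                                             # number of emotion classes
W, b = rng.standard_normal((C, 16)), np.zeros(C)  # 16 = 8 global + 8 attended dims
p = couple_and_classify(F, M, W, b)
assert p.shape == (C,) and np.isclose(p.sum(), 1.0)
```

During training the softmax output would be compared against the image-level emotion label with a cross-entropy loss, which is the only supervision the method requires.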
Fig. 2 illustrates the schematic diagram of the method, vividly describing the key problems of the algorithm at each stage, the training process, and the inputs and outputs of the system. Fig. 2 and Fig. 1 express the same content at different levels of abstraction; Fig. 2 mainly helps in understanding the parts of Fig. 1.
Fig. 3 illustrates the generation process of the sentiment map: each class has k feature response maps, the k maps of each class are average-pooled to produce that class's map, and finally the class maps are combined by a weighted sum to obtain the final sentiment map.

Claims (6)

1. An image emotion classification method using a visual attention coupled network, characterized in that the method comprises the following steps:
A. emotion picture data are preprocessed with data augmentation such as flipping and cropping, then input into a fully convolutional network to generate convolutional feature response maps;
B. the feature response maps with suitable spatial resolution generated in A are fed into two network branches, wherein the detection branch applies a cross-spatial pooling strategy to weight and sum the information of each emotion class in the feature maps, generating the final sentiment map;
C. in the classification branch, the deep features generated in A and the sentiment map generated in B are coupled element-wise, highlighting the main emotion-evoking regions by assigning them larger weights;
D. global information and local information are fused to form a feature vector rich in semantic information;
E. the feature vector is fed into a classifier to classify the emotion of the picture.
2. The image emotion classification method using a visual attention coupled network according to claim 1, characterized in that: the network model for region detection and the network model for classification are integrated into one framework, realizing end-to-end training.
3. The image emotion classification method using a visual attention coupled network according to claim 1, characterized in that: the sentiment map generated by the region detection branch fully considers the feature information corresponding to each emotion class; the per-class sentiment maps are combined by a weighted sum to obtain the final sentiment map, and the weights are continuously adjusted according to the classification results, so that the regions corresponding to the dominant emotion are effectively highlighted in the map.
4. The image emotion classification method using a visual attention coupled network according to claim 1, characterized in that: only the image-level class labels are needed during training, without annotation of the emotion regions, greatly reducing the annotation burden.
5. The image emotion classification method using a visual attention coupled network according to claim 1, characterized in that: the global information is combined with the local information to obtain the final feature, highlighting the important information while avoiding information loss.
6. The image emotion classification method using a visual attention coupled network according to claim 1, characterized in that: in step B, each emotion class has its own detectors that scan the feature response maps generated in A, the relevant regions producing high responses; the detection results of all classes are then combined by a weighted sum, and the model parameters are updated according to the loss function of this branch until convergence.
CN201910311521.1A 2019-04-18 2019-04-18 Image emotion classification method using a visual attention coupled network Pending CN110119688A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910311521.1A CN110119688A (en) 2019-04-18 2019-04-18 Image emotion classification method using a visual attention coupled network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910311521.1A CN110119688A (en) 2019-04-18 2019-04-18 Image emotion classification method using a visual attention coupled network

Publications (1)

Publication Number Publication Date
CN110119688A true CN110119688A (en) 2019-08-13

Family

ID=67521142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910311521.1A Pending CN110119688A (en) 2019-04-18 2019-04-18 Image emotion classification method using a visual attention coupled network

Country Status (1)

Country Link
CN (1) CN110119688A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705490A (en) * 2019-10-09 2020-01-17 中国科学技术大学 Visual emotion recognition method
CN110796150A (en) * 2019-10-29 2020-02-14 中山大学 Image emotion recognition method based on emotion significant region detection
CN110827312A (en) * 2019-11-12 2020-02-21 北京深境智能科技有限公司 Learning method based on cooperative visual attention neural network
CN111026898A (en) * 2019-12-10 2020-04-17 云南大学 Weak supervision image emotion classification and positioning method based on cross space pooling strategy
CN111311364A (en) * 2020-02-13 2020-06-19 山东大学 Commodity recommendation method and system based on multi-mode commodity comment analysis
CN111832573A (en) * 2020-06-12 2020-10-27 桂林电子科技大学 Image emotion classification method based on class activation mapping and visual saliency
CN112836718A (en) * 2020-12-08 2021-05-25 上海大学 Fuzzy knowledge neural network-based image emotion recognition method
CN114626454A (en) * 2022-03-10 2022-06-14 华南理工大学 Visual emotion recognition method integrating self-supervision learning and attention mechanism
CN116719930A (en) * 2023-04-28 2023-09-08 西安工程大学 Multi-mode emotion analysis method based on visual attention

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341506A (en) * 2017-06-12 2017-11-10 华南理工大学 A kind of Image emotional semantic classification method based on the expression of many-sided deep learning
US20170344880A1 (en) * 2016-05-24 2017-11-30 Cavium, Inc. Systems and methods for vectorized fft for multi-dimensional convolution operations
CN108427740A (en) * 2018-03-02 2018-08-21 南开大学 A kind of Image emotional semantic classification and searching algorithm based on depth measure study
US10140515B1 (en) * 2016-06-24 2018-11-27 A9.Com, Inc. Image recognition and classification techniques for selecting image and audio data

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170344880A1 (en) * 2016-05-24 2017-11-30 Cavium, Inc. Systems and methods for vectorized fft for multi-dimensional convolution operations
US10140515B1 (en) * 2016-06-24 2018-11-27 A9.Com, Inc. Image recognition and classification techniques for selecting image and audio data
CN107341506A (en) * 2017-06-12 2017-11-10 华南理工大学 A kind of Image emotional semantic classification method based on the expression of many-sided deep learning
CN108427740A (en) * 2018-03-02 2018-08-21 南开大学 A kind of Image emotional semantic classification and searching algorithm based on depth measure study

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JUFENG YANG: "Weakly Supervised Coupled Networks for Visual Sentiment Analysis", 《2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705490B (en) * 2019-10-09 2022-09-02 中国科学技术大学 Visual emotion recognition method
CN110705490A (en) * 2019-10-09 2020-01-17 中国科学技术大学 Visual emotion recognition method
CN110796150A (en) * 2019-10-29 2020-02-14 中山大学 Image emotion recognition method based on emotion significant region detection
CN110796150B (en) * 2019-10-29 2022-09-16 中山大学 Image emotion recognition method based on emotion significant region detection
CN110827312A (en) * 2019-11-12 2020-02-21 北京深境智能科技有限公司 Learning method based on cooperative visual attention neural network
CN110827312B (en) * 2019-11-12 2023-04-28 北京深境智能科技有限公司 Learning method based on cooperative visual attention neural network
CN111026898A (en) * 2019-12-10 2020-04-17 云南大学 Weak supervision image emotion classification and positioning method based on cross space pooling strategy
CN111311364A (en) * 2020-02-13 2020-06-19 山东大学 Commodity recommendation method and system based on multi-mode commodity comment analysis
CN111832573B (en) * 2020-06-12 2022-04-15 桂林电子科技大学 Image emotion classification method based on class activation mapping and visual saliency
CN111832573A (en) * 2020-06-12 2020-10-27 桂林电子科技大学 Image emotion classification method based on class activation mapping and visual saliency
CN112836718A (en) * 2020-12-08 2021-05-25 上海大学 Fuzzy knowledge neural network-based image emotion recognition method
CN114626454A (en) * 2022-03-10 2022-06-14 华南理工大学 Visual emotion recognition method integrating self-supervision learning and attention mechanism
CN116719930A (en) * 2023-04-28 2023-09-08 西安工程大学 Multi-mode emotion analysis method based on visual attention

Similar Documents

Publication Publication Date Title
CN110119688A (en) Image emotion classification method using a visual attention coupled network
Xiao et al. A framework for quantitative analysis and differentiated marketing of tourism destination image based on visual content of photos
Shetty et al. Not Using the Car to See the Sidewalk--Quantifying and Controlling the Effects of Context in Classification and Segmentation
CN107134144B (en) A kind of vehicle checking method for traffic monitoring
CN107133955B (en) A kind of collaboration conspicuousness detection method combined at many levels
CN108537269B (en) Weak interactive object detection deep learning method and system thereof
CN109255334A (en) Remote sensing image terrain classification method based on deep learning semantic segmentation network
CN106408030B (en) SAR image classification method based on middle layer semantic attribute and convolutional neural networks
CN109002834A (en) Fine granularity image classification method based on multi-modal characterization
CN109740686A (en) A kind of deep learning image multiple labeling classification method based on pool area and Fusion Features
CN106537390B (en) Identify the presentation style of education video
CN108427740B (en) Image emotion classification and retrieval algorithm based on depth metric learning
CN111985532B (en) Scene-level context-aware emotion recognition deep network method
CN113159826B (en) Garment fashion element prediction system and method based on deep learning
CN106960176A (en) A kind of pedestrian's gender identification method based on transfinite learning machine and color characteristic fusion
CN111832573A (en) Image emotion classification method based on class activation mapping and visual saliency
CN102945372B (en) Classifying method based on multi-label constraint support vector machine
Lin et al. Two stream active query suggestion for active learning in connectomics
Fu et al. Personality trait detection based on ASM localization and deep learning
CN102542590B (en) High-resolution SAR (Synthetic Aperture Radar) image marking method based on supervised topic model
Kobayashi et al. Aesthetic design based on the analysis of questionnaire results using deep learning techniques
CN113705301A (en) Image processing method and device
Cao et al. SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model
Wang et al. Human interaction understanding with joint graph decomposition and node labeling
CN115115979A (en) Identification and replacement method of component elements in video and video recommendation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190813