CN106886580B - Image emotion polarity analysis method based on deep learning - Google Patents


Info

Publication number
CN106886580B
Authority
CN
China
Prior art keywords
emotion
picture
words
search
polarity
Prior art date
Legal status
Active
Application number
CN201710059051.5A
Other languages
Chinese (zh)
Other versions
CN106886580A (en)
Inventor
毋立芳
刘爽
祁铭超
张磊
简萌
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN201710059051.5A
Publication of CN106886580A
Application granted
Publication of CN106886580B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A picture emotion polarity analysis method based on deep learning, relating to the technical fields of image content understanding and big data analysis. Traditional picture emotion analysis methods achieve poor prediction accuracy because their models and features are simple, while current deep learning methods train on large-scale training sets whose excessive noise limits final performance. The invention acquires data directly from the network, so the data scale is large. Only the emotion polarity information of common words, needed during data preparation, may require manual labeling; the subsequent picture acquisition and cleaning are completed automatically, so the labor cost is low. Two data cleaning passes are introduced in the data acquisition stage, eliminating a large portion of pictures whose content is inconsistent with their labels. The method filters the training set with prior knowledge, reducing its noise, and improves picture emotion prediction accuracy with an improved network structure.

Description

Image emotion polarity analysis method based on deep learning
Technical Field
The invention relates to the technical field of image content understanding and big data analysis, in particular to a picture emotion analysis method.
Background
With the development of the internet and the popularization of smartphones, social networks occupy an irreplaceable position in people's daily lives. More and more people express their opinions through social networking platforms, and a large amount of user-generated data is produced accordingly.
User Generated Content (UGC) refers to original content uploaded by users; it originates from users and ultimately serves users. In the Web 2.0 era, users no longer passively consume internet content but participate in it as subjects: besides being consumers, they have also become producers and propagators of content.
Faced with this huge volume of user-generated data, how to use it effectively has become an urgent problem. Research on opinion mining and sentiment analysis over such data has become a hotspot: UGC data are analyzed for public opinion monitoring, gauging public reactions to events, predicting box office revenue, predicting stock trends, and so on.
But these studies and methods are currently based on textual information, while user data in social networks are diverse, including not only text but also pictures, videos, and more.
People from different regions and backgrounds may understand the same text differently, but their reactions to pictures are often consistent. Moreover, devices for graphics computing are becoming cheaper and more powerful, making large-scale image computation feasible.
At present, supervised learning is generally adopted for picture emotion analysis: first a labeled picture set is collected, then a model is trained with machine learning, and finally the trained model performs emotion analysis on new pictures.
Early methods used manually collected picture sets and simple classifiers. For example, the article "Sentribute: Image Sentiment Analysis from a Mid-level Perspective" published by Jianbo Yuan in 2013 uses the manually annotated SUN dataset of 14,340 images and performs emotion analysis with an SVM, aided by facial expression recognition.
As machine learning models grew more complex, small-scale datasets could no longer meet training requirements, and recent work commonly collects datasets from the network. For example, the article "Analyzing and Predicting Sentiment of Images on the Social Web" published by Stefan Siersdorfer in 2010 uses the 1,000 words with the strongest positive and negative sentiment intensities in the SentiWordNet dictionary as search terms on Flickr, obtaining 586,000 images for training sentiment analysis models; the article "Large-scale Visual Sentiment Ontology and Detectors Using Adjective Noun Pairs" published by Damian Borth in 2013 searches Flickr with 1,200 adjective-noun pairs and organizes the results into the large-scale sentiment analysis dataset SentiBank. SentiBank is widely used, but because its pictures are stored directly as retrieved from the network, the noise is large and severely restricts subsequent sentiment analysis precision.
Some of the latest methods use deep learning. For example, the article "Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks" published by Quanzeng You in 2015 constructs a PCNN from the SentiBank dataset with a self-training idea to improve the deep network, which resists the noise in web datasets to some extent.
In summary, traditional picture emotion analysis methods need only small datasets, but their simple models and features yield unsatisfactory prediction accuracy, while current deep learning methods train on large-scale sets whose excessive noise limits final performance. The invention provides a picture emotion polarity analysis method based on deep learning: prior knowledge is used to filter the training set, reducing its noise, and an improved network structure raises the picture emotion prediction accuracy.
Disclosure of Invention
The invention aims to provide a picture emotion polarity analysis method based on deep learning; the framework of the method is shown in FIG. 1.
The method comprises three stages, namely data acquisition, deep learning model training and picture emotion polarity analysis.
First, emotion vocabularies are used as search terms to retrieve related pictures from a picture website, and the emotion polarity of each search term serves as the label of the retrieved pictures, yielding an initial dataset. The dataset is then filtered by checking the consistency among the search term's emotion polarity, the picture's label, and the picture's descriptive text, yielding a purer dataset. Next, a CNN model is trained on the resulting dataset with deep learning to obtain an emotion polarity classification model. Finally, the trained CNN model performs emotion polarity analysis on pictures.
The picture emotion analysis method specifically comprises the following steps:
1. Data acquisition
The method can be applied to most picture social networking sites with a picture search function. Since such websites limit the maximum number of results returned per query, and to ensure the richness and balance of the data, the method retrieves pictures with a large number of search terms.
1.1. A priori knowledge preparation
To ensure the emotion polarity accuracy of the search terms, an emotion dictionary of word emotion polarities is prepared before data acquisition. In this method, a dictionary of the dominant emotion polarity of emotion vocabulary provides the dominant polarity of common words. The dominant emotion polarity of a word is the polarity the word expresses in its usual context. The dictionary is constructed by manual labeling or taken from an existing public dictionary; its entries take the form (word, emotion intensity), where the intensity ranges over [-1, 1]: the closer the intensity is to 1, the more positive the word's emotion polarity, and the closer to -1, the more negative. Specific examples:
(remorse, -0.9)
(violent anger, -0.9)
(composed, 0.7)
(deserving a thousand cuts, 0.7)
(delighted, 0.5)
(deeply admiring, 0.5)
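
Purely as an illustrative sketch (Python is assumed here; the words and intensities are placeholders, not the invention's actual dictionary), the (word, emotion intensity) structure can be represented as a plain mapping:

    # Sketch of the (word, emotion intensity) dictionary described above.
    # Entries are placeholders; a real dictionary is manually labeled or
    # taken from an existing public lexicon, with intensities in [-1, 1].
    emotion_dict = {
        "remorse": -0.9,
        "violent anger": -0.9,
        "composed": 0.7,
        "delighted": 0.5,
    }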
1.2. Search term selection
To acquire data from the network, search terms must first be prepared; in this method, a strategy of collecting search terms from the network is adopted. The specific steps are as follows:
1.2.1 Words with definite emotion polarity (such as happy and sad) are used as initial search terms on the picture website; the search results are collected and their descriptive text is extracted. Descriptive text refers to description information related to a picture, such as its tags, caption, and surrounding context text.
1.2.2 A word segmentation tool segments the descriptive text and removes stop words; part-of-speech analysis is performed on the individual words, and the nouns and adjectives among them are extracted. The nouns and adjectives are then paired one by one (taking the Cartesian product), and the resulting (adjective, noun) pairs are stored as the initial search lexicon.
1.2.3 The initial search lexicon obtained in 1.2.2 is cleaned once; the aim is to remove pairs whose adjective and noun have conflicting emotion polarities. Using the emotion dictionary obtained in 1.1, the polarity relation of each adjective-noun pair in the lexicon is analyzed and conflicts are removed. For any (adjective, noun) pair in the search lexicon, the rule is formalized as:
f1(A, N) = Sen(A) + Sen(N)    (1)
where A is the adjective in the pair and N is the noun. Sen(x) returns the emotion polarity of word x from the emotion dictionary obtained in 1.1: if the emotion intensity lies in (0, 1], Sen() returns 1; if it lies in [-1, 0), Sen() returns -1; and if x is not in the dictionary, x is considered to contain no emotion and Sen() returns 0. If f1 is 0, the adjective and noun conflict or contain no emotion, and the pair is removed; if f1 is non-zero, there is no conflict and the pair is kept.
1.2.4 The filtered search lexicon is emotion-labeled with the emotion dictionary obtained in 1.1, producing the final search lexicon. The emotion label of each (adjective, noun) pair is the sum of the adjective's and the noun's emotion intensities. A specific example (a code sketch follows):
Adjective: miserable, -0.9
Noun: groan, -0.5
Emotion label: -0.9 + (-0.5) = -1.4
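
A minimal sketch of steps 1.2.2 through 1.2.4 under the rules above (Python assumed; emotion_dict is the (word, intensity) mapping from 1.1):

    from itertools import product

    def sen(word, emotion_dict):
        # Sen() from rule (1): +1 for intensity in (0, 1], -1 for [-1, 0),
        # and 0 when the word is absent (treated as containing no emotion).
        s = emotion_dict.get(word, 0)
        return 0 if s == 0 else (1 if s > 0 else -1)

    def build_search_lexicon(adjectives, nouns, emotion_dict):
        # Pair adjectives and nouns (Cartesian product), drop pairs where
        # f1(A, N) = Sen(A) + Sen(N) is 0 (conflict or no emotion), and
        # label each survivor with the sum of the two intensities (1.2.4).
        lexicon = {}
        for adj, noun in product(adjectives, nouns):
            if sen(adj, emotion_dict) + sen(noun, emotion_dict) == 0:
                continue  # removed by rule (1)
            lexicon[(adj, noun)] = (emotion_dict.get(adj, 0)
                                    + emotion_dict.get(noun, 0))
        return lexicon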
1.3. Search using search term
Image retrieval is carried out with the search lexicon obtained in 1.2.4. The specific steps (see the sketch after this list) are:
(1) Take a pair of emotion words out of the search lexicon.
(2) Search the website and obtain the results.
(3) Extract the pictures and corresponding descriptive text from the results; descriptive text refers to description information related to a picture, such as its tags, caption, and surrounding context text.
(4) Segment the descriptive text with a word segmentation tool, remove stop words, and take the remaining individual words as the description information.
(5) Use the emotion label of the emotion-word pair used for the retrieval as the label of the extracted pictures.
(6) Store (picture, description information, label) as a triple in the database.
(7) Repeat steps (1)-(6) until all pairs in the search lexicon have been used.
So far, we obtain an emotion picture database.
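
The retrieval loop of steps (1)-(7) can be sketched as follows; the search callable, tokenize helper, and STOP_WORDS list are hypothetical stand-ins for the picture site's search API, the word segmentation tool, and the stop-word list:

    STOP_WORDS = {"a", "an", "the", "of", "and"}  # placeholder stop-word list

    def tokenize(text):
        # Stand-in for the word segmentation tool of step (4).
        return text.lower().split()

    def build_emotion_database(lexicon, search, db):
        # Steps (1)-(7): search(query) yields (picture, descriptive_text)
        # pairs from the picture website; db is any list-like triple store.
        for (adj, noun), label in lexicon.items():             # step (1)
            for picture, text in search(f"{adj} {noun}"):      # steps (2)-(3)
                words = [w for w in tokenize(text)
                         if w not in STOP_WORDS]               # step (4)
                db.append((picture, words, label))             # steps (5)-(6)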
1.4. Data set cleansing
Because internet data is very noisy, data cleaning is essential. This method removes potentially noisy images by checking the consistency between the emotion polarity of a picture's description words and the picture's label. The specific steps (see the sketch after this list) are:
(1) Take a triple out of the emotion picture database obtained in 1.3.
(2) Judge the polarity of the description words one by one with the emotion dictionary obtained in 1.1.
(3) Check the consistency between the polarities obtained in (2) and the polarity of the triple's label; if they conflict, the triple is considered a noise element and is deleted from the database. For any (picture, description information, label) triple in the emotion picture database, the rule is formalized as:
f2(Label, Tag) = Σ_i not(sgn(Label) + Sen(Tag_i))    (2)
sgn(x) = 1 if x > 0; 0 if x = 0; -1 if x < 0    (3)
where Label is the label in the triple, Tag is the description information, and Tag_i is the i-th individual word of the description information. sgn(x) is the sign function given in (3). not(x) is a logical negation function: not(x) = 1 if x = 0, and not(x) = 0 otherwise. If f2 is greater than 0, the picture's label conflicts with its description information and the triple is deleted from the database; if f2 is 0, there is no conflict and the triple is kept.
(4) Repeat (1)-(3) until all pictures in the database have been analyzed.
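
A sketch of cleaning rule (2), reusing sen() from the lexicon sketch above (Python assumed):

    def sgn(x):
        # Sign function of equation (3).
        return (x > 0) - (x < 0)

    def f2(label, words, emotion_dict):
        # Equation (2): count description words for which
        # sgn(Label) + Sen(Tag_i) equals 0, i.e. a polarity conflict.
        return sum(1 for w in words
                   if sgn(label) + sen(w, emotion_dict) == 0)

    def clean_database(db, emotion_dict):
        # Steps (1)-(4): keep a triple only when f2 == 0 (no conflict).
        return [(pic, words, label) for pic, words, label in db
                if f2(label, words, emotion_dict) == 0]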
2. Deep learning model training
After the dataset is obtained, model training can be performed. This method adopts an improved CNN model: an auxiliary loss layer is added to the traditional CNN model, giving it better performance.
2.1. Designing deep Convolutional Neural Network (CNN)
Fig. 2 shows the CNN model framework used in this method. The network consists of 5 convolutional layers, 3 fully-connected layers, and 1 softmax layer. The neuron activation function is ReLU, and pooling layers follow the first, second, and fifth convolutional layers. The convolutional layers, pooling layers, and first two fully-connected layers are configured exactly as in AlexNet; the last fully-connected layer is resized to 2 outputs and named fc8_s. The softmax layer outputs the picture's emotion polarity (positive or negative). During model training, a Euclidean loss layer and a corresponding fully-connected layer fc8_e are added; this head outputs the picture's emotion intensity and measures the prediction error at the real-number level. Pictures are normalized to 256 × 256 RGB images before being input to the CNN.
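
The two-headed network can be sketched as follows. This is illustrative only: the patent's implementation is in Caffe, while PyTorch and the torchvision AlexNet weights are assumptions made here for a self-contained example.

    import torch
    import torch.nn as nn
    from torchvision import models

    class EmotionCNN(nn.Module):
        # AlexNet-style body (5 conv layers with pooling, fc6/fc7 as in
        # AlexNet) with two heads: fc8_s for 2-way polarity (Softmax loss)
        # and the auxiliary fc8_e for real-valued intensity (Euclidean loss).
        def __init__(self):
            super().__init__()
            alexnet = models.alexnet(
                weights=models.AlexNet_Weights.IMAGENET1K_V1)
            self.features = alexnet.features
            self.avgpool = alexnet.avgpool
            # keep fc6 and fc7, drop AlexNet's original 1000-way fc8:
            self.fc67 = nn.Sequential(*list(alexnet.classifier.children())[:-1])
            self.fc8_s = nn.Linear(4096, 2)   # polarity head, size 2
            self.fc8_e = nn.Linear(4096, 1)   # auxiliary intensity head

        def forward(self, x):
            # x: a batch of 256 x 256 RGB images, shape (N, 3, 256, 256)
            h = self.avgpool(self.features(x)).flatten(1)
            h = self.fc67(h)
            return self.fc8_s(h), self.fc8_e(h)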
2.2. Training CNN model
The CNN model is trained with the dataset obtained in step 1. First, each element of the dataset is re-stored in the triple form (picture, real-valued label, binarized label), where binarization quantizes (0, 1] to 1 and [-1, 0] to 0. The picture in the triple is the input; the real-valued label is the real-number-level supervision signal, measured by the Euclidean loss; and the binarized label is the emotion polarity supervision signal, measured by the Softmax loss.
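
The joint supervision of 2.2 can be sketched as below, building on the EmotionCNN sketch above; the equal weighting of the two losses is an assumption, since the patent does not state loss weights:

    import torch.nn.functional as F

    def binarize(intensity):
        # (0, 1] is quantized to class 1 (positive), [-1, 0] to class 0.
        return (intensity > 0).long()

    def joint_loss(model, images, intensities, aux_weight=1.0):
        # Softmax (cross-entropy) loss on fc8_s from the binarized label,
        # Euclidean (MSE) loss on fc8_e from the real-valued label.
        logits, pred_intensity = model(images)
        loss_s = F.cross_entropy(logits, binarize(intensities))
        loss_e = F.mse_loss(pred_intensity.squeeze(1), intensities)
        return loss_s + aux_weight * loss_e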
3. Sentiment analysis of pictures
The CNN model trained in step 2 is used as the emotion classifier: the picture is normalized to 256 × 256 and input to the model, which generates the emotion polarity prediction output.
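
At test time only the polarity head is used; a minimal sketch in the same PyTorch terms as above (preprocessing beyond resizing to 256 × 256 is omitted, matching the text):

    import torch
    from PIL import Image
    from torchvision import transforms

    preprocess = transforms.Compose([
        transforms.Resize((256, 256)),  # normalize to 256 x 256 RGB
        transforms.ToTensor(),
    ])

    def predict_polarity(model, path):
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        model.eval()
        with torch.no_grad():
            logits, _ = model(x)       # intensity head ignored at test time
        return "positive" if logits.argmax(1).item() == 1 else "negative"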
Compared with the prior art, the method has the following advantages:
1. the available data size is large
The invention acquires data directly from the network, yielding a data scale far larger than earlier manually collected datasets.
2. The labor cost is low
In the invention, only the emotion polarity information of common words, needed during data preparation, may require manual labeling. The subsequent picture acquisition and cleaning are completed automatically, so the labor cost is low.
3. Low data noise
Two data cleaning passes are introduced in the data acquisition stage, eliminating a large portion of pictures inconsistent with their labels; compared with traditional methods that use network-collected datasets directly, the data noise is lower.
4. The prediction precision is high
When trained on the same dataset, the model provided by the invention achieves higher accuracy than the traditional CNN model.
Drawings
FIG. 1 is the picture emotion analysis framework designed by the invention;
FIG. 2 is the CNN model framework used by the invention;
FIG. 3 shows examples of emotion words used in practicing the invention;
FIG. 4 shows examples of search terms removed in practicing the invention;
FIG. 5 shows examples of noise images removed in practicing the invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and examples. The invention provides a picture emotion polarity analysis method based on deep learning; the framework of the method is shown in FIG. 1.
The invention has the following implementation steps:
1. Data acquisition
The method can be applied to most picture social networking sites with a picture search function. In this implementation, we collect data from the photo social networking site Flickr, which currently returns at most 2,000 pictures per retrieval request.
1.1. A priori knowledge preparation
In this implementation an English emotion dictionary is selected; for convenience, an existing vocabulary dictionary is used, provided by the 2014 article "Sentiment analysis on Twitter through topic-based lexicon expansion" by Zhou Zhixin. It contains 21,000 individual words, each with a dominant emotion annotation whose intensity ranges over [-1, 1]: the closer the intensity is to 1, the more positive the word's emotion polarity, and the closer to -1, the more negative. Fig. 3 lists some words randomly selected from the dictionary with their annotation information.
1.2. Search term selection
To acquire data from the network, search terms must first be prepared; here we adopt a strategy of collecting search terms from the network. The specific steps are as follows:
1.2.1 We use words with definite emotion polarity as initial search terms on the Flickr website, collect the search results, and extract each picture's descriptive text (description and tag information). In this implementation we choose anger, disgust, fear, sadness, happiness, excitement, awe, and amusement as the initial search terms.
1.2.2 A word segmentation tool segments the descriptive text and removes stop words; part-of-speech analysis is performed on the individual words, and the nouns and adjectives among them are extracted. The nouns and adjectives are then paired one by one (taking the Cartesian product), and the resulting (adjective, noun) pairs are stored as the initial search lexicon.
1.2.3 The initial search lexicon obtained in 1.2.2 is cleaned once; the aim is to remove pairs whose adjective and noun have conflicting emotion polarities. Using the emotion dictionary obtained in 1.1, the polarity relation of each adjective-noun pair in the lexicon is analyzed and conflicts are removed. For any (adjective, noun) pair in the search lexicon, the rule is formalized as:
f1(A, N) = Sen(A) + Sen(N)    (1)
where A is the adjective in the pair and N is the noun. Sen(x) returns the emotion polarity of word x from the emotion dictionary obtained in 1.1: if the emotion intensity lies in (0, 1], Sen() returns 1; if it lies in [-1, 0), Sen() returns -1; and if x is not in the dictionary, x is considered to contain no emotion and Sen() returns 0. If f1 is 0, the adjective and noun conflict or contain no emotion, and the pair is removed; if f1 is non-zero, there is no conflict and the pair is kept.
1.2.4 The filtered search lexicon is emotion-labeled with the emotion dictionary obtained in 1.1, producing the final search lexicon. The emotion label of each (adjective, noun) pair is the sum of the adjective's and the noun's emotion intensities.
1.3. Search using search term
Image retrieval is carried out with the search lexicon obtained in 1.2.4. The specific steps are:
(1) Take a pair of emotion words out of the search lexicon.
(2) Search the website and obtain the results.
(3) Extract the pictures and corresponding descriptive text from the results; descriptive text refers to description information related to a picture, such as its tags, caption, and surrounding context text.
(4) Segment the descriptive text with a word segmentation tool, remove stop words, and take the remaining individual words as the description information.
(5) Use the emotion label of the emotion-word pair used for the retrieval as the label of the extracted pictures.
(6) Store (picture, description information, label) as a triple in the database.
(7) Repeat steps (1)-(6) until all pairs in the search lexicon have been used.
So far, we obtain an emotion picture database.
1.4. Data set cleansing
Because internet data is very noisy, data cleaning is essential. This method removes potentially noisy images by checking the consistency between the emotion polarity of a picture's description words and the picture's label. The specific steps are:
(1) Take a triple out of the emotion picture database obtained in 1.3.
(2) Judge the polarity of the description words one by one with the emotion dictionary obtained in 1.1.
(3) Check the consistency between the polarities obtained in (2) and the polarity of the triple's label; if they conflict, the triple is considered a noise element and is deleted from the database. For any (picture, description information, label) triple in the emotion picture database, the rule is formalized as:
f2(Label, Tag) = Σ_i not(sgn(Label) + Sen(Tag_i))    (2)
where Label is the label in the triple, Tag is the description information, and Tag_i is the i-th individual word of the description information. sgn(x) is the sign function given in formula (3). not(x) is a logical negation function: not(x) = 1 if x = 0, and not(x) = 0 otherwise. If f2 is greater than 0, the picture's label conflicts with its description information and the triple is deleted from the database; if f2 is 0, there is no conflict and the triple is kept.
(4) Repeat (1)-(3) until all pictures in the database have been analyzed.
Figure 5 lists some of the noise images removed using this rule.
2. Deep learning model training phase
After the dataset is obtained, model training can be performed. This method adopts an improved CNN model: an auxiliary loss layer is added to the traditional CNN model, giving it better performance.
2.1. Designing deep Convolutional Neural Network (CNN)
Fig. 2 shows the CNN model framework used in this method. The network consists of 5 convolutional layers, 3 fully-connected layers, and 1 softmax layer. The neuron activation function is ReLU, and pooling layers follow the first, second, and fifth convolutional layers. The convolutional layers, pooling layers, and first two fully-connected layers are configured exactly as in AlexNet; the last fully-connected layer is resized to 2 outputs and named fc8_s. The softmax layer outputs the picture's emotion polarity (positive or negative). During model training, a Euclidean loss layer and a corresponding fully-connected layer (named fc8_e) are added; this head outputs the picture's emotion intensity and measures the prediction error at the real-number level. Pictures are normalized to 256 × 256 RGB images before being input to the CNN.
2.2. Training CNN model
The CNN model is trained with the dataset obtained in step 1. First, each element of the dataset is re-stored in the triple form (picture, real-valued label, binarized label), where binarization quantizes (0, 1] to 1 and [-1, 0] to 0. The picture in the triple is the input; the real-valued label is the real-number-level supervision signal, measured by the Euclidean loss; and the binarized label is the emotion polarity supervision signal, measured by the Softmax loss.
Deep learning model training is carried out under the Caffe framework. Since the CNN used here matches the first seven layers of AlexNet exactly, the AlexNet model pretrained on ImageNet can be fine-tuned on the obtained dataset. The learning-rate multiplier of the first seven layers is set to 1, and that of the two fully-connected layers fc8_s and fc8_e is set to 10. The base learning rate and iteration count are determined by the data scale and the model's learning behavior. A sketch of this layer-wise setting appears below.
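
In the PyTorch terms of the earlier EmotionCNN sketch, the layer-wise multipliers correspond to optimizer parameter groups; the base learning rate and momentum below are assumptions, to be set per data scale as stated above:

    base_lr = 1e-3  # assumption: chosen per data scale and learning behavior
    optimizer = torch.optim.SGD(
        [
            # pretrained AlexNet layers: multiplier 1
            {"params": list(model.features.parameters())
                       + list(model.fc67.parameters()), "lr": base_lr},
            # newly added heads fc8_s and fc8_e: multiplier 10
            {"params": list(model.fc8_s.parameters())
                       + list(model.fc8_e.parameters()), "lr": base_lr * 10},
        ],
        momentum=0.9,
    )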
3. Sentiment analysis of pictures
The CNN model trained in step 2 is used as the emotion classifier: the picture is normalized to 256 × 256 and input to the model, which generates the emotion polarity prediction output.
4. Model evaluation
The data cleaning method provided by the invention is applied to the SentiBank image library; the cleaned dataset is used to train the deep learning model provided by the invention, which is then tested on the Twitter picture emotion dataset (published in 2015 with Quanzeng You's article "Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks"; the five-agree subset is used here). The prediction accuracy reaches 81.95%, an improvement of more than 4% over the traditional deep learning method.

Claims (1)

1. A picture emotion polarity analysis method based on deep learning, divided into three stages, namely a data acquisition stage, a deep learning model training stage, and a picture emotion polarity analysis stage; characterized by comprising the following specific steps:
the data acquisition comprises the following specific steps:
1.1. a priori knowledge preparation
An emotion dictionary of the dominant emotion polarities of emotion vocabulary needs to be prepared; the emotion dictionary is constructed by manual labeling or taken from an existing public dictionary, and its entries take the form (word, emotion intensity);
1.2. search term selection
Selecting a strategy for collecting search terms from a network; the method comprises the following specific steps:
1.2.1 using words with definite emotion polarity as initial search terms to search a picture website, collecting the search results and extracting the descriptive text therein, wherein descriptive text refers to description information related to a picture, comprising its tags, caption, and context text information;
1.2.2 using a word segmentation tool to segment the descriptive text and remove stop words, performing part-of-speech analysis on the individual words therein, and extracting the nouns and adjectives; pairing the nouns and adjectives one by one; storing the paired results in (adjective, noun) form as the initial search lexicon;
1.2.3 cleaning the initial search lexicon obtained in 1.2.2 once; using the emotion dictionary obtained in 1.1, analyzing the polarity relation of the adjective-noun pairs in the search lexicon and removing conflicts, the rule being formalized, for any (adjective, noun) pair in the search lexicon, as:
f1(A, N) = Sen(A) + Sen(N)    (1)
wherein A represents an adjective in the word pair, and N represents a noun in the word pair;the Sen (x) function represents the emotion polarity of word x obtained from the emotion dictionary obtained in 1.1, i.e., if the emotion intensity is (0, 1)]Then the Sen () function returns 1, if the emotion intensity is [ -1,0) then the Sen () function returns-1, if there is no word x in the emotion dictionary then x is considered not to contain emotion, the function returns 0; if f is1If the value is 0, the conflict exists between the adjective nouns or the emotions are not contained, and the emotions are removed; if f is1If not, it indicates that there is no conflict and should be reserved;
1.2.4 performing emotion marking on the screened search word bank by using the emotion dictionary obtained in the step 1.1 and generating a final search word bank; the emotion label of each (adjective, noun) pair in the search word library is obtained by adding the emotion intensity of the adjective and the noun;
1.3. search using search term
Image retrieval is carried out with the search lexicon obtained in 1.2.4, the specific steps being:
(1) taking a pair of emotion words out of the search lexicon;
(2) searching the website to obtain search results;
(3) extracting pictures and corresponding descriptive text from the results, wherein descriptive text refers to description information related to a picture, comprising its tags, caption, and context text information;
(4) segmenting the descriptive text with a word segmentation tool, removing stop words, and taking the remaining individual words as the description information;
(5) taking the emotion label corresponding to the emotion words used for the retrieval as the label of the extracted pictures;
(6) storing (picture, description information, label) as a triple in the database;
(7) repeating steps (1) to (6) until all pairs in the search lexicon have been used;
thus, an emotion picture database is obtained;
1.4. data set cleansing
The method comprises the following specific steps:
1) taking a triple out of the emotion picture database obtained in 1.3;
2) judging the polarity of the description words one by one with the emotion dictionary obtained in 1.1;
3) checking the consistency between the polarities obtained in 2) and the polarity of the triple's label; if they conflict, the triple is considered a noise element and is deleted from the database; for any (picture, description information, label) triple in the emotion picture database, the rule is formalized as:
f2(Label, Tag) = Σ_i not(sgn(Label) + Sen(Tag_i))    (2)
sgn(x) = 1 if x > 0; 0 if x = 0; -1 if x < 0    (3)
wherein Label represents the label in the triple, Tag represents the description information in the triple, and Tag_i represents the i-th individual word of the description information; sgn(x) is the sign function, whose expression is given in formula (3); the not(x) function is a logical negation function: not(x) = 1 if x = 0, otherwise not(x) = 0; if the result of f2 is greater than 0, the picture's label conflicts with its description information and the triple is deleted from the database; if the result of f2 is 0, there is no conflict and the triple is kept;
4) repeating 1) to 3) until all pictures in the database have been analyzed;
the deep learning model training comprises the following specific steps:
2.1 designing deep convolutional neural networks
The network consists of 5 convolutional layers, 3 fully-connected layers, and 1 softmax layer; the neuron activation function is ReLU; pooling layers follow the first, second, and fifth convolutional layers, and the convolutional layers, pooling layers, and first two fully-connected layers are configured exactly as in AlexNet; the last fully-connected layer is resized to 2 outputs and named fc8_s; the softmax layer outputs the picture's emotion polarity; a Euclidean loss layer and a corresponding fully-connected layer are added during model training, outputting the picture's emotion intensity, which measures the prediction error at the real-number level; pictures are normalized to 256 × 256 RGB images before being input to the CNN;
2.2 training the CNN model
Training the CNN model with the dataset obtained by the data acquisition: first, each element of the dataset is re-stored in the triple form (picture, real-valued label, binarized label); binarization quantizes (0, 1] to 1 and [-1, 0] to 0; the picture in the triple is the input, the real-valued label serves as the real-number-level supervision signal, measured by the Euclidean loss, and the binarized label serves as the emotion polarity supervision signal, measured by the Softmax loss;
The picture emotion polarity analysis comprises:
Using the trained CNN model as the emotion classifier, the picture is first normalized to 256 × 256 and then input into the model to generate the emotion polarity prediction output.
CN201710059051.5A 2017-01-23 2017-01-23 Image emotion polarity analysis method based on deep learning Active CN106886580B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710059051.5A CN106886580B (en) 2017-01-23 2017-01-23 Image emotion polarity analysis method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710059051.5A CN106886580B (en) 2017-01-23 2017-01-23 Image emotion polarity analysis method based on deep learning

Publications (2)

Publication Number Publication Date
CN106886580A CN106886580A (en) 2017-06-23
CN106886580B true CN106886580B (en) 2020-01-17

Family

ID=59175439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710059051.5A Active CN106886580B (en) 2017-01-23 2017-01-23 Image emotion polarity analysis method based on deep learning

Country Status (1)

Country Link
CN (1) CN106886580B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330463B (en) * 2017-06-29 2020-12-08 南京信息工程大学 Vehicle type identification method based on CNN multi-feature union and multi-kernel sparse representation
CN107357889B (en) * 2017-07-11 2020-07-17 北京工业大学 Cross-social platform picture recommendation algorithm based on content or emotion similarity
CN107491433A (en) * 2017-07-24 2017-12-19 成都知数科技有限公司 Electric business exception financial products recognition methods based on deep learning
CN107679580B (en) * 2017-10-21 2020-12-01 桂林电子科技大学 Heterogeneous migration image emotion polarity analysis method based on multi-mode depth potential correlation
CN107908720A (en) * 2017-11-14 2018-04-13 河北工程大学 A kind of patent data cleaning method and system based on AdaBoost algorithms
CN108170811B (en) * 2017-12-29 2022-07-15 北京大生在线科技有限公司 Deep learning sample labeling method based on online education big data
CN108875821A (en) 2018-06-08 2018-11-23 Oppo广东移动通信有限公司 The training method and device of disaggregated model, mobile terminal, readable storage medium storing program for executing
US11210335B2 (en) * 2018-09-27 2021-12-28 Optim Corporation System and method for judging situation of object
CN110083726B (en) * 2019-03-11 2021-10-22 北京比速信息科技有限公司 Destination image perception method based on UGC picture data
CN110046253B (en) * 2019-04-10 2022-01-04 广州大学 Language conflict prediction method
CN111259141A (en) * 2020-01-13 2020-06-09 北京工业大学 Social media corpus emotion analysis method based on multi-model fusion
CN111832573B (en) * 2020-06-12 2022-04-15 桂林电子科技大学 Image emotion classification method based on class activation mapping and visual saliency
CN112307757B (en) * 2020-10-28 2023-07-28 中国平安人寿保险股份有限公司 Emotion analysis method, device, equipment and storage medium based on auxiliary task

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894102A (en) * 2010-07-16 2010-11-24 浙江工商大学 Method and device for analyzing emotion tendentiousness of subjective text
CN102200969A (en) * 2010-03-25 2011-09-28 日电(中国)有限公司 Text sentiment polarity classification system and method based on sentence sequence
CN102663139A (en) * 2012-05-07 2012-09-12 苏州大学 Method and system for constructing emotional dictionary
CN104200804A (en) * 2014-09-19 2014-12-10 合肥工业大学 Various-information coupling emotion recognition method for human-computer interaction
CN105893582A (en) * 2016-04-01 2016-08-24 深圳市未来媒体技术研究院 Social network user emotion distinguishing method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120253792A1 (en) * 2011-03-30 2012-10-04 Nec Laboratories America, Inc. Sentiment Classification Based on Supervised Latent N-Gram Analysis

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102200969A (en) * 2010-03-25 2011-09-28 日电(中国)有限公司 Text sentiment polarity classification system and method based on sentence sequence
CN101894102A (en) * 2010-07-16 2010-11-24 浙江工商大学 Method and device for analyzing emotion tendentiousness of subjective text
CN102663139A (en) * 2012-05-07 2012-09-12 苏州大学 Method and system for constructing emotional dictionary
CN104200804A (en) * 2014-09-19 2014-12-10 合肥工业大学 Various-information coupling emotion recognition method for human-computer interaction
CN105893582A (en) * 2016-04-01 2016-08-24 深圳市未来媒体技术研究院 Social network user emotion distinguishing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Image Retargeting by combining fast Seam Carving with Neighboring Probability (FSc_Neip) and Scaling; Lifang Wu et al.; IEEE International Conference on Multimedia and Expo; 2015-12-31; full text *

Also Published As

Publication number Publication date
CN106886580A (en) 2017-06-23

Similar Documents

Publication Publication Date Title
CN106886580B (en) Image emotion polarity analysis method based on deep learning
Kumar et al. Aspect-based sentiment analysis using deep networks and stochastic optimization
CN107066446B (en) Logic rule embedded cyclic neural network text emotion analysis method
CN108090070B (en) Chinese entity attribute extraction method
CN112270196B (en) Entity relationship identification method and device and electronic equipment
CN110705206B (en) Text information processing method and related device
CN110347894A (en) Knowledge mapping processing method, device, computer equipment and storage medium based on crawler
CN108563638B (en) Microblog emotion analysis method based on topic identification and integrated learning
CN107688576B (en) Construction and tendency classification method of CNN-SVM model
CN107679110A (en) The method and device of knowledge mapping is improved with reference to text classification and picture attribute extraction
CN110750648A (en) Text emotion classification method based on deep learning and feature fusion
CN110704890A (en) Automatic text causal relationship extraction method fusing convolutional neural network and cyclic neural network
CN106874397B (en) Automatic semantic annotation method for Internet of things equipment
CN112069312A (en) Text classification method based on entity recognition and electronic device
CN106599824A (en) GIF cartoon emotion identification method based on emotion pairs
Eke et al. The significance of global vectors representation in sarcasm analysis
CN114048354B (en) Test question retrieval method, device and medium based on multi-element characterization and metric learning
Samih et al. Enhanced sentiment analysis based on improved word embeddings and XGboost.
CN114547303A (en) Text multi-feature classification method and device based on Bert-LSTM
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN111859955A (en) Public opinion data analysis model based on deep learning
Baniata et al. Sentence representation network for Arabic sentiment analysis
CN111414755A (en) Network emotion analysis method based on fine-grained emotion dictionary
Rahman et al. A dynamic strategy for classifying sentiment from Bengali text by utilizing Word2vector model
CN114817533A (en) Bullet screen emotion analysis method based on time characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant