CN112712127A - Image emotion polarity classification method combined with graph convolution neural network - Google Patents

Image emotion polarity classification method combined with graph convolution neural network

Info

Publication number
CN112712127A
CN112712127A (application CN202110019810.1A)
Authority
CN
China
Prior art keywords
emotion
model
words
image
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110019810.1A
Other languages
Chinese (zh)
Inventor
毋立芳
张恒
邓斯诺
石戈
简萌
相叶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202110019810.1A
Publication of CN112712127A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

An image emotion polarity classification method combined with a graph convolution neural network relates to the technical fields of intelligent media computing and computer vision. First, object information is extracted from the training samples, and a graph model is built for each picture from the object information and its visual features. Second, the object interaction information contained in the graph model is extracted with a graph convolution network and fused with the features of a convolutional neural network. The training samples are then preprocessed and fed into the network, and the parameters of the model are updated iteratively with a loss function and an optimizer until convergence, which completes training. Finally, the test data are fed into the network to obtain the model's predictions and classification accuracy on the test data. By extracting the interaction features of the objects in an image in the emotion space, the classification features better match the emotional characteristics of the objects and the human emotion-triggering mechanism, and high-level semantic features are added on top of the visual features, which helps improve the performance of the emotion classification algorithm in practical application scenarios.

Description

Image emotion polarity classification method combined with graph convolution neural network
Technical Field
The invention belongs to the technical field of computer vision, and relates to an image emotion polarity classification method combined with a graph convolution neural network.
Background
With the rapid development of image-based social networks, more and more people like to express their moods with pictures. Images that carry emotional information play an important role in enhancing content delivery and effectively influencing viewers. Faced with massive picture data containing user emotions, analyzing the emotional information in these pictures can greatly promote the development of social media, with wide applications in education, advertising, entertainment, and other fields. Image emotion analysis has therefore become one of the recent research hotspots.
Earlier image emotion analysis methods mainly used statistical characteristics of images, such as hand-crafted color, texture, and line features, to classify image emotions; however, because a large semantic gap and emotion gap exist between low-level image features and human emotions, these methods did not achieve good emotion polarity analysis results. In recent years, with the rapid development of social media and computing hardware, deep learning has advanced rapidly in computer vision and achieved good results in image classification, object detection, target tracking, and related tasks. The effectiveness of high-level visual features extracted by convolutional neural networks has also led researchers to apply them to image emotion classification. In 2015, You et al. designed an image emotion classification algorithm based on a convolutional neural network and obtained better classification results than traditional methods, although the gains were limited by the inherent limitations of the underlying learning framework. With deeper research into convolutional neural networks and the principles of human emotion, Yang et al. proposed in 2018 an image emotion classification algorithm combined with image instance segmentation: by incorporating a strong instance segmentation algorithm to extract regions rich in semantic and emotional content, they enhanced the visual features used for image emotion classification and improved its accuracy. In 2019, Wu et al. designed a convolutional neural network combining image saliency detection with image emotion classification; motivated by the connection between the human attention mechanism and human emotion, a saliency detection algorithm simulates the regions of an image that attract the most human attention, and error back-propagation is used for feedback adjustment of the polarity prediction, further improving the accuracy of image emotion classification.
Inspired by these recent results and by research on the principles of human emotion, existing image emotion classification algorithms improve classification performance by acquiring and enhancing the visual features of some of the objects in an image. However, existing research has certain limitations: it enriches and enhances object visual features on the basis of object segmentation, saliency detection, and similar techniques, but the use of object features remains limited to visual features, and the interaction of the objects in the emotion space is not exploited, so the designed models are constrained and their performance gains are limited. On this basis, the present method uses a graph convolution model to design an emotion classification method that combines the emotional relationships among objects with visual features; it uses the visual features of the image while also considering the mutual influence of all objects in the image in the emotion space, fully mining the image's features at the emotional-semantic level and improving the accuracy of image emotion classification.
Disclosure of Invention
The invention aims to design an image emotion polarity classification method combined with a graph convolution neural network, and a frame diagram of the image emotion polarity classification method is shown in figure 1.
Aiming at the problems of existing research methods, a model that exploits the emotional relationship features of objects in an image is designed and combined with a graph convolution neural network, so that the model can simultaneously acquire the relationship features of the objects in the emotion space and the visual features of the image. By combining the open-source panoramic segmentation framework Detectron2, a method is designed for building a corresponding graph model from the segmentation result of each picture; a graph convolution model is then used to represent the interaction features, i.e. the relationship features, of the objects in the emotion space while enriching the visual features of the image, and is combined with a basic convolutional neural network, thereby enriching the high-level semantic features of the image and improving the accuracy of image emotion polarity analysis.
The method comprises the following specific steps:
step 1, obtaining image object information: perform panoramic segmentation on each picture in the data set with a panoramic segmentation model to obtain the category, position, area, and other information of the objects in the picture, and label the emotion polarity and intensity of the object category words obtained by panoramic segmentation according to the word annotations in SentiWordNet;
step 2, establishing a graph model: build a corresponding graph model by taking the objects as nodes, the reciprocal of the emotion-space distance between an object word and the other object words as the edge weights, and the brightness and texture features of the region corresponding to each object as the node features;
step 3, establishing a deep network model: use the basic convolutional neural network model VGG-16 and the graph convolution model GCN, merge the outputs of the two models, and feed the merged output into a fully connected layer whose output dimensionality equals the number of classes to be classified, replacing the final classification layer of the original model;
step 4, training the model: preprocess the images by scaling, random flipping, and similar operations, input them into the network model, optimize with stochastic gradient descent, evaluate model performance with a cross-entropy function, and learn the model parameters;
step 5, obtaining the emotion category of an image to be detected: apply the same preprocessing as in step 4 to the images in the database, then input them into the model trained in step 4 to obtain the corresponding emotion categories.
Compared with the prior art, the invention has the following obvious and prominent substantive characteristics and remarkable technical progress:
the invention provides a novel image emotion classification algorithm, which combines an image convolution network with a basic deep convolution neural network and adds a high-level semantic feature of object emotion relation features on the basis of high-level visual features. Aiming at the problem that object information contained in a picture is not labeled in an existing public emotion data set, a design method is used for converting the picture into a graph model by utilizing research results in the field of panorama segmentation, the graph model is updated and enhanced by utilizing a graph convolution network and is fused with visual features, emotion features in the picture are accurately extracted by utilizing an emotion triggering mechanism, and a better emotion classification effect is obtained.
Drawings
The invention is described in further detail below with reference to the following figures and detailed description:
FIG. 1 is an architecture diagram of a convolutional neural network for training image emotion classification based on the method.
FIG. 2 is an overall flow chart of emotion image classification based on the method.
Detailed Description
The invention provides an image emotion polarity classification method combined with a graph convolution neural network. The overall structure of the invention is shown in fig. 1. The invention was simulated in a Windows 10 and PyCharm environment. The specific implementation flow of the invention is shown in fig. 2, and the specific implementation steps are as follows:
step 1, obtaining image object information: perform panoramic segmentation on each picture in the data set with a panoramic segmentation model to obtain the category, position, area, and other information of the objects in the picture, and label the emotion polarity and intensity of the object category words obtained by panoramic segmentation according to the word annotations in SentiWordNet;
step 2, establishing a graph model: build a corresponding graph model by taking the objects as nodes, the reciprocal of the emotion-space distance between an object word and the other object words as the edge weights, and the brightness and texture features of the region corresponding to each object as the node features;
step 3, establishing a deep network model: use the basic convolutional neural network model VGG-16 and the graph convolution model GCN, merge the outputs of the two models, and feed the merged output into a fully connected layer whose output dimensionality equals the number of classes to be classified, replacing the final classification layer of the original model;
step 4, training the model: preprocess the images by scaling, random flipping, and similar operations, input them into the network model, optimize with stochastic gradient descent, evaluate model performance with a cross-entropy function, and learn the model parameters;
step 5, obtaining the emotion category of an image to be detected: apply the same preprocessing as in step 4 to the images in the database, then input them into the model trained in step 4 to obtain the corresponding emotion categories.
In step 1, a method for acquiring image object information by using a panoramic segmentation model is designed:
the method can be used for emotion classification of images in a large real social network, so that a universal public emotion data set Flickr and Instagram (hereinafter referred to as FI data set) which is extracted and arranged from Flickr and Instagram is selected in the example, the data set has the characteristics of large data scale and accurate emotion marking, and is more in line with a real network environment.
Using the panoramic segmentation model in Detectron2, each picture in the FI data set is segmented, and the category, position, pixel locations, and other information of each object are saved. For the category words of the objects, a word emotion labeling method based on the emotion dictionary SentiWordNet is designed; the specific calculation is as follows:
S_p = \frac{1}{m}\sum_{i=1}^{m} S'_{ip}, \qquad S_n = \frac{1}{m}\sum_{i=1}^{m} S'_{in}

S_W = S_p - S_n

where S_p and S_n are the positive and negative emotion intensities of the object word W (the subscripts p and n denoting positive and negative); S'_i is the i-th noun or adjective sense of the current word W contained in SentiWordNet, and m is the number of such senses S'; S'_{ip} and S'_{in} are the positive and negative emotion intensities annotated for S'_i in SentiWordNet, each taking a value between 0 and 1. The emotion value of the current word W is calculated by this sum-and-average method, and the emotion intensity S_W of the word is represented by the difference between its positive and negative emotion intensities.
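For illustration only, the following Python sketch shows one possible implementation of this word-level emotion labeling, assuming NLTK's SentiWordNet corpus is available; the function name word_emotion and the example label list are placeholders, not part of the patented method.

```python
# Assumes nltk.download('wordnet') and nltk.download('sentiwordnet') have been run.
from nltk.corpus import sentiwordnet as swn

def word_emotion(word):
    """Average positive/negative scores over the noun ('n') and adjective ('a')
    senses of `word` and return S_W = S_p - S_n as in the formula above."""
    senses = list(swn.senti_synsets(word, pos='n')) + list(swn.senti_synsets(word, pos='a'))
    if not senses:                     # word not covered by SentiWordNet
        return 0.0
    s_p = sum(s.pos_score() for s in senses) / len(senses)
    s_n = sum(s.neg_score() for s in senses) / len(senses)
    return s_p - s_n

# emotion values for the object-category words produced by segmentation (example labels)
labels = ['dog', 'grave', 'flower']
emotions = {w: word_emotion(w) for w in labels}
```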
In step 2, a method for establishing a graph model by using image object information is designed:
and using object information obtained by panoramic segmentation, using objects contained in the picture as nodes of the graph model, and using the reciprocal of the distance of the words in the emotion space as the edge weight between the nodes in the graph model. The distance of the words in the emotion space is calculated by adopting the following formula:
D_{ij} = \begin{cases} |S_i - S_j| + 1, & W_i, W_j \text{ of opposite emotion polarity} \\ |S_i - S_j|, & W_i, W_j \text{ of the same emotion polarity} \\ 0.5, & S_i = S_j = 0 \end{cases}

where S_i and S_j are the emotion values of the two words W_i and W_j. When the emotion polarities of the two words are opposite, their distance in the emotion space is the absolute value of the difference of their emotion intensities plus 1; when the polarities are the same, the absolute value of the difference of the emotion intensities is used directly as the emotion distance. In particular, when the emotion intensities of the two words are both 0, the emotion distance is defined as 0.5; since the emotion value of a word lies between 0 and 1, the value 0.5 distinguishes the case of two neutral words from that of two words of the same polarity.
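A minimal sketch of this emotion-space distance and the resulting reciprocal edge weights follows; numpy is assumed, and the guard against a zero distance between two distinct words of the same polarity is an added assumption not specified in the text.

```python
import numpy as np

def emotion_distance(s_i, s_j):
    if s_i == 0.0 and s_j == 0.0:       # two neutral words
        return 0.5
    if s_i * s_j < 0:                   # opposite emotion polarities
        return abs(s_i - s_j) + 1.0
    return abs(s_i - s_j)               # same polarity

def edge_weight_matrix(words, emotions):
    n = len(words)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = emotion_distance(emotions[words[i]], emotions[words[j]])
            A[i, j] = 1.0 / d if d > 0 else 0.0    # reciprocal distance as edge weight
    return A
```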
The brightness feature and the texture feature are used as the node features in the graph model. The image region where each object is located is obtained from the object position information of step 1. The brightness histogram of the pixels is taken as the brightness feature: the RGB values of the pixels in the object's image region are converted to the hue, saturation, and intensity of the HSI space, the intensity values are quantized to the range 0-255, and the distribution curve of the quantized intensity values is taken as the brightness feature, finally yielding a 256-dimensional feature vector.
Meanwhile, for the texture feature, the region where each object is located is processed with the gray-level co-occurrence matrix method. The calculation is performed in the 45° or 135° direction, because the feature quality obtained in the 0° or 90° directions is lower; the result is then quantized into a 256-dimensional feature vector.
Finally, the brightness feature and the texture feature are concatenated into a 512-dimensional feature vector that serves as the feature of each node in the graph model, and the graph model is built with the edge weight matrix A as its edges.
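The node features described above could be assembled as in the following sketch, which assumes scikit-image for the color conversion and the gray-level co-occurrence matrix (named greycomatrix in older scikit-image versions); using the HSV value channel in place of HSI intensity and summing the co-occurrence matrix along one axis to obtain 256 dimensions are simplifying assumptions.

```python
import numpy as np
from skimage.color import rgb2hsv
from skimage.feature import graycomatrix

def node_feature(region_rgb):
    # brightness part: intensity quantised to 0-255, then a 256-bin histogram
    value = (rgb2hsv(region_rgb)[..., 2] * 255).astype(np.uint8)
    lum_hist, _ = np.histogram(value, bins=256, range=(0, 255))

    # texture part: grey-level co-occurrence matrix in the 45-degree direction,
    # reduced to a 256-dimensional vector
    glcm = graycomatrix(value, distances=[1], angles=[np.pi / 4], levels=256)
    tex = glcm[:, :, 0, 0].sum(axis=0)

    return np.concatenate([lum_hist, tex]).astype(np.float32)   # 512-dim node feature
```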
In step 3, a deep network model is established:
and extracting the relation characteristics by using a GCN model. The method is realized by using a structure of stacked GCN, wherein a two-layer GCN structure is used in the method, and the input characteristic H of the current layer kkIs output from the previous layer, and the output result H is calculated in the following wayk+1
Figure BDA0002888097200000061
Wherein
Figure BDA0002888097200000062
The result of adding the edge weight A obtained by the calculation of the adjacent matrix according to the emotion value of the object and the unit matrix is obtained; wkThe weight matrix of the current convolutional layer k is obtained by random initialization, and training and adjustment are carried out in the training process according to a loss function until the training is finished; a is a non-linear activation function,
Figure BDA0002888097200000063
calculated from the following formula:
Figure RE-GDA0002987101290000064
wherein
Figure RE-GDA0002987101290000065
Is composed of
Figure RE-GDA0002987101290000066
Wherein i represents
Figure RE-GDA0002987101290000067
The coordinates of (a) are (b),
Figure RE-GDA0002987101290000068
is composed of
Figure RE-GDA0002987101290000069
Is an element of (i), ij is
Figure RE-GDA00029871012900000610
Coordinates of (2).
The GCN model thus yields a 2048-dimensional relational feature vector.
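A minimal PyTorch sketch of the two-layer GCN (512 to 1024 to 2048) with the symmetric normalization above is given below; class and function names are illustrative, and mean-pooling the node outputs into a single 2048-dimensional vector is an assumption about how the per-node features are aggregated into the image-level relational feature.

```python
import torch
import torch.nn as nn

def normalize_adjacency(A):
    A_tilde = A + torch.eye(A.size(0))              # add self-loops: A + I
    deg = A_tilde.sum(dim=1)                        # node degrees of A + I
    d_inv_sqrt = torch.diag(deg.pow(-0.5))          # D^{-1/2}
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt        # D^{-1/2} (A + I) D^{-1/2}

class TwoLayerGCN(nn.Module):
    def __init__(self, in_dim=512, hid_dim=1024, out_dim=2048):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)    # W^0
        self.w2 = nn.Linear(hid_dim, out_dim, bias=False)   # W^1
        self.act = nn.ReLU()

    def forward(self, H, A_hat):
        H = self.act(A_hat @ self.w1(H))    # H^1 = sigma(A_hat H^0 W^0)
        H = self.act(A_hat @ self.w2(H))    # H^2 = sigma(A_hat H^1 W^1)
        return H.mean(dim=0)                # pool nodes into a 2048-dim relational feature

# usage: A_hat = normalize_adjacency(torch.tensor(A, dtype=torch.float32))
#        rel_feat = TwoLayerGCN()(node_features, A_hat)   # node_features: [n_nodes, 512]
```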
For the visual features, a VGG16 model pre-trained on ImageNet is adopted; after loading, the final classification layer with input dimension 2048 × 1024 and output dimension 1 × 1024 is removed, and the output dimension of the classification layer is adjusted to 2048 × 1 to serve as the visual feature. The GCN network contains two graph convolution layers in total: the first has input dimension 512 × 1 and output 1024 × 1, and the second has input dimension 1024 × 1 and output 2048 × 1, which is the final relational feature vector. The visual feature and the relational feature are concatenated into a 4096 × 1 feature vector and input into the final classification layer, whose output dimension is 1 × 2, i.e. the predicted probability of each category; the emotion category corresponding to the position of the largest entry is taken as the emotion category output for the image;
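One possible way to fuse the visual and relational branches is sketched below with torchvision's VGG-16 (weights API of torchvision 0.13 or later); the 2048-dimensional visual head and the per-image graph handling are assumptions for illustration, and the layer dimensions of the stock torchvision VGG-16 differ from those stated above.

```python
import torch
import torch.nn as nn
from torchvision import models

class EmotionClassifier(nn.Module):
    def __init__(self, gcn):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.backbone = vgg.features
        self.pool = vgg.avgpool
        # drop VGG-16's last classification layer and map its 4096-dim output to 2048 dims
        self.visual_head = nn.Sequential(*list(vgg.classifier.children())[:-1],
                                         nn.Linear(4096, 2048))
        self.gcn = gcn                              # e.g. the TwoLayerGCN sketched above
        self.classifier = nn.Linear(4096, 2)        # two emotion polarities

    def forward(self, images, node_feats_list, adj_list):
        x = self.pool(self.backbone(images)).flatten(1)        # [B, 25088]
        visual = self.visual_head(x)                            # [B, 2048]
        relational = torch.stack([self.gcn(h, a)                # one graph per image
                                  for h, a in zip(node_feats_list, adj_list)])
        fused = torch.cat([visual, relational], dim=1)          # [B, 4096]
        return self.classifier(fused)                           # [B, 2] class scores
```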
in the step 4, the training of the model is realized through operations such as data preprocessing, data input, calculation of a loss function and the like:
the images are pre-processed by scaling, random flipping and the like, in this example, the parameter of random cropping is set to 224 × 224, and the probability of random flipping is set to 0.5. And inputting the batch with the fixed size into the network model, and taking the sample of the batch with the fixed size as a batch. The fixed batch size setting will improve the training effect of the model to a certain extent as much as possible, but due to the limitation of the experimental platform, it is recommended to select 8, 16 or 32, in this example, the fixed batch size setting is 16. And automatically comparing the output prediction result with the input training set label through the final classification layer, and counting the proportion of the correct number of samples in the whole training sample as the accuracy of the training set in the round. And meanwhile, when an output vector is obtained, a loss value of the current model can be calculated by using a loss function shown below, and the loss value is fed back to the optimizer for processing and then carrying out back propagation to update each parameter in the model.
In the calculation of the loss function, the cross-entropy loss shown below is used, with the purpose of maintaining the separation between classes so that images of different emotion classes lie farther apart:

L = -\frac{1}{m}\sum_{i=1}^{m}\log\frac{e^{w_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{N} e^{w_{j}^{T}x_i + b_j}}

where m is the number of images in each batch during training, N is the number of emotion classes in the data set, and x_i is the feature of the i-th picture in the batch obtained, before the classification layer, from the basic backbone network described in step 3; w and b are the weight and bias parameters of the classification layer, the subscript y_i indicates the class assigned to the i-th picture after the classification layer, j denotes the class index corresponding to a prediction result, and b_{y_i} is the value of the classification-layer bias parameter when the i-th picture of the batch is judged to be of class y_i.
Considering convergence speed and quality, the optimizer of this method is stochastic gradient descent. Its parameters mainly comprise the initial learning rate and the momentum. The initial learning rate is generally chosen from values such as 0.1, 0.01, 0.0001, and 0.00001 according to the convergence behavior of the model; this embodiment recommends 0.01, at which convergence is more stable. The momentum should in principle lie between 0 and 1; here the default value of 0.9 in stochastic gradient descent is preferred. Because a fixed learning rate hinders the deep network from finding better parameters in the second half of training, the method adds a strategy of decaying the learning rate after a fixed number of rounds. It is recommended to decay the learning rate 1-2 times within 20-30 rounds, with a total of 50-80 training rounds. In this example, the optimizer decays the learning rate every 20 and every 30 rounds, and the model parameters are trained for 80 rounds to ensure effective convergence; too few rounds may fail to converge, while too many rounds increase training time without improving the result.
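A minimal training-loop sketch matching these settings (stochastic gradient descent, learning rate 0.01, momentum 0.9, learning rate divided by 10 every 20 rounds, cross-entropy loss, 80 rounds) is shown below; model and train_loader are assumed to exist, and the loader is assumed to use a custom collate function that keeps each image's graph tensors in Python lists.

```python
import torch
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

criterion = nn.CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = StepLR(optimizer, step_size=20, gamma=0.1)   # decay lr every 20 epochs

for epoch in range(80):
    model.train()
    correct, total = 0, 0
    for images, node_feats_list, adj_list, labels in train_loader:   # batch size e.g. 16
        optimizer.zero_grad()
        logits = model(images, node_feats_list, adj_list)
        loss = criterion(logits, labels)       # cross-entropy over the emotion classes
        loss.backward()                        # back-propagate the loss
        optimizer.step()                       # update all model parameters
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    scheduler.step()
    print(f"epoch {epoch}: training accuracy {correct / total:.4f}")
```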
After each round of training, the model parameters are fixed, the validation-set data of the FI data set are scaled and cropped to a fixed size and fed into the network model (the cropping parameter is set to 224 × 224 in this example), the output of the model is compared with the sample labels, and the proportion of correct samples, i.e. the validation-set accuracy, is counted. If the validation accuracy of the current round is higher than the previous best, the new best validation accuracy is recorded and the model trained up to the current round is saved. After all training rounds are finished, the saved model corresponding to the highest validation accuracy is the final, optimal trained model;
in the step 5, obtaining the emotion type of the image to be detected:
and (3) cutting the test set data or any image in the FI data set according to a fixed-size scaling center like the image in the verification set in the synchronization step 4, and then inputting the test set data or any image into the model one by one or in batches by a fixed quantity. In the example, the parameter of the fixed-size zoom center cropping is set to 224 × 224, and in order to improve the processing efficiency under the same experimental conditions, the test set data in the example recommends that 16 is the batch size, and test images are output to the model according to batches for testing. And after model processing, comparing the output result after the classification layer with the label of the sample, and counting the proportion of the correct sample, namely the accuracy of the test set. And the emotion type corresponding to the output result is the image emotion type judged by the model change.
Testing the model of this example on the test set of the FI data set gives an accuracy of 0.8808, which is higher than the best result among current comparable methods: the accuracy of 0.8635 reported in "Visual Sentiment Prediction Based on Automatic Discovery of Affective Regions", published in 2018 in the journal IEEE Transactions on Multimedia.

Claims (5)

1. The image emotion polarity classification method combined with the graph convolution neural network is characterized by comprising the following steps of:
step 1, obtaining image object information: carrying out panoramic segmentation processing on each picture in the data set by using a panoramic segmentation model to obtain the category and position information of an object in the picture, and carrying out emotional polarity and intensity labeling on the object category words obtained by panoramic segmentation in the data set according to the labeling result of the words in SentiWordNet;
step 2, establishing a graph model: establishing a corresponding graph model by taking an object as a node, taking the reciprocal of the distance of the object words in the emotion space as an edge weight and taking the brightness and texture characteristics of a region corresponding to the object as node characteristics;
step 3, establishing a deep network model: using a basic convolution neural network model VGG-16 and a graph convolution model GCN, merging the outputs of the two models, inputting the merged outputs into a full connection layer, and replacing the final classification layer of the original model by using the number of classes to be classified as the output dimensionality of the full connection layer;
step 4, training a model: preprocessing the images, inputting them into the network model, optimizing with stochastic gradient descent, evaluating model performance with a cross-entropy function, and learning the model parameters;
step 5, obtaining the emotion types of the images to be detected: and (4) preprocessing the images in the database, and inputting the preprocessed images into the model trained in the step (4) to obtain the corresponding emotion types.
2. The method of claim 1, wherein: in the step 1, the object information of the picture is identified by using a panorama segmentation algorithm, and a word emotion labeling method of an emotion dictionary SentiWordNet is used, wherein the specific calculation method is as follows:
S_p = \frac{1}{m}\sum_{i=1}^{m} S'_{ip}, \qquad S_n = \frac{1}{m}\sum_{i=1}^{m} S'_{in}

S_W = S_p - S_n

wherein S_p and S_n are the positive and negative emotion intensities of the object word W (the subscripts p and n denoting positive and negative); S'_i is the i-th noun or adjective sense of the current word W contained in SentiWordNet, and m is the number of such senses S'; S'_{ip} and S'_{in} are the positive and negative emotion intensities of S'_i annotated in SentiWordNet, each with a value between 0 and 1; the emotion value of the current word W is calculated by the sum-and-average method, and the emotion intensity S_W of the current word is represented by the difference between its positive and negative emotion intensities.
3. The method of claim 1, wherein: in step 2, using the object information obtained by the panoramic segmentation algorithm, the objects contained in the picture are taken as the nodes of the graph model, and the reciprocal of the distance between the corresponding words in the emotion space is taken as the edge weight between nodes; for the object words W_1, W_2, ... contained in the picture, the distance between two words W_i and W_j in the emotion space is calculated by the following formula:

D_{ij} = \begin{cases} |S_i - S_j| + 1, & W_i, W_j \text{ of opposite emotion polarity} \\ |S_i - S_j|, & W_i, W_j \text{ of the same emotion polarity} \\ 0.5, & S_i = S_j = 0 \end{cases}

wherein S_i and S_j are the emotion values of the words W_i and W_j calculated by the method in step 1; when the emotion polarities of the two words are opposite, the distance in the emotion space is the absolute value of the difference of their emotion intensities plus 1; when the polarities are the same, the absolute value of the difference of the emotion intensities is used as the emotion distance; when the emotion intensities of the two words are both 0, the emotion distance is defined as 0.5 to distinguish the case of two neutral words from that of two words of the same polarity;

finally, the reciprocal of the emotion distance between two words W_i and W_j is used as the edge weight A_{ij} between the corresponding nodes in the graph model; the above steps are repeated for all the words W_1, W_2, ..., obtaining the edge weight matrix A of each picture;
taking the brightness characteristic and the texture characteristic as node characteristics in the graph model; obtaining an image area where each object is located by using object position information obtained in a panoramic segmentation algorithm; taking a brightness histogram of pixels in a picture as a brightness characteristic, namely converting an RGB value of the pixels in an image area where an object is located into hue, saturation and brightness of an HSI space, quantizing the brightness value, taking a quantized brightness value distribution curve as the brightness characteristic, quantizing the brightness into 0-255, and finally obtaining a 256-dimensional characteristic vector;
meanwhile, for texture features, calculating the area where each object is located by using a gray level co-occurrence matrix method; calculating in the direction of 45 degrees or 135 degrees, and quantizing the calculated result into a 256-dimensional characteristic vector;
and finally, splicing the brightness features and the texture features to obtain 512-dimensional feature vectors as features corresponding to each node in the graph model, and establishing the graph model by using the edge weight matrix A as an edge of the graph model.
4. The method of claim 1, wherein: in step 3, the relational features in the graph model are extracted with a GCN model, implemented as a stack of graph convolution layers using a two-layer GCN structure, where the input feature H^k of the current layer k is the output of the previous layer and the output H^{k+1} is calculated as

H^{k+1} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{k}\,W^{k}\right)

wherein \tilde{A} = A + I is the result of adding the identity matrix to the edge weight (adjacency) matrix A computed from the emotion values of the objects; W^k is the weight matrix of the current graph convolution layer k, which is randomly initialized and adjusted during training according to the loss function until training finishes; \sigma is a nonlinear activation function; and \tilde{D} is the diagonal degree matrix of \tilde{A}, calculated from the following formula:

\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}

where i and j index the rows and columns of \tilde{A};
calculating by a GCN model to obtain characteristic vectors with 2048 x 1 dimensions of relational characteristics;
for visual features, a VGG16 is adopted to obtain a VGG16 model which is pre-trained on ImageNet, the last classification layer with input dimension of 2048 x 1024 and output dimension of 1024 x 1 is removed after loading, and the output dimension of the classification layer is adjusted to 2048 x 1 and serves as the visual features;
and (5) splicing the relational features and the visual features to obtain 4096 x 1 feature vectors and inputting the feature vectors into the full-connection layer to realize the classification of the image emotions.
5. The method of claim 1, wherein: in said step 4, the loss function uses cross-entropy loss to make a basic loss measure; the specific loss function is as follows:
L = -\frac{1}{m}\sum_{i=1}^{m}\log\frac{e^{w_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{N} e^{w_{j}^{T}x_i + b_j}}

wherein m is the number of images in each batch during training, N is the number of emotion classes in the data set, and x_i is the feature of the i-th picture in the batch obtained from the basic backbone network of step 3 before the classification layer; w and b are the weight and bias parameter values in the classification layer, which are obtained by random initialization and are trained and adjusted according to the loss function during training until training finishes; the subscript y_i represents the class assigned after the classification layer, j represents the class index corresponding to a prediction result, and b_{y_i} is the value of the classification-layer bias parameter when the i-th picture of the batch is judged to be of class y_i;
in the training process, 0.01 is used as an initial learning rate, the learning rate is reduced to one tenth of the current learning rate every 20 rounds, and the training is finished after the training times reach more than 80.
CN202110019810.1A 2021-01-07 2021-01-07 Image emotion polarity classification method combined with graph convolution neural network Pending CN112712127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110019810.1A CN112712127A (en) 2021-01-07 2021-01-07 Image emotion polarity classification method combined with graph convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110019810.1A CN112712127A (en) 2021-01-07 2021-01-07 Image emotion polarity classification method combined with graph convolution neural network

Publications (1)

Publication Number Publication Date
CN112712127A true CN112712127A (en) 2021-04-27

Family

ID=75548482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110019810.1A Pending CN112712127A (en) 2021-01-07 2021-01-07 Image emotion polarity classification method combined with graph convolution neural network

Country Status (1)

Country Link
CN (1) CN112712127A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297936A (en) * 2021-05-17 2021-08-24 北京工业大学 Volleyball group behavior identification method based on local graph convolution network
CN113392781A (en) * 2021-06-18 2021-09-14 山东浪潮科学研究院有限公司 Video emotion semantic analysis method based on graph neural network
CN113449640A (en) * 2021-06-29 2021-09-28 中国地质大学(武汉) Remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN
CN116385029A (en) * 2023-04-20 2023-07-04 深圳市天下房仓科技有限公司 Hotel bill detection method, system, electronic equipment and storage medium
CN116721284A (en) * 2023-05-25 2023-09-08 上海蜜度信息技术有限公司 Image classification method, device, equipment and medium based on image enhancement

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679580A (en) * 2017-10-21 2018-02-09 桂林电子科技大学 A kind of isomery shift image feeling polarities analysis method based on the potential association of multi-modal depth
CN108416397A (en) * 2018-03-30 2018-08-17 华南理工大学 A kind of Image emotional semantic classification method based on ResNet-GCN networks
CN111563164A (en) * 2020-05-07 2020-08-21 成都信息工程大学 Specific target emotion classification method based on graph neural network
CN112001186A (en) * 2020-08-26 2020-11-27 重庆理工大学 Emotion classification method using graph convolution neural network and Chinese syntax

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679580A (en) * 2017-10-21 2018-02-09 桂林电子科技大学 A kind of isomery shift image feeling polarities analysis method based on the potential association of multi-modal depth
CN108416397A (en) * 2018-03-30 2018-08-17 华南理工大学 A kind of Image emotional semantic classification method based on ResNet-GCN networks
CN111563164A (en) * 2020-05-07 2020-08-21 成都信息工程大学 Specific target emotion classification method based on graph neural network
CN112001186A (en) * 2020-08-26 2020-11-27 重庆理工大学 Emotion classification method using graph convolution neural network and Chinese syntax

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHEN ZHAOMIN et al.: "Multi-Label Image Recognition With Graph Convolutional Networks", IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9 January 2020 (2020-01-09), pages 5172-5181 *
YANG JUFENG et al.: "Visual Sentiment Prediction Based on Automatic Discovery of Affective Regions", IEEE Transactions on Multimedia, vol. 20, 7 February 2018 (2018-02-07), page 2513 *
ZHAO PINLONG et al.: "Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification", Knowledge-Based Systems, vol. 193, 28 December 2019 (2019-12-28), pages 1-10, XP086080506, DOI: 10.1016/j.knosys.2019.105443 *
梁宁 (LIANG Ning): "Research on Text Sentiment Analysis Based on Attention Mechanism and Deep Learning", China Master's Theses Full-text Database, Information Science and Technology, no. 1, 15 January 2020 (2020-01-15) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297936A (en) * 2021-05-17 2021-08-24 北京工业大学 Volleyball group behavior identification method based on local graph convolution network
CN113392781A (en) * 2021-06-18 2021-09-14 山东浪潮科学研究院有限公司 Video emotion semantic analysis method based on graph neural network
WO2022262098A1 (en) * 2021-06-18 2022-12-22 山东浪潮科学研究院有限公司 Video emotion semantic analysis method based on graph neural network
CN113449640A (en) * 2021-06-29 2021-09-28 中国地质大学(武汉) Remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN
CN113449640B (en) * 2021-06-29 2022-02-11 中国地质大学(武汉) Remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN
CN116385029A (en) * 2023-04-20 2023-07-04 深圳市天下房仓科技有限公司 Hotel bill detection method, system, electronic equipment and storage medium
CN116385029B (en) * 2023-04-20 2024-01-30 深圳市天下房仓科技有限公司 Hotel bill detection method, system, electronic equipment and storage medium
CN116721284A (en) * 2023-05-25 2023-09-08 上海蜜度信息技术有限公司 Image classification method, device, equipment and medium based on image enhancement


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination