CN108647191B - Sentiment dictionary construction method based on supervised sentiment text and word vector - Google Patents

Sentiment dictionary construction method based on supervised sentiment text and word vector

Info

Publication number
CN108647191B
CN108647191B (application CN201810473308.6A)
Authority
CN
China
Prior art keywords
word
emotion
text
words
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810473308.6A
Other languages
Chinese (zh)
Other versions
CN108647191A (en)
Inventor
张雷
张文哲
李昀
姚懿荣
谢俊元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201810473308.6A priority Critical patent/CN108647191B/en
Publication of CN108647191A publication Critical patent/CN108647191A/en
Application granted granted Critical
Publication of CN108647191B publication Critical patent/CN108647191B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/117Tagging; Marking up; Designating a block; Setting of attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an emotion dictionary construction method based on supervised emotion text and word vectors. The method generates word vectors with a neural network, embeds emotion into the word vectors, mines the internal relations between words, then constructs a word relation graph and propagates emotion labels with a label propagation algorithm, automatically constructing a domain-specific emotion dictionary. The invention solves the problem that emotion dictionaries built manually or from knowledge bases are inaccurate when handling domain-specific sentiment analysis tasks.

Description

Sentiment dictionary construction method based on supervised sentiment text and word vector
Technical Field
The invention relates to the field of emotion analysis, in particular to an emotion dictionary construction method based on supervised emotion text and word vectors.
Background
With the rapid development of the internet, network platforms such as microblogs, message boards, and forums have become popular, giving the public many opportunities to voice opinions. The text data thus published is abundant, readily available, and carries enormous commercial and social value. Sentiment analysis technology has emerged to extract from these texts people's emotional tendencies toward things and events.
Traditionally, emotion dictionaries are an important tool for sentiment analysis: a good emotion dictionary can greatly improve analysis results. In general, as the application domain changes, the emotion carried by a word changes with it, so when handling a domain-specific sentiment analysis task, compiling an emotion dictionary by hand is time-consuming and laborious, and an automated construction method is needed. Existing automatic construction methods fall into two categories: knowledge-base-based and corpus-based. Knowledge-base-based methods rely on an existing semantic knowledge base; these manually curated bases record the paraphrases of a large number of words and the relations between words (e.g., synonyms, antonyms), and from this existing knowledge such methods build emotion dictionaries of high accuracy and generality. However, for Chinese, complete knowledge bases are relatively scarce, so these methods do not transfer well to constructing Chinese emotion dictionaries; moreover, the dictionaries they produce are general-purpose and cannot capture how a word's emotion shifts across domains. Corpus-based methods can generate a domain-specific emotion dictionary: they process the corpus texts and mine the relations between words, such as conjunction and co-occurrence relations, then group closely related words together by hand-set rules or statistical methods to produce the dictionary. But these methods consider only simple relations between words in the text and ignore its complexity; complex syntax, negation words, and the like degrade their effectiveness.
Disclosure of Invention
Purpose of the invention: aiming at the defects of corpus-based automatic emotion dictionary construction methods, the invention provides an emotion dictionary construction method based on supervised emotion text and word vectors.
Technical scheme: the technical scheme provided by the invention is as follows:
a method for constructing an emotion dictionary based on supervised emotion texts and word vectors comprises the following steps:
(1) acquiring a text data set D, wherein the text data set D comprises a positive emotion text with a positive emotion mark and a negative emotion text with a negative emotion mark;
(2) preprocessing texts in the text data set; constructing a vocabulary V, and filling words in the preprocessed text data set into the vocabulary V one by one;
(3) calculating the emotional tendency value of each word in the vocabulary V with the SO-PMI method, and determining the emotion mark of each word from its emotional tendency value:

label_w = 1 (positive) if SO-PMI(w) > 0, else label_w = 0 (negative)

where label_w denotes the emotion mark of the word w, and SO-PMI(w) denotes the emotional tendency value of the word w;
(4) constructing an improved skip-gram model with word-level supervision, which takes the words in D as input data and predicts each word's context and emotion mark; computing the loss function loss_context for predicting the context and the loss function loss_word for predicting the emotion mark;

loss_context and loss_word are respectively:

loss_context(w_t) = - Σ_{-k ≤ j ≤ k, j ≠ 0} log p(w_{t+j} | w_t)

loss_word(w_t) = - [ label_{w_t} · log p(pos | w_t) + (1 - label_{w_t}) · log p(neg | w_t) ]

where w_t denotes a word, w_t ∈ D; {w_{t-k}, …, w_{t-1}, w_{t+1}, …, w_{t+k}} denotes the set of predicted context words, comprising the k words before and the k words after w_t; p(w_{t+j} | w_t) denotes the probability that w_{t+j} is predicted as context of w_t; p(pos | w_t) denotes the probability that w_t is predicted to have a positive emotion mark, and p(neg | w_t) the probability that it is predicted to have a negative emotion mark;
(5) constructing a convolutional neural network model as a text-level supervision model, which takes the texts in the text data set D as input data and predicts their emotion marks; computing the loss function loss_doc between a text's predicted emotion mark and its actual emotion mark:

loss_doc(d_i) = - [ label_{d_i} · log p(pos | d_i) + (1 - label_{d_i}) · log p(neg | d_i) ]

where d_i denotes a text, d_i ∈ D; label_{d_i} denotes the emotion mark of d_i; p(pos | d_i) denotes the probability that d_i is predicted to have a positive emotion mark, and p(neg | d_i) the probability that it is predicted to have a negative emotion mark;
(6) setting a joint loss function:

loss = α_1 · loss_context + α_2 · loss_doc + α_3 · loss_word

where α_1, α_2, α_3 are the weight coefficients of loss_context, loss_doc, and loss_word respectively;
(7) taking the text data set D, the word emotion marks label_w, and the text emotion marks label_{d_i} as input data, training the joint loss function with a back-propagation algorithm to obtain word vectors with emotion embedding;
(8) constructing a word relation graph G according to the word vector with emotion embedding obtained in the step (7);
(9) selecting partial words in the word relation graph G as seed words, and marking emotion labels for the seed words, wherein the emotion labels comprise commendation, derogation and neutrality; and then, propagating the emotion labels of the seed words in the relational graph G by using a label propagation algorithm to generate an emotion dictionary.
Further, the calculation formula of the emotional tendency value is as follows:
SO-PMI(w) = log( p(w | pos) / p(w | neg) )
where SO-PMI (w) represents an emotional tendency value of word w, pos represents positive emotion text, neg represents negative emotion text, p (w | pos) represents a probability that word w appears in the positive emotion text, and p (w | neg) represents a probability that word w appears in the negative emotion text.
Further, the improved skip-gram model with word-level supervision comprises an input layer, a projection layer and an output layer, wherein the input layer is a word w_t in the text data set D, the projection layer projects the word w_t to a word vector C(w_t), and the output layer separately predicts, from C(w_t), the context of w_t and its emotion mark label_{w_t}.
Further, the text-level supervision model comprises an input layer, a convolutional layer, a pooling layer and a fully-connected layer, wherein the input layer is a text d_i in the text data set D; the convolutional layer extracts a plurality of feature vectors from the text d_i through feature extractors and sends them to the pooling layer; the pooling layer selects the most important feature vector via a Max Pooling Over Time operation and outputs it to the fully-connected layer; and the fully-connected layer predicts, from the received feature vector, the emotion mark label_{d_i} of the input text d_i through a softmax function.
Further, the specific steps of constructing the word relation graph G comprise:
1) extracting the verbs, adjectives and adverbs in the vocabulary V to form a new vocabulary V';
2) constructing a word relation graph G, with the words in V' as vertices of G;
3) for each word w_i in V', computing the Euclidean distances between w_i and all other words in V' in the word vector space obtained in step (7), selecting the m words with the nearest Euclidean distances, and establishing edges between w_i and these m words in the word relation graph G, with edge weights computed as:

w_ij = exp( - euclidean_dis(x_i, x_j)² / σ² )

where w_ij denotes the weight of the edge between w_i and w_j; x_i and x_j are the word vectors of w_i and w_j respectively; euclidean_dis(x_i, x_j) denotes the Euclidean distance between x_i and x_j; and σ is a constant parameter controlling the magnitude of w_ij.
For all words other than the m words nearest to w_i, w_ij = 0 is set.
Beneficial effects: compared with the prior art, the invention has the following advantages:
The method builds the emotion dictionary from a supervised corpus: it generates word vectors with a neural network, mines the internal relations between words, propagates emotion labels with a label propagation algorithm, and automatically constructs a domain-specific emotion dictionary. This avoids the drawback that knowledge-base-based construction methods cannot serve domain-specific sentiment analysis and, compared with other corpus-based methods, strengthens the treatment of the complex relations between words in text. The emotion dictionary is then constructed fully automatically.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a block diagram of an improved skip-gram model;
FIG. 3 is a block diagram of the convolutional neural network model.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
FIG. 1 shows the overall flow of the present invention, which is divided into three main stages: the data processing stage, the word vector emotion embedding stage, and the emotion dictionary generation stage. Each is described in detail below with reference to FIGS. 1 to 3.
I. Data processing stage (steps 1-3):
step 1 is data acquisition, namely acquiring label with emotion labelThe text data sets D, the emotion labels of the text are divided into positive and negative, and the mark is used
Figure BDA0001663799400000051
Representing text diOf (a) wherein d i0 denotes a negative emotion label, d i1 denotes a positive emotion label.
Step 2 is data preprocessing: first, the open-source tool jieba is used to segment the texts and tag parts of speech; then a stop-word list is used to remove the stop words, yielding word sequences from which the vocabulary V is built. The preprocessed text data set is denoted D = {d_1, d_2, …, d_n}.
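As an illustration, a minimal preprocessing sketch in Python is given below; jieba.posseg is the tool's standard segmentation-plus-POS interface, while the stop-word file name and the raw_texts variable are placeholders introduced for this example.

```python
import jieba.posseg as pseg

# Stop-word list; "stopwords.txt" is a placeholder file name.
stopwords = set(line.strip() for line in open("stopwords.txt", encoding="utf-8"))

def preprocess(text):
    # pseg.cut yields (word, part-of-speech flag) pairs
    return [(w, flag) for w, flag in pseg.cut(text) if w not in stopwords]

corpus = [preprocess(d) for d in raw_texts]   # raw_texts: the labeled texts in D
vocab, seen = [], set()
for doc in corpus:
    for w, _ in doc:
        if w not in seen:                     # add each word on first appearance
            seen.add(w)
            vocab.append(w)
```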
Step 3 computes a coarse emotion for each word using the SO-PMI method, whose calculation formula is:

SO-PMI(w) = log( p(w | pos) / p(w | neg) )
where SO-PMI (w) represents an emotional tendency value of word w, pos represents positive emotion text, neg represents negative emotion text, p (w | pos) represents a probability that word w appears in the positive emotion text, and p (w | neg) represents a probability that word w appears in the negative emotion text.
Define label_w to represent the emotion mark of the word w:

label_w = 1 (positive) if SO-PMI(w) > 0, else label_w = 0 (negative)
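The following sketch shows how the SO-PMI score and the coarse emotion mark could be computed. The counting scheme (document frequencies with add-one smoothing) and all names are assumptions for illustration, since the patent does not spell them out.

```python
import math

def so_pmi(w, pos_counts, neg_counts, n_pos, n_neg, smooth=1.0):
    """SO-PMI(w) = log( p(w|pos) / p(w|neg) ), with add-one smoothing (assumed)."""
    p_w_pos = (pos_counts.get(w, 0) + smooth) / (n_pos + smooth)   # p(w|pos)
    p_w_neg = (neg_counts.get(w, 0) + smooth) / (n_neg + smooth)   # p(w|neg)
    return math.log(p_w_pos / p_w_neg)

def word_label(w, pos_counts, neg_counts, n_pos, n_neg):
    """label_w: 1 (positive) if SO-PMI(w) > 0, else 0 (negative)."""
    return 1 if so_pmi(w, pos_counts, neg_counts, n_pos, n_neg) > 0 else 0

# pos_counts / neg_counts: how many positive / negative texts contain each word;
# n_pos / n_neg: the numbers of positive / negative texts (hypothetical names).
```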
II. Word vector emotion embedding stage (steps 4-6):
and step 4, constructing a word-level supervision model, namely constructing an improved Skip-gram model, training word vectors by using word-level emotion supervision data, wherein the model consists of an input layer, a projection layer and an output layer. The input layer is a word w in the training datatProjection layer will be word wtProjected as a word vector representation C (w)t) The output layer uses C (w)t) Separately predict wtContext and emotion markup of
Figure BDA0001663799400000054
Wherein,
prediction of wtThe loss function of context (1) is:
Figure BDA0001663799400000055
the loss function in predicting emotion markers is:
Figure BDA0001663799400000056
wherein, wtMeaning term, wtE is as for D; k denotes the scope of the prediction context, { wt-k,…,wt-1,wt+1,…,wt+kIndicates the predicted set of context words, including the predicted word wtThe first k words and the last k words; p (w)t+j|wt) Denotes wt+jIs predicted as wtProbability of context of (p (pos | w)t) Denotes wtProbability of being predicted to have positive emotion, p (neg | w)t) Denotes wtThe probability of being predicted to have a negative emotion.
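A minimal PyTorch sketch of this word-level model follows. The class and variable names are our own, and full-softmax context prediction is an assumption (the patent does not say whether hierarchical softmax or negative sampling is used); this is an illustration, not the patented implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentimentSkipGram(nn.Module):
    """Improved skip-gram: one projection C(w_t), two output heads (a sketch)."""
    def __init__(self, vocab_size, dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # projection layer C(w_t)
        self.ctx_head = nn.Linear(dim, vocab_size)   # predicts context words
        self.senti_head = nn.Linear(dim, 2)          # predicts label_{w_t}

    def forward(self, center, context, word_label):
        # center: (B,) word ids; context: (B, 2k) context ids; word_label: (B,)
        v = self.embed(center)                       # C(w_t)
        ctx_logits = self.ctx_head(v)                # (B, |V|)
        # loss_context = -sum_j log p(w_{t+j} | w_t), averaged over the batch
        expanded = ctx_logits.unsqueeze(1).expand(-1, context.size(1), -1)
        loss_context = F.cross_entropy(
            expanded.reshape(-1, ctx_logits.size(-1)), context.reshape(-1),
            reduction="sum") / center.size(0)
        # loss_word = -log p(label_{w_t} | w_t)
        loss_word = F.cross_entropy(self.senti_head(v), word_label)
        return loss_context, loss_word
```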
Step 5 constructs the text-level supervision model: a convolutional neural network consisting of an input layer, a convolutional layer, a pooling layer and a fully-connected layer. The input is a text d_i in the text data set D; the convolutional layer extracts a number of feature vectors from d_i through feature extractors; the pooling layer extracts the most important feature from these vectors as output via a Max Pooling Over Time operation; and the fully-connected layer predicts the emotion mark label_{d_i} of the input text d_i through a softmax function. The loss function is:

loss_doc(d_i) = - [ label_{d_i} · log p(pos | d_i) + (1 - label_{d_i}) · log p(neg | d_i) ]

where p(pos | d_i) denotes the probability that d_i is predicted to carry positive emotion, and p(neg | d_i) the probability that it is predicted to carry negative emotion.
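A matching sketch of the text-level supervisor. The filter width of 3 and the sharing of the embedding with the skip-gram model are assumptions consistent with the settings given later for FIG. 3; names are again our own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Text-level supervisor: embed -> conv -> max-over-time -> softmax (sketch)."""
    def __init__(self, embed, dim=100, n_filters=200, width=3):
        super().__init__()
        self.embed = embed                         # shared word-vector matrix W
        self.conv = nn.Conv1d(dim, n_filters, kernel_size=width, padding=1)
        self.fc = nn.Linear(n_filters, 2)          # pos / neg

    def forward(self, doc_ids, doc_label):
        # doc_ids: (B, L) padded word ids; doc_label: (B,) text emotion marks
        x = self.embed(doc_ids).transpose(1, 2)    # (B, dim, L)
        h = torch.relu(self.conv(x))               # feature vectors
        h = h.max(dim=2).values                    # Max Pooling Over Time
        # loss_doc = -log p(label_{d_i} | d_i)
        return F.cross_entropy(self.fc(h), doc_label)
```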
Step 6 trains the models jointly: the word-level and text-level supervision models are combined, with the loss function set to:

loss = α_1 · loss_context + α_2 · loss_doc + α_3 · loss_word

where α_1, α_2, α_3 are the weight coefficients of loss_context, loss_doc, and loss_word, controlling the weight of each loss in the final loss function. With the text data set D, the word emotion marks label_w, and the text emotion marks label_{d_i} as input data, loss is trained to the optimum using stochastic gradient descent with error back-propagation, yielding word vectors with emotion embedding.
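One joint SGD step might look as follows, reusing the two model sketches above. The equal weights α_1 = α_2 = α_3 = 1, the learning rate, and the vocabulary size are placeholders, since the patent gives no values.

```python
import torch

model_w = SentimentSkipGram(vocab_size=50000)     # vocab size is a placeholder
model_d = TextCNN(model_w.embed)                  # both models share the embedding
opt = torch.optim.SGD(
    list(model_w.parameters())
    + list(model_d.conv.parameters()) + list(model_d.fc.parameters()), lr=0.05)
a1, a2, a3 = 1.0, 1.0, 1.0                        # weight coefficients (assumed)

def train_step(center, context, word_label, doc_ids, doc_label):
    loss_context, loss_word = model_w(center, context, word_label)
    loss_doc = model_d(doc_ids, doc_label)
    loss = a1 * loss_context + a2 * loss_doc + a3 * loss_word
    opt.zero_grad()
    loss.backward()                               # error back-propagation
    opt.step()
    return float(loss)
```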
III. Emotion dictionary generation stage (steps 7-9):
step 7 is constructing a word relation graph, which comprises the following specific steps:
1) extracting verbs, adjectives and adverbs in the vocabulary V to form a new vocabulary V';
2) constructing a word relation graph G, and taking words in V' as vertexes in G;
3) for each word w in ViCalculating wiAnd (4) selecting k words with the nearest Euclidean distance from all other words in the V' in the word vector space obtained in the step (7), and establishing w in the word relation graph GiAnd the weight calculation formula of the edge between the k words is as follows:
Figure BDA0001663799400000071
wherein, wijThe expression wiAnd wjWeight of edges in between, xi、xjAre respectively a word wiAnd wjWord vector of (1), euclidean _ dis (x)i,xj) Denotes xi、xjThe Euclidean distance between; σ is a constant parameter for controlling wijThe value of (a).
For the sum word wiOther words than the m words closest in distance, let wij=0
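A numpy sketch of this construction under the definitions above; the dense distance computation is fine for a modest |V'|, and m = 10, σ = 1.0 are placeholder values.

```python
import numpy as np

def build_word_graph(X, m=10, sigma=1.0):
    """X: (|V'|, dim) emotion-embedded word vectors -> m-nearest-neighbour W."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    W = np.exp(-d2 / sigma ** 2)               # w_ij = exp(-dist^2 / sigma^2)
    np.fill_diagonal(W, 0.0)                   # no self-loops
    for i in range(n):
        # keep the m nearest neighbours of word i; zero out everything else
        nearest = np.argsort(d2[i])[:m + 1]    # position 0 is i itself
        mask = np.ones(n, dtype=bool)
        mask[nearest] = False
        W[i, mask] = 0.0
    return W
```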
Step 8 uses a label propagation algorithm to propagate the emotion labels; the specific steps are as follows (a code sketch follows these steps):
1) manually mark a small number of seed emotion words, i.e., a small number of words labeled as commendatory, derogatory, or neutral;
2) define a label matrix Y of size |V'| × 3, where each row of Y corresponds to a word in the vocabulary V' and the three columns give the probabilities that the word is commendatory, derogatory, or neutral; initialize the label matrix from the manually marked seed words. The initialization adopted in this embodiment is: for the manually labeled words of 1), the corresponding row is initialized to [1,0,0] if commendatory, [0,1,0] if derogatory, and [0,0,1] if neutral; for words not labeled in 1), the corresponding row is initialized to [0,0,0];
3) define a probability transition matrix T such that

T_ij = w_ij / Σ_k w_kj

4) propagate the emotion labels: Y = TY;
5) re-initialize the label probability distributions of the manually labeled words in Y according to the initialization of 2);
6) if the label matrix Y has converged, stop iterating; otherwise go to step 4).
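A sketch of this propagation loop, under the assumption (standard for label propagation) that the transition matrix is the column-normalized weight matrix; seed_rows and the tolerance are illustrative names and values.

```python
import numpy as np

def propagate_labels(W, seed_rows, max_iter=1000, tol=1e-6):
    """seed_rows: dict {word index: [p_pos, p_der, p_neu]} for hand-labeled seeds."""
    T = W / np.clip(W.sum(axis=0, keepdims=True), 1e-12, None)  # T_ij = w_ij / sum_k w_kj
    Y = np.zeros((W.shape[0], 3))
    for i, row in seed_rows.items():            # e.g. commendatory seed -> [1,0,0]
        Y[i] = row
    for _ in range(max_iter):
        Y_next = T @ Y                          # 4) propagate: Y = TY
        for i, row in seed_rows.items():        # 5) clamp seed distributions
            Y_next[i] = row
        done = np.abs(Y_next - Y).max() < tol   # 6) convergence test
        Y = Y_next
        if done:
            break
    return Y
```

Step 9 can then read off each word's class as the column with the highest probability in its row of Y, e.g. Y.argmax(axis=1) with 0/1/2 read as commendatory/derogatory/neutral.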
Step 9 generates the emotion words: according to the result of the label propagation algorithm of step 8, the words in the vocabulary V' are divided into commendatory, derogatory and neutral classes according to the probabilities in the label matrix Y, and are collated into the emotion dictionary.
Step 10: end.
FIG. 2 is a structural diagram of the word-level supervision model of step 4; its specific structure and settings are as follows:
1) set the word vector dimension to 100 and the context window size to 3; initialize a word vector matrix W of size |V| × 100, whose i-th row is the word vector of the i-th word in the vocabulary V;
2) in the input layer, select a word w_t from the text data set D as input, represented in its one-hot form;
3) in the projection layer, look up the vector form C(w_t) of w_t in the word vector matrix W;
4) in the output layer, use C(w_t) to predict the context {w_{t-k}, …, w_{t-1}, w_{t+1}, …, w_{t+k}} of w_t, with loss function:

loss_context(w_t) = - Σ_{-k ≤ j ≤ k, j ≠ 0} log p(w_{t+j} | w_t)

5) in the output layer, use C(w_t) with the softmax function to predict the emotion mark label_{w_t} of w_t, with loss function:

loss_word(w_t) = - [ label_{w_t} · log p(pos | w_t) + (1 - label_{w_t}) · log p(neg | w_t) ]

6) end.
FIG. 3 is a structural diagram of the text-level supervision model of step 5; its specific structure and settings are as follows:
1) set the word vector dimension to 100 and initialize the word vector matrix W; set the maximum text length L accepted by the model according to the lengths of the texts in the text data set D;
2) in the input layer, select a text d_i from the text data set D; look up the word vector of each word of d_i in the word vector matrix W and stack them into a two-dimensional matrix of size L × 100, which serves as the model input;
3) in the convolutional layer, perform convolution with 200 filters to obtain the feature vectors;
4) in the pooling layer, extract the most important feature from the feature vectors as output using the Max Pooling Over Time operation;
5) in the fully-connected layer, use a fully-connected neural network layer to predict the emotion mark label_{d_i} of the input text d_i through a softmax function, with loss function:

loss_doc(d_i) = - [ label_{d_i} · log p(pos | d_i) + (1 - label_{d_i}) · log p(neg | d_i) ]

6) end.
In conclusion, the invention uses a supervised corpus and a neural network to generate word vectors with emotion embedding, mines the internal relations between words, propagates emotion labels with a label propagation algorithm, and automatically constructs a domain-specific emotion dictionary. The method avoids the drawback that knowledge-base-based construction methods cannot serve domain-specific sentiment analysis and, compared with other corpus-based methods, strengthens the treatment of the complex relations between words in text. The emotion dictionary is finally constructed automatically.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (5)

1. A method for constructing an emotion dictionary based on supervised emotion text and word vectors is characterized by comprising the following steps:
(1) acquiring a text data set D, wherein the text data set D comprises a positive emotion text with a positive emotion mark and a negative emotion text with a negative emotion mark;
(2) preprocessing texts in the text data set; constructing a vocabulary V, and filling words appearing for the first time in the preprocessed text data set into the vocabulary V one by one;
(3) calculating the emotional tendency value of each word in the vocabulary V with the SO-PMI method, and determining the emotion mark of each word from its emotional tendency value:

label_w = 1 (positive) if SO-PMI(w) > 0, else label_w = 0 (negative)

wherein label_w represents the emotion mark of the word w, and SO-PMI(w) represents the emotional tendency value of the word w;
(4) constructing an improved skip-gram model with word-level supervision, wherein the improved skip-gram model takes the words in D as input data and predicts the context and emotion mark of each word; computing the loss function loss_context for predicting the context and the loss function loss_word for predicting the emotion mark;

loss_context and loss_word are respectively:

loss_context(w_t) = - Σ_{-k ≤ j ≤ k, j ≠ 0} log p(w_{t+j} | w_t)

loss_word(w_t) = - [ label_{w_t} · log p(pos | w_t) + (1 - label_{w_t}) · log p(neg | w_t) ]

wherein w_t denotes a word, w_t ∈ D; label_{w_t} denotes the emotion mark of the word w_t; {w_{t-k}, …, w_{t-1}, w_{t+1}, …, w_{t+k}} denotes the set of predicted context words, comprising the k words before and the k words after w_t; p(w_{t+j} | w_t) denotes the probability that w_{t+j} is predicted as context of w_t; p(pos | w_t) denotes the probability that w_t is predicted to have a positive emotion mark, and p(neg | w_t) the probability that w_t is predicted to have a negative emotion mark;
(5) constructing a convolutional neural network model as a text-level supervision model, wherein the text-level supervision model takes the texts in the text data set D as input data and predicts the emotion marks of the texts; computing the loss function loss_doc between a text's predicted emotion mark and its actual emotion mark:

loss_doc(d_i) = - [ label_{d_i} · log p(pos | d_i) + (1 - label_{d_i}) · log p(neg | d_i) ]

wherein d_i represents a text, d_i ∈ D; label_{d_i} denotes the emotion mark of d_i; p(pos | d_i) denotes the probability that d_i is predicted to have a positive emotion mark, and p(neg | d_i) the probability that d_i is predicted to have a negative emotion mark;
(6) setting a joint loss function:

loss = α_1 · loss_context + α_2 · loss_doc + α_3 · loss_word

wherein α_1, α_2, α_3 are the weight coefficients of loss_context, loss_doc, and loss_word respectively;
(7) taking the text data set D, the word emotion marks label_w, and the text emotion marks label_{d_i} as input data, training the joint loss function with a back-propagation algorithm to obtain word vectors with emotion embedding;
(8) constructing a word relation graph G according to the word vector with emotion embedding obtained in the step (7);
(9) selecting partial words in the word relation graph G as seed words, and marking emotion labels for the seed words, wherein the emotion labels comprise commendation, derogation and neutrality; and then, propagating the emotion labels of the seed words in the relational graph G by using a label propagation algorithm to generate an emotion dictionary.
2. The method as claimed in claim 1, wherein the calculation formula of the emotional tendency value is as follows:
SO-PMI(w) = log( p(w | pos) / p(w | neg) )
where SO-PMI (w) represents an emotional tendency value of word w, pos represents positive emotion text, neg represents negative emotion text, p (w | pos) represents a probability that word w appears in the positive emotion text, and p (w | neg) represents a probability that word w appears in the negative emotion text.
3. The method as claimed in claim 2, wherein the improved skip-gram model with word-level supervision comprises an input layer, a projection layer and an output layer, wherein the input layer is a word w_t in the text data set D, the projection layer projects the word w_t to a word vector C(w_t), and the output layer separately predicts, from C(w_t), the context of w_t and its emotion mark label_{w_t}.
4. The method as claimed in claim 3, wherein the text-level supervision model comprises: an input layer, a convolutional layer, a pooling layer and a fully-connected layer, wherein the input layer is a text d_i in the text data set D; the convolutional layer extracts a plurality of feature vectors from the text d_i through feature extractors and sends them to the pooling layer; the pooling layer selects the most important feature vector via a Max Pooling Over Time operation and outputs it to the fully-connected layer; and the fully-connected layer predicts, from the received feature vector, the emotion mark label_{d_i} of the input text d_i through a softmax function.
5. The method as claimed in claim 4, wherein the specific steps of constructing the word relation graph G comprise:
1) extracting the verbs, adjectives and adverbs in the vocabulary V to form a new vocabulary V';
2) constructing a word relation graph G, with the words in V' as vertices of G;
3) for each word w_i in V', computing the Euclidean distances between w_i and all other words in V' in the word vector space obtained in step (7), selecting the m words with the nearest Euclidean distances, and establishing edges between w_i and the m words in the word relation graph G, with edge weights computed as:

w_ij = exp( - euclidean_dis(x_i, x_j)² / σ² )

wherein w_ij denotes the weight of the edge between w_i and w_j; x_i and x_j are the word vectors of w_i and w_j respectively; euclidean_dis(x_i, x_j) denotes the Euclidean distance between x_i and x_j; and σ is a constant parameter controlling the magnitude of w_ij;
the weight of the edge between the word w_i and any word other than the m words is then set to 0.
CN201810473308.6A 2018-05-17 2018-05-17 Sentiment dictionary construction method based on supervised sentiment text and word vector Active CN108647191B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810473308.6A CN108647191B (en) 2018-05-17 2018-05-17 Sentiment dictionary construction method based on supervised sentiment text and word vector

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810473308.6A CN108647191B (en) 2018-05-17 2018-05-17 Sentiment dictionary construction method based on supervised sentiment text and word vector

Publications (2)

Publication Number Publication Date
CN108647191A CN108647191A (en) 2018-10-12
CN108647191B true CN108647191B (en) 2021-06-25

Family

ID=63756399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810473308.6A Active CN108647191B (en) 2018-05-17 2018-05-17 Sentiment dictionary construction method based on supervised sentiment text and word vector

Country Status (1)

Country Link
CN (1) CN108647191B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109885687A (en) * 2018-12-29 2019-06-14 深兰科技(上海)有限公司 A kind of sentiment analysis method, apparatus, electronic equipment and the storage medium of text
CN109902300A (en) * 2018-12-29 2019-06-18 深兰科技(上海)有限公司 A kind of method, apparatus, electronic equipment and storage medium creating dictionary
CN110399595B (en) * 2019-07-31 2024-04-05 腾讯科技(成都)有限公司 Text information labeling method and related device
CN110598207B (en) * 2019-08-14 2020-09-01 华南师范大学 Word vector obtaining method and device and storage medium
CN110717047B (en) * 2019-10-22 2022-06-28 湖南科技大学 Web service classification method based on graph convolution neural network
CN114648015B (en) * 2022-03-15 2022-11-15 北京理工大学 Dependency relationship attention model-based aspect-level emotional word recognition method
CN114822495B (en) * 2022-06-29 2022-10-14 杭州同花顺数据开发有限公司 Acoustic model training method and device and speech synthesis method
CN116304028B (en) * 2023-02-20 2023-10-03 重庆大学 False news detection method based on social emotion resonance and relationship graph convolution network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663139B (en) * 2012-05-07 2013-04-03 苏州大学 Method and system for constructing emotional dictionary
CN103544246A (en) * 2013-10-10 2014-01-29 清华大学 Method and system for constructing multi-emotion dictionary for internet
CN104317965B (en) * 2014-11-14 2018-04-03 南京理工大学 Sentiment dictionary construction method based on language material
CN107102989B (en) * 2017-05-24 2020-09-29 南京大学 Entity disambiguation method based on word vector and convolutional neural network
CN107451118A (en) * 2017-07-21 2017-12-08 西安电子科技大学 Sentence-level sensibility classification method based on Weakly supervised deep learning
CN107609132B (en) * 2017-09-18 2020-03-20 杭州电子科技大学 Semantic ontology base based Chinese text sentiment analysis method

Also Published As

Publication number Publication date
CN108647191A (en) 2018-10-12

Similar Documents

Publication Publication Date Title
CN108647191B (en) Sentiment dictionary construction method based on supervised sentiment text and word vector
US12056458B2 (en) Translation method and apparatus based on multimodal machine learning, device, and storage medium
CN110245229B (en) Deep learning theme emotion classification method based on data enhancement
CN109902298B (en) Domain knowledge modeling and knowledge level estimation method in self-adaptive learning system
CN108614875B (en) Chinese emotion tendency classification method based on global average pooling convolutional neural network
CN109241255B (en) Intention identification method based on deep learning
CN107168955B (en) Utilize the Chinese word cutting method of the word insertion and neural network of word-based context
Wieting et al. Charagram: Embedding words and sentences via character n-grams
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
WO2019153737A1 (en) Comment assessing method, device, equipment and storage medium
CN112560478B (en) Chinese address Roberta-BiLSTM-CRF coupling analysis method using semantic annotation
CN110555084B (en) Remote supervision relation classification method based on PCNN and multi-layer attention
CN107679234A (en) Customer service information providing method, device, electronic equipment, storage medium
CN109977199B (en) Reading understanding method based on attention pooling mechanism
CN110516070B (en) Chinese question classification method based on text error correction and neural network
CN108153864A (en) Method based on neural network generation text snippet
CN113505200B (en) Sentence-level Chinese event detection method combined with document key information
CN112686044B (en) Medical entity zero sample classification method based on language model
CN109284361A (en) A kind of entity abstracting method and system based on deep learning
CN113204967B (en) Resume named entity identification method and system
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
CN113128203A (en) Attention mechanism-based relationship extraction method, system, equipment and storage medium
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
CN114943230A (en) Chinese specific field entity linking method fusing common knowledge
CN114417851B (en) Emotion analysis method based on keyword weighted information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant