CN104965822A - Emotion analysis method for Chinese texts based on computer information processing technology


Info

Publication number
CN104965822A
CN104965822A
Authority
CN
China
Prior art keywords
comment
chinese
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510452024.5A
Other languages
Chinese (zh)
Other versions
CN104965822B (en)
Inventor
高琰
晁绪耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN201510452024.5A
Publication of CN104965822A
Application granted
Publication of CN104965822B
Status: Expired - Fee Related
Anticipated expiration


Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses an emotion analysis method for Chinese texts based on computer information processing technology. Chinese product comments are segmented into words, and a bag-of-words model is used to generate a vector representation of each comment. The vector of every comment is fed to the visible units of a restricted Boltzmann machine (RBM), a deep learning model. The RBM extracts the sentiment features of the Chinese texts, and the extracted features are input to a support vector machine (SVM) for text emotion classification. The RBM improves the relevance of the features to emotional semantics, while the SVM improves the accuracy of emotion classification of Chinese product comments.

Description

Chinese text emotion analysis method based on computer information processing technology
Technical Field
The invention relates to natural language processing technology, and in particular to a Chinese text emotion analysis method based on computer information processing technology.
Background
With the rapid development of the internet and electronic commerce, people increasingly shop online and publish their opinions about products on the internet. Sentiment analysis is a popular research field in artificial intelligence; it aims to enable a computer to extract, from subjective text generated by a user, the author's emotional tendency toward the evaluated object, namely positive or negative. Positive means the user likes the product and holds a favorable, approving attitude toward it; negative means the user holds an unfavorable attitude toward the product because it is defective or unsatisfactory.
At present, the main methods for analyzing the sentiment of online product comments are those based on sentiment knowledge and those based on machine learning.
Methods based on sentiment knowledge mainly rely on existing sentiment dictionaries and linguistic knowledge to classify the emotional tendency of a text. For example, Polanyi et al. focused on constructing sentiment dictionaries and sentiment rules for polar words and related content words; Velikovich et al. showed how to learn a large sentiment dictionary from a seed dictionary and a generation framework for sentiment analysis of text; Lee et al. determined the emotional tendency of a text by comparing the numbers of positive and negative words, i.e., whichever class of words is more numerous gives the emotional tendency of the whole text. Such sentiment dictionaries depend mainly on expert formulation, however, and cannot be comprehensive.
Machine learning-based methods include supervised and unsupervised learning. Turney et al. proposed a method for performing sentiment analysis on document-level text using unsupervised learning. Tanghufeng et al. used an n-gram model to extract nouns, verbs, adverbs, adjectives and the like as text features, classified texts with KNN, Bayes and SVM classifiers respectively, and trained on different scales and different numbers of features to identify the emotional tendency of texts. Sunyhan et al. proposed an unsupervised topic-emotion mixture model that collects the emotion label and topic label of each sentence without annotating the text, thereby obtaining the emotion words of the text and realizing document-level emotion classification. However, product comments in Chinese face a language with many meanings and rules, and when machine learning performs automatic emotion analysis on Chinese texts, accuracy is often low because the selected or extracted features are inappropriate.
In machine learning-based emotion analysis, feature extraction is an important factor determining the effectiveness of the analysis. Deep learning combines low-level features to form more abstract high-level representations of attribute classes or features, discovering a distributed feature representation of the data. Deep learning builds neural networks that simulate the human brain for analytical learning, mimicking the mechanisms the human brain uses to interpret data such as images, sounds and text. The RBM is a deep learning algorithm; it has deep learning's characteristic of learning layer by layer to obtain structural information about the data, and it can be used to extract features that represent emotion polarity.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a Chinese text sentiment analysis method based on computer information processing technology.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: a Chinese text sentiment analysis method based on computer information processing technology comprises the following steps:
1) inputting a Chinese product comment text set; performing Chinese word segmentation with the precise (accurate) mode of the Chinese word segmentation tool and determining the part of speech of each word; filtering by part of speech, keeping only nouns, adjectives, adverbs and conjunctions, to obtain the segmented corpus; computing, over the segmented text, the weight of each word in each comment; and generating a comment vector;
2) performing feature selection on the comment vectors with a restricted Boltzmann machine; the restricted Boltzmann machine consists of a visible layer V and a hidden layer H; the node neurons of the visible layer are binary random variables; the nodes of the visible layer are mutually independent, and the nodes of the hidden layer are mutually independent; the connection weight between visible-layer and hidden-layer nodes is W; the comment vectors generated in step 1) are input to the visible unit layer;
3) training the restricted Boltzmann machine by the contrastive divergence method; the parameters of the restricted Boltzmann machine are initialized, and after 100 iterations the conditional probability $p(h_j = 1 \mid v)$ that hidden-layer node $h_j$ equals 1 given the visible layer $v$, and the conditional probability $p(v_i = 1 \mid h)$ that visible-layer node $v_i$ equals 1 given the hidden layer $h$, are obtained, where $p(h_j = 1 \mid v) = \sigma(b_j + \sum_i v_i w_{ij})$ and $p(v_i = 1 \mid h) = \sigma(a_i + \sum_j h_j w_{ij})$, with $\sigma(x) = 1/(1 + \exp(-x))$; the offsets $a_i$ and $b_j$ all have initial value 0; $w_{ij}$ is the connection weight between visible-layer node $i$ and hidden-layer node $j$ and obeys a normal distribution with mean 0 and standard deviation 1;
4) reconstructing each visible-layer node $v_i$ from the hidden layer; if the reconstructed value of $v_i$ is less than or equal to 0.01, the visible node $v_i$ is considered unable to represent higher-level structure and the word corresponding to $v_i$ is discarded; if the reconstructed value of $v_i$ is greater than 0.01, the difference between the reconstructed value and the input value of $v_i$ is taken; the maximum number of emotional features per comment is set to 30 and the differences are sorted in ascending order: the first 30 words $v_i$ have smaller differences than the following words, indicating that $v_i$ hardly changed after reconstruction, so the Chinese words corresponding to them are taken as feature words, while the later words have larger differences than the first 30, indicating that $v_i$ changed greatly after reconstruction, so the corresponding words are treated as non-feature words; if fewer than 30 visible nodes of a comment have reconstructed values greater than 0.01, all the words corresponding to those visible nodes are kept as feature words; the retained feature words together form the vector space of each comment;
5) inputting the comment vectors newly generated after step 4) into a support vector machine for classification to obtain the emotion category of each comment vector; the emotion category is positive or negative.
In step 1), the comment vector is generated as follows: using the bag-of-words model from natural language processing, if a word appears in a comment, the weight of that word in the comment vector is set to 1, otherwise it is set to 0; the length of the comment vector is the size of the dictionary of extracted words.
The number of nodes of the visible layer is 476 and the number of nodes of the hidden layer is 50.
Compared with the prior art, the invention has the following beneficial effects. The method uses the deep learning algorithm RBM to form a high-level abstraction of the original feature information of Chinese product comments, reconstructs the original feature information from that high-level information, and performs feature selection according to the reconstructed values, obtaining feature words that better express high-level structural information. The method can extract features that express the emotional essence of product comments. Whereas feature extraction with an RBM produces new features that are combinations of the original ones, feature selection with an RBM picks a comment's feature words by comparing the reconstructed comment with the original comment, so the selected features are more intuitive than extracted ones. Feature selection is performed on the original features of Chinese product comments by the RBM, and the comment vectors generated from the feature words are then input to the SVM for sentiment classification. Experimental results show that the method achieves high precision.
Drawings
FIG. 1 is a flow chart of the RBM-SVM algorithm;
FIG. 2 is a bag-of-words text vector representation;
FIG. 3 is a word space after RBM reconstruction;
FIG. 4 compares a comment's features before and after feature-selection dimension reduction.
Detailed Description
The specific implementation process of the invention is as follows:
step 1: 1706 comment texts in a Chinese comment corpus related to electronic products are extracted from the data hall, wherein 853 positive comments and 853 negative comments exist.
Step 2: the 1706 comment texts are segmented using the precise (accurate) mode of the Chinese word segmentation tool, and the part of speech of each word is judged. Stop words are removed, and part-of-speech filtering rules are defined as follows:
a. establish a part-of-speech set expressing emotional tendency: {noun, adjective, adverb, conjunction};
b. judge the part of speech of each word in the comment; if a word's part of speech is not in the set, delete the word, otherwise keep it.
Step 3: using the bag-of-words model from natural language processing, a word's entry is set to 1 if the word appears in the comment and 0 otherwise; this serves as the word-vector representation of the text, and the length of the vector is the size of the dictionary of extracted words. For example, take two comments d1 and d2; after processing, the nouns, adjectives, adverbs, conjunctions, etc. extracted from them are denoted w1, w2, w3, w4 and w5. The word vectors represented by the bag-of-words model are then as in the sketch below (see also FIG. 2).
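For concreteness, a minimal Python sketch of this vectorization; the words w1–w5 and comments d1, d2 are the illustrative placeholders from the example above, and which words occur in which comment is assumed:

```python
# Bag-of-words sketch: each comment becomes a binary vector over the
# dictionary of retained words (nouns, adjectives, adverbs, conjunctions).
dictionary = ["w1", "w2", "w3", "w4", "w5"]        # extracted word dictionary

def to_vector(comment_words, dictionary):
    # weight is 1 if the word appears in the comment, 0 otherwise
    return [1 if w in comment_words else 0 for w in dictionary]

d1 = {"w1", "w2", "w4"}   # hypothetical: words occurring in comment d1
d2 = {"w2", "w3", "w5"}   # hypothetical: words occurring in comment d2
print(to_vector(d1, dictionary))   # [1, 1, 0, 1, 0]
print(to_vector(d2, dictionary))   # [0, 1, 1, 0, 1]
```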
and 4, step 4: a two-layer Restricted Boltzmann Machine (RBM) is established, the number of nodes of the visible layer is 476, and the number of nodes of the hidden layer is 50. The individual visible cells within the visible layer are independent of each other. The node i of the visible layer is connected with the node j of the hidden layer, and the connection weight value is Wji. The confined boltzmann machine has a joint configuration of energy between the visible layer and the hidden layer, which can be expressed as:
$E(v, h; \theta) = -\sum_{ij} w_{ij} v_i h_j - \sum_i a_i v_i - \sum_j b_j h_j$

where $i$ indexes the visible-layer units, $j$ indexes the hidden-layer units, $\theta = \{w_{ij}, a_i, b_j\}$ are the model parameters, $w_{ij}$ is the connection weight between visible-layer node $i$ and hidden-layer node $j$, and $a_i$ and $b_j$ are the offsets of the visible layer and the hidden layer, respectively. The joint probability distribution is:
$p_\theta(v, h) = \frac{1}{Z(\theta)} \exp(-E(v, h; \theta)) = \frac{1}{Z(\theta)} \prod_{ij} e^{w_{ij} v_i h_j} \prod_i e^{a_i v_i} \prod_j e^{b_j h_j}$

where $Z(\theta) = \sum_{v,h} \exp(-E(v, h; \theta))$ is the normalizing partition function; the joint probability gives the probability of a joint sample of the hidden layer and the visible layer.
Because there are no direct connections between hidden-layer units in an RBM, the hidden-layer nodes are conditionally independent given the visible layer, i.e. $p(h \mid v) = \prod_j p(h_j \mid v)$, so an unbiased sample under the data distribution is readily obtained. Given a randomly selected visible layer $v$, the probability that hidden-layer unit $h_j$ takes the binary state 1 is:
$p(h_j = 1 \mid v) = \sigma\Bigl(b_j + \sum_i v_i w_{ij}\Bigr) \qquad (3\text{-}3)$

where the function

$\sigma(x) = \frac{1}{1 + \exp(-x)} \qquad (3\text{-}4)$
Here $(v_i, h_j)$ is an unbiased sample, and this function, like the activation function in a neural network, defines a conditional Bernoulli distribution. Since there are likewise no direct connections between visible-layer units, a sample of the visible units is easy to obtain, and the probability that a given visible-layer unit takes the binary state 1 is:
$p(v_i = 1 \mid h) = \sigma\Bigl(a_i + \sum_j h_j w_{ij}\Bigr)$
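These two conditionals can be sketched in a few lines of NumPy. The layer sizes (476 visible, 50 hidden) and the initializations follow steps 4 and 7; the function and variable names are illustrative, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 476, 50

W = rng.normal(0.0, 1.0, size=(n_visible, n_hidden))  # w_ij ~ N(0, 1)
a = np.zeros(n_visible)                               # visible offsets a_i
b = np.zeros(n_hidden)                                # hidden offsets b_j

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v):
    # p(h_j = 1 | v) = sigma(b_j + sum_i v_i w_ij), eq. (3-3)
    return sigmoid(b + v @ W)

def p_v_given_h(h):
    # p(v_i = 1 | h) = sigma(a_i + sum_j h_j w_ij)
    return sigmoid(a + W @ h)
```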
and 5: and inputting the segmented Chinese comments into the RBM, wherein each word in the dictionary corresponds to one node of the visible layer. And (5) adopting RBM to perform unsupervised learning.
Step 6: the RBM is trained rapidly and without supervision by a method called contrastive divergence (CD), which in an RBM performs k steps of Gibbs sampling, i.e. learning with CD-k Gibbs sampling. In the CD-k algorithm, the state $v^0$ of the visible nodes is initialized with a training sample or a random initial state, and the following alternating sampling is performed, where $h^0$ is drawn from $p(h \mid v^0)$, $v^1$ is drawn from $p(v \mid h^0)$, and so on:
$h^0 \sim p(h \mid v^0),\; v^1 \sim p(v \mid h^0) \qquad (3\text{-}13)$
$h^1 \sim p(h \mid v^1),\; v^2 \sim p(v \mid h^1) \qquad (3\text{-}14)$
$\ldots,\; v^{k+1} \sim p(v \mid h^k) \qquad (3\text{-}15)$
Let S be the training set, and let $(v_1, \ldots, v_m)$ and $(h_1, \ldots, h_n)$ be the visible and hidden nodes of the RBM, respectively. The CD-k Gibbs procedure is sketched below; its parameters are as follows: $\Delta w_{ij}$ is the increment of the weight $w_{ij}$, $\Delta a_i$ is the increment of the offset $a_i$, and $\Delta b_j$ is the increment of the offset $b_j$.
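A minimal CD-k sketch consistent with the chain (3-13)–(3-15) and the increments above; it reuses W, a, b and the conditional functions from the earlier sketch, and its structure is an assumption rather than the patent's own listing:

```python
def cd_k_update(v0, k=10, lr=0.01, rng=np.random.default_rng(1)):
    """One CD-k parameter update from a single binary comment vector v0."""
    global W, a, b
    v = v0
    ph0 = p_h_given_v(v0)                    # positive-phase hidden probabilities
    for _ in range(k):                       # alternate sampling, (3-13)..(3-15)
        h = (rng.random(n_hidden) < p_h_given_v(v)).astype(float)
        v = (rng.random(n_visible) < p_v_given_h(h)).astype(float)
    phk = p_h_given_v(v)                     # negative-phase hidden probabilities
    W += lr * (np.outer(v0, ph0) - np.outer(v, phk))  # delta w_ij
    a += lr * (v0 - v)                                # delta a_i
    b += lr * (ph0 - phk)                             # delta b_j
    return v                                 # reconstruction of v0 after k steps
```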
Step 7: initialize the parameters of the restricted Boltzmann machine (RBM): as in step 4, the weights W of the RBM obey a normal distribution with mean 0 and standard deviation 1; the offsets $a_i$ and $b_j$ all have initial value 0; the learning rate is 0.01.
Step 8: the number of iterations is set to 100, and the CD-k Gibbs algorithm finally yields the values of the weight $w_{ij}$ between visible node $i$ and hidden node $j$ and of the offsets $a_i$ and $b_j$; $p(v_i = 1 \mid h)$ is then computed with the formula given above.
Step 9: the word space of each comment is reconstructed according to $p(v_i = 1 \mid h)$. If the reconstructed value of $v_i$ is less than or equal to 0.01, the visible node $v_i$ is considered unable to represent higher-level structure, and the word corresponding to $v_i$ is discarded. If the reconstructed value of $v_i$ is greater than 0.01, the difference between the reconstructed value and the input value is taken. If this difference is small, the word hardly changed after reconstruction, so the corresponding Chinese word is looked up and can serve as a feature representation; conversely, if the difference is large, the word changed greatly after reconstruction and cannot serve as a feature representation of the comment. Accordingly, the differences are sorted in ascending order and the first 30 are selected as feature words. If fewer than 30 visible nodes of a comment have reconstructed values greater than 0.01, all the words corresponding to those visible nodes are kept as feature words. Feature selection thus reduces the word-vector space of the comment.
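A minimal sketch of this selection rule; v_input and v_recon are assumed to be a comment's original and reconstructed visible vectors, and words to map visible-node indices back to dictionary words, while the threshold 0.01 and the cap of 30 are the values stated above:

```python
def select_features(v_input, v_recon, words, threshold=0.01, max_feats=30):
    """Keep words whose visible units reconstruct above `threshold`, ranked
    by how little they change after reconstruction (|recon - input|)."""
    kept = [(abs(v_recon[i] - v_input[i]), words[i])
            for i in range(len(words)) if v_recon[i] > threshold]
    kept.sort(key=lambda pair: pair[0])        # smallest difference first
    return [w for _, w in kept[:max_feats]]    # at most 30 feature words
```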
Step 10: the number of Gibbs sampling steps K is selected experimentally. In the experiment, the 1706 comment texts are randomly divided into a training set and a test set at a ratio of 3:1. The hidden layer of the RBM is set to 2 states, represented by 0 and 1: 0 means the judged emotion of the product comment is negative, i.e. the user holds a negative attitude toward the product, and 1 means the judged emotion is positive, i.e. the user holds a positive attitude toward the product. The RBM is used as a clusterer and outputs the emotion category of each comment, with K set to 5, 10, 15, 20 and 25 in turn. As K varies, the RBM's recognition results on product comments are measured by three indices: precision, recall and F-measure.
TABLE 1: Recognition results of the RBM on product comments for different values of K
The experimental results show that the F-measure is highest when K = 10, indicating the best overall performance, so K is set to 10.
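For reference, a small sketch of the three indices, assuming they are computed from true-positive, false-positive and false-negative counts in the usual way:

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure (their harmonic mean) from counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)

# e.g. 400 true positives, 50 false positives, 80 false negatives
print(precision_recall_f(400, 50, 80))   # -> approximately (0.889, 0.833, 0.860)
```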
Step 11: the dimension-reduced comments are input to a support vector machine (SVM) for emotion classification. The SVM is a maximum-margin linear classifier in feature space; its learning strategy is to maximize the margin, which is ultimately converted into solving a convex quadratic programming problem. The objective function of the SVM is:
$\max W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$

$\text{s.t.} \sum_{i=1}^{N} \alpha_i y_i = 0; \quad C \ge \alpha_i \ge 0, \; i = 1, \ldots, N$
solving the objective function to obtain:
according to <math><mrow> <msup> <mi>&alpha;</mi> <mo>*</mo> </msup> <mo>=</mo> <mrow> <mo>(</mo> <msubsup> <mi>&alpha;</mi> <mn>1</mn> <mo>*</mo> </msubsup> <mo>,</mo> <mo>...</mo> <mo>...</mo> <msubsup> <mi>&alpha;</mi> <mn>1</mn> <mo>*</mo> </msubsup> <mo>)</mo> </mrow> <mo>,</mo> </mrow></math> Computing <math><mrow> <msup> <mi>w</mi> <mo>*</mo> </msup> <mo>=</mo> <msubsup> <mi>&Sigma;</mi> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>l</mi> </msubsup> <msub> <mi>y</mi> <mi>i</mi> </msub> <msubsup> <mi>&alpha;</mi> <mi>i</mi> <mo>*</mo> </msubsup> <msub> <mi>x</mi> <mi>i</mi> </msub> <mo>,</mo> </mrow></math> Selection of alpha*A component ofSatisfy the requirement of <math><mrow> <mn>0</mn> <mo><</mo> <msubsup> <mi>&alpha;</mi> <mi>j</mi> <mo>*</mo> </msubsup> <mo><</mo> <mi>C</mi> <mo>,</mo> </mrow></math>
And calculate out according to the above <math><mrow> <msup> <mi>b</mi> <mo>*</mo> </msup> <mo>=</mo> <msub> <mi>y</mi> <mi>j</mi> </msub> <mo>-</mo> <msubsup> <mi>&Sigma;</mi> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>l</mi> </msubsup> <msub> <mi>y</mi> <mi>i</mi> </msub> <msubsup> <mi>&alpha;</mi> <mi>i</mi> <mo>*</mo> </msubsup> <mi>K</mi> <mrow> <mo>(</mo> <msub> <mi>x</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>x</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> </mrow></math>
where $\alpha = \{\alpha_1, \ldots, \alpha_N\}$; $K(x_i, x_j)$ is the kernel function of Chinese comments $x_i$ and $x_j$; $y_i$ is the label of Chinese comment $x_i$; and C is a penalty factor, whose value we choose as 0.0001 in the experiment.
In the SVM, the kernel function we use for Chinese comments $x_1$ and $x_2$ is the radial basis function (RBF), given by the following equation; the kernel function maps the feature space from a low-dimensional space to a high-dimensional space.
$K(x_1, x_2) = \exp\left(-\frac{\lVert x_1 - x_2 \rVert^2}{2\sigma^2}\right)$
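A sketch of this final stage using scikit-learn's SVC, which is an assumption — the patent names no implementation; the RBF kernel, C = 0.0001 and the roughly 3:1 split follow the values stated in the text, and the data here are random stand-ins for the selected comment vectors:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical stand-ins: X holds the comment vectors produced by the RBM
# feature-selection stage, y the labels (1 = positive, 0 = negative).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1706, 476)).astype(float)
y = rng.integers(0, 2, size=1706)

clf = SVC(kernel="rbf", C=0.0001)   # RBF kernel, penalty factor C as above
clf.fit(X[:1280], y[:1280])         # ~3:1 train/test split as in step 10
pred = clf.predict(X[1280:])        # emotion category of each test comment
```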
Step 12: the emotion analysis method combining the RBM and the SVM is compared with methods using the SVM or the RBM alone, and precision, recall and F-measure are computed from the outputs against the manually annotated results. The experimental results are shown in the following table:
TABLE 2: Recognition results of the different methods
As Table 2 shows, combining the RBM with the SVM greatly improves emotion analysis accuracy. Compared with using the SVM alone, the combination improves both precision and recall. Compared with using the RBM alone, the combination's precision is much higher; although recall drops slightly, the F-measure is well above that of the RBM alone, so the combined RBM-SVM emotion analysis method has the best overall performance.

Claims (3)

1. A Chinese text sentiment analysis method based on a computer information processing technology is characterized by comprising the following steps:
1) inputting a Chinese product comment text set; performing Chinese word segmentation with the precise (accurate) mode of the Chinese word segmentation tool and determining the part of speech of each word; filtering by part of speech, keeping only nouns, adjectives, adverbs and conjunctions, to obtain the segmented corpus; computing, over the segmented text, the weight of each word in each comment; and generating a comment vector;
2) performing feature selection on the comment vectors with a restricted Boltzmann machine; the restricted Boltzmann machine consists of a visible layer V and a hidden layer H; the node neurons of the visible layer are binary random variables; the nodes of the visible layer are mutually independent, and the nodes of the hidden layer are mutually independent; the connection weight between visible-layer and hidden-layer nodes is W; the comment vectors generated in step 1) are input to the visible unit layer;
3) training the restricted Boltzmann machine by the contrastive divergence method; the parameters of the restricted Boltzmann machine are initialized, and after 100 iterations the conditional probability $p(h_j = 1 \mid v)$ that hidden-layer node $h_j$ equals 1 given the visible layer $v$, and the conditional probability $p(v_i = 1 \mid h)$ that visible-layer node $v_i$ equals 1 given the hidden layer $h$, are obtained, where $p(h_j = 1 \mid v) = \sigma(b_j + \sum_i v_i w_{ij})$ and $p(v_i = 1 \mid h) = \sigma(a_i + \sum_j h_j w_{ij})$, with $\sigma(x) = 1/(1 + \exp(-x))$; the offsets $a_i$ and $b_j$ all have initial value 0; $w_{ij}$ is the connection weight between visible-layer node $i$ and hidden-layer node $j$ and obeys a normal distribution with mean 0 and standard deviation 1;
4) reconstructing visible-layer node $v_i$ according to the conditional probability $p(v_i = 1 \mid h)$; if the reconstructed value of $v_i$ is less than or equal to 0.01, the visible node $v_i$ is considered unable to represent higher-level structure and the word corresponding to $v_i$ is discarded; if the reconstructed value of $v_i$ is greater than 0.01, the difference between the reconstructed value and the input value of $v_i$ is taken; the maximum number of emotional features per comment is set to 30 and the differences are sorted in ascending order: the first 30 words $v_i$ have smaller differences than the following words, indicating that $v_i$ hardly changed after reconstruction, so the Chinese words corresponding to them are taken as feature words, while the later words have larger differences than the first 30, indicating that $v_i$ changed greatly after reconstruction, so the corresponding words are treated as non-feature words; if fewer than 30 visible nodes of a comment have reconstructed values greater than 0.01, all the words corresponding to those visible nodes are kept as feature words; the retained feature words together form the vector space of each comment;
5) inputting the comment vectors newly generated after step 4) into a support vector machine for classification to obtain the emotion category of each comment vector; the emotion category is positive or negative.
2. The Chinese text sentiment analysis method based on computer information processing technology according to claim 1, wherein in step 1) the comment vector is generated as follows: using the bag-of-words model from natural language processing, if a word appears in a comment, the weight of that word in the comment vector is set to 1, otherwise it is set to 0; the length of the comment vector is the size of the dictionary of extracted words.
3. The Chinese text sentiment analysis method based on computer information processing technology according to claim 2, wherein the visible layer has 476 nodes and the hidden layer has 50 nodes.
CN201510452024.5A 2015-07-29 2015-07-29 Chinese text sentiment analysis method based on computer information processing technology Expired - Fee Related CN104965822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510452024.5A CN104965822B (en) 2015-07-29 2015-07-29 Chinese text sentiment analysis method based on computer information processing technology


Publications (2)

Publication Number Publication Date
CN104965822A true CN104965822A (en) 2015-10-07
CN104965822B CN104965822B (en) 2017-08-25

Family

ID=54219858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510452024.5A Expired - Fee Related CN104965822B (en) Chinese text sentiment analysis method based on computer information processing technology

Country Status (1)

Country Link
CN (1) CN104965822B (en)



Patent Citations (2)

Publication number Priority date Publication date Assignee Title
WO2009017464A1 (en) * 2007-07-31 2009-02-05 Agency For Science, Technology And Research Relation extraction system
CN104331706A (en) * 2014-10-29 2015-02-04 西安电子科技大学 Polarization SAR image classification based on RBM and SVM

Non-Patent Citations (3)

Title
RONAN COLLOBERT et al.: "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning", Proceedings of the 25th International Conference on Machine Learning
REN Yuanyuan et al.: "Product feature extraction and sentiment orientation determination of Chinese online reviews" (中文网络评论的产品特征提取及情感倾向判定), Computer Systems & Applications (计算机系统应用)
ZHANG Chunxia et al.: "Restricted Boltzmann Machine" (受限波尔兹曼机), Chinese Journal of Engineering Mathematics (工程数学学报)

Cited By (30)

Publication number Priority date Publication date Assignee Title
CN105589941A (en) * 2015-12-15 2016-05-18 北京百分点信息科技有限公司 Emotional information detection method and apparatus for web text
CN106933847B (en) * 2015-12-30 2019-12-27 北京神州泰岳软件股份有限公司 Method and device for establishing data classification model
CN106933847A (en) * 2015-12-30 2017-07-07 北京神州泰岳软件股份有限公司 A kind of method and apparatus for setting up data classification model
CN105930368A (en) * 2016-04-13 2016-09-07 深圳大学 Emotion classification method and system
CN105930368B (en) * 2016-04-13 2019-07-30 深圳大学 A kind of sensibility classification method and system
CN106126502A (en) * 2016-07-07 2016-11-16 四川长虹电器股份有限公司 A kind of emotional semantic classification system and method based on support vector machine
CN106126502B (en) * 2016-07-07 2018-10-30 四川长虹电器股份有限公司 A kind of emotional semantic classification system and method based on support vector machines
WO2018028077A1 (en) * 2016-08-11 2018-02-15 中兴通讯股份有限公司 Deep learning based method and device for chinese semantics analysis
CN106446117A (en) * 2016-09-18 2017-02-22 西安电子科技大学 Text analysis method based on poisson-gamma belief network
CN106502394A (en) * 2016-10-18 2017-03-15 哈尔滨工业大学深圳研究生院 Term vector computational methods and device based on EEG signals
CN106502394B (en) * 2016-10-18 2019-06-25 哈尔滨工业大学深圳研究生院 Term vector calculation method and device based on EEG signals
CN108021549B (en) * 2016-11-04 2019-08-13 华为技术有限公司 Sequence conversion method and device
WO2018082569A1 (en) * 2016-11-04 2018-05-11 华为技术有限公司 Sequence conversion method and device
CN108021549A (en) * 2016-11-04 2018-05-11 华为技术有限公司 Sequence conversion method and device
US11132516B2 (en) 2016-11-04 2021-09-28 Huawei Technologies Co., Ltd. Sequence translation probability adjustment
CN106547740A (en) * 2016-11-24 2017-03-29 四川无声信息技术有限公司 Text message processing method and device
CN106844632A (en) * 2017-01-20 2017-06-13 清华大学 Based on the product review sensibility classification method and device that improve SVMs
CN106844632B (en) * 2017-01-20 2020-02-21 清华大学 Product comment emotion classification method and device based on improved support vector machine
CN106919673A (en) * 2017-02-21 2017-07-04 浙江工商大学 Text mood analysis system based on deep learning
CN106919673B (en) * 2017-02-21 2019-08-20 浙江工商大学 Text mood analysis system based on deep learning
US10924572B2 (en) 2017-04-13 2021-02-16 Tencent Technology (Shenzhen) Company Limited Information push method and apparatus, information sending method and apparatus, system, and storage medium
WO2018188567A1 (en) * 2017-04-13 2018-10-18 腾讯科技(深圳)有限公司 Server information pushing method and device, terminal information sending method and device, information pushing system and storage medium
CN107193801A (en) * 2017-05-21 2017-09-22 北京工业大学 A kind of short text characteristic optimization and sentiment analysis method based on depth belief network
CN107193801B (en) * 2017-05-21 2020-07-03 北京工业大学 Short text feature optimization and emotion analysis method based on deep belief network
CN107608956A (en) * 2017-09-05 2018-01-19 广东石油化工学院 A kind of reader's mood forecast of distribution algorithm based on CNN GRNN
CN107608956B (en) * 2017-09-05 2021-02-19 广东石油化工学院 Reader emotion distribution prediction algorithm based on CNN-GRNN
CN107918487A (en) * 2017-10-20 2018-04-17 南京邮电大学 A kind of method that Chinese emotion word is identified based on skin electrical signal
CN109783806A (en) * 2018-12-21 2019-05-21 众安信息技术服务有限公司 A kind of text matching technique using semantic analytic structure
CN109783806B (en) * 2018-12-21 2023-05-02 众安信息技术服务有限公司 Text matching method utilizing semantic parsing structure
CN109464147A (en) * 2019-01-09 2019-03-15 浙江强脑科技有限公司 Breath signal detection method, device and computer readable storage medium

Also Published As

Publication number Publication date
CN104965822B (en) 2017-08-25

Similar Documents

Publication Publication Date Title
CN104965822B (en) Chinese text sentiment analysis method based on computer information processing technology
Socher et al. Semi-supervised recursive autoencoders for predicting sentiment distributions
CN104217226B (en) Conversation activity recognition methods based on deep neural network Yu condition random field
CN110321563B (en) Text emotion analysis method based on hybrid supervision model
Chang et al. Research on detection methods based on Doc2vec abnormal comments
US20130179169A1 (en) Chinese text readability assessing system and method
CN110347836B (en) Method for classifying sentiments of Chinese-Yue-bilingual news by blending into viewpoint sentence characteristics
CN112001187A (en) Emotion classification system based on Chinese syntax and graph convolution neural network
CN112364638B (en) Personality identification method based on social text
CN112256866B (en) Text fine-grained emotion analysis algorithm based on deep learning
CN109101490B (en) Factual implicit emotion recognition method and system based on fusion feature representation
CN110717330A (en) Word-sentence level short text classification method based on deep learning
Azim et al. Text to emotion extraction using supervised machine learning techniques
Sboev et al. Deep learning network models to categorize texts according to author's gender and to identify text sentiment
Alroobaea Sentiment analysis on amazon product reviews using the recurrent neural network (rnn)
Nassiri et al. Arabic L2 readability assessment: Dimensionality reduction study
CN111813939A (en) Text classification method based on representation enhancement and fusion
CN115146031B (en) Short text position detection method based on deep learning and auxiliary features
CN107491490B (en) Text emotion classification method based on emotion center
CN113190681B (en) Fine granularity text classification method based on capsule network mask memory attention
CN115906824A (en) Text fine-grained emotion analysis method, system, medium and computing equipment
Rustamov et al. An application of hidden Markov models in subjectivity analysis
Imad et al. Automated Arabic News Classification using the Convolutional Neural Network.
CN115269833A (en) Event information extraction method and system based on deep semantics and multitask learning
Ghelani A review on student automatic grading system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170825

Termination date: 20210729