CN104965822A - Emotion analysis method for Chinese texts based on computer information processing technology - Google Patents
- Publication number
- CN104965822A (application CN201510452024.5A)
- Authority
- CN
- China
- Prior art keywords
- comment
- chinese
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses an emotion analysis method for Chinese texts based on computer information processing technology. Chinese product comments are segmented into words, and a bag-of-words model is used to generate a vector representation of each comment. Each comment vector is input to the visible units of a restricted Boltzmann machine (RBM), a deep-learning model; the RBM extracts the emotional characteristics of the Chinese text, and the extracted emotional characteristics are input to an SVM for text emotion classification. In this method, the RBM improves the relevance of the features to the emotional semantics, while the SVM improves the accuracy of emotion classification of Chinese product comments.
Description
Technical Field
The invention relates to natural language processing technology, and in particular to a Chinese text emotion analysis method based on computer information processing technology.
Background
With the rapid development of the internet and electronic commerce, people increasingly shop online and publish their opinions about products on the internet. Emotion analysis is a popular research field in artificial intelligence; it aims to enable a computer to extract, from subjective text generated by a user, the author's emotional tendency toward the evaluated object, namely positive or negative. Positive means the user likes the product and holds an approving, affirmative attitude toward it; negative means the user holds a negative attitude toward the product because the product has defects or is unsatisfactory.
At present, the main approaches to sentiment analysis of online product comments are methods based on sentiment knowledge and methods based on machine learning.
Methods based on sentiment knowledge mainly rely on existing sentiment dictionaries and linguistic knowledge to classify the emotional tendency of a text. For example, Polanyi et al. focus on constructing emotion dictionaries and emotion rules for polar words and related non-content words; Velikovich et al. show how to learn a large sentiment dictionary from a seed dictionary and a generation framework for text sentiment analysis; Lee et al. determine the emotional tendency of a text by comparing the numbers of positive and negative words, i.e., whichever kind of word is more numerous gives the emotional tendency of the whole text. Sentiment dictionaries, however, depend mainly on expert compilation and cannot be comprehensive.
Methods based on machine learning include supervised and unsupervised learning. Turney et al. proposed a method for sentiment analysis of document-level text using unsupervised learning. Tanghufeng et al. used an n-gram model to extract nouns, verbs, adverbs, adjectives and so on as text features, classified the texts with KNN, Bayes and SVM classifiers respectively, and trained on different scales and different numbers of features to identify the emotional tendency of a text. Sunyhan et al. proposed an unsupervised topic-sentiment mixture model, which collects the sentiment label and topic label of each sentence without annotating the text, thereby obtaining the sentiment words of the text and realizing document-level sentiment classification. However, product comments are written in Chinese, a language with many meanings and rules, and when machine learning performs automatic emotion analysis on Chinese texts, accuracy is often low because the selected or extracted features are inappropriate.
In machine learning-based emotion analysis, feature extraction is an important factor determining the quality of the analysis. Deep learning combines low-level features to form more abstract high-level representations of attribute categories or features, thereby discovering distributed feature representations of data. Deep learning builds neural networks that simulate the human brain for analytical learning; such networks mimic the mechanisms by which the human brain interprets data such as images, sounds and text. The RBM is a deep-learning algorithm; it has deep learning's characteristic of learning the structural information of data layer by layer, and can be used to extract features that represent emotion polarity.
Disclosure of Invention
The invention aims to solve the technical problem of providing a Chinese text sentiment analysis method based on a computer information processing technology aiming at the defects of the prior art.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: a Chinese text sentiment analysis method based on computer information processing technology comprises the following steps:
1) inputting a Chinese product comment text set, performing Chinese word segmentation using the accurate mode of the Chinese word segmentation tool, determining the part of speech of each word, filtering by part of speech so that only nouns, adjectives, adverbs and conjunctions are kept, obtaining a segmented corpus, calculating from the segmented text the weight of each word in each comment, and generating a comment vector;
2) performing feature selection on the comment vectors by a restricted Boltzmann machine; the restricted Boltzmann machine consists of a visible layer V and a hidden layer H, the node neurons in the visible layer are binary random variables, the nodes of the visible layer are mutually independent, and the nodes of the hidden layer are mutually independent; the connection weight between visible-layer and hidden-layer nodes is W; the comment vectors generated in step 1) are input to the visible layer;
3) training the restricted Boltzmann machine by the contrastive divergence method; relevant parameters of the restricted Boltzmann machine are initialized, and after 100 iterations the conditional probability p(h_j = 1 | v) of hidden-layer node h_j being 1 given the visible layer v, and the conditional probability p(v_i = 1 | h) of visible-layer node v_i being 1 given the hidden layer h, are obtained; wherein

p(h_j = 1 | v) = σ(b_j + Σ_i v_i w_ij);  p(v_i = 1 | h) = σ(a_i + Σ_j h_j w_ij);

the function σ(x) = 1/(1 + e^(−x)); the offsets a_i and b_j all have initial value 0; w_ij is the connection weight between visible-layer node i and hidden-layer node j, and w_ij obeys a normal distribution with mean 0 and standard deviation 1;
4) reconstructing the visible-layer nodes v_i according to the hidden layer; if the reconstructed value of v_i is less than or equal to 0.01, the visible node v_i is considered unable to represent higher-level structure and the word corresponding to v_i is discarded; if the reconstructed value of v_i is greater than 0.01, the difference between the reconstructed value and the input value of v_i is computed; the maximum number of emotional features per comment is set to 30, and the differences are sorted in ascending order; the first 30 words v_i have smaller differences than the following words, indicating that v_i changed little after reconstruction, and the Chinese words corresponding to these v_i are taken as feature words; the later words have larger differences than the first 30, indicating that v_i changed greatly after reconstruction, and the corresponding words are treated as non-feature words; if the number of visible nodes with reconstructed values greater than 0.01 in a comment is less than 30, all words corresponding to those visible nodes are retained as feature words; the retained feature words together form the vector space of each comment;
5) inputting the comment vectors newly generated in step 4) into a support vector machine for classification to obtain the emotion category of each comment vector, the emotion category being positive or negative.
In step 1), the comment vector is generated as follows: a bag-of-words model from natural language processing is adopted; if a word appears in a comment, the weight corresponding to that word in the comment vector is set to 1, otherwise it is set to 0; the length of the comment vector is the size of the dictionary of extracted words.
The number of nodes of the visible layer is 476 and the number of nodes of the hidden layer is 50.
Compared with the prior art, the invention has the following beneficial effects. The method applies the deep-learning algorithm RBM to form a high-level abstraction of the original feature information of Chinese product comments, reconstructs the original features from that high-level information, and performs feature selection according to the reconstructed values, obtaining feature words that better express high-level structural information. The method can extract features that express the emotional essence of a product comment. The new features obtained by RBM feature extraction are combinations of the original features, whereas RBM feature selection chooses the feature words of a comment by comparing the reconstructed comment with the original one; compared with feature extraction, the selected features are more intuitive. Feature selection is performed on the original features of the Chinese product comments by the RBM, and the comment vectors generated from the feature words are then input to the SVM for sentiment classification. Experimental results show that the method achieves high precision.
Drawings
FIG. 1 is a flow chart of the RBM-SVM algorithm;
FIG. 2 is a bag-of-words text vector representation;
FIG. 3 is a word space after RBM reconstruction;
FIG. 4 shows the comparison before and after dimension reduction of a certain comment information feature selection.
Detailed Description
The specific implementation process of the invention is as follows:
step 1: 1706 comment texts in a Chinese comment corpus related to electronic products are extracted from the data hall, wherein 853 positive comments and 853 negative comments exist.
Step 2: The 1706 comment texts are segmented with the accurate mode of the Chinese word segmenter, and the part of speech of each word is judged. Stop words are removed, and part-of-speech filtering rules are defined as follows:
a. establishing a part-of-speech set expressing the tendency of emotion: { noun, adjective, adverb, and conjunctive },
b. the part of speech of each word in the comment is checked; if a word's part of speech is not in the set, the word is deleted, otherwise the word is kept.
Step 3: The bag-of-words model from natural language processing is used: if a word appears in the comment, the corresponding element is set to 1, otherwise to 0; this serves as the word-vector representation of the text. The length of the vector is the size of the dictionary of extracted words. For example, suppose there are two comments d1 and d2; after processing, the nouns, adjectives, adverbs and conjunctions extracted from the two comments are denoted w1, w2, w3, w4 and w5. The word vectors represented by the bag-of-words model are shown in FIG. 2.
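The bag-of-words construction described in step 3 can be sketched as follows. The five-word dictionary and the contents of d1 and d2 are hypothetical, since the patent does not list them; the actual dictionary has 476 entries, one per visible node:

```python
def build_vector(comment_words, dictionary):
    """Binary bag-of-words: 1 if the dictionary word occurs in the comment, else 0."""
    words = set(comment_words)
    return [1 if w in words else 0 for w in dictionary]

# Hypothetical dictionary of extracted nouns/adjectives/adverbs/conjunctions.
dictionary = ["w1", "w2", "w3", "w4", "w5"]
d1 = ["w1", "w3", "w5"]   # words kept from comment d1 (illustrative)
d2 = ["w2", "w3"]         # words kept from comment d2 (illustrative)

v1 = build_vector(d1, dictionary)   # [1, 0, 1, 0, 1]
v2 = build_vector(d2, dictionary)   # [0, 1, 1, 0, 0]
```

The vector length equals the dictionary size, matching the number of RBM visible nodes.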
and 4, step 4: a two-layer Restricted Boltzmann Machine (RBM) is established, the number of nodes of the visible layer is 476, and the number of nodes of the hidden layer is 50. The individual visible cells within the visible layer are independent of each other. The node i of the visible layer is connected with the node j of the hidden layer, and the connection weight value is Wji. The confined boltzmann machine has a joint configuration of energy between the visible layer and the hidden layer, which can be expressed as:
where i is the number of nodes of the visible layer unit, j is the number of nodes of the hidden layer unit, and θ ═ wij,ai,bjIs the model parameter, wijIs the connection weight value, a, between the node i of the visible layer and the node j of the hidden layeriAnd biThe offsets of the visible layer and the hidden layer, respectively. Their joint probability distribution is:
whereinThe joint probability is to find the sample probability of the hidden layer and the visible layer.
Because in RBM there is no direct connection between hidden layer units, i.e. hidden layer nodes are conditionally independent, i.e.The data expectation of an unbiased sample of the data distribution is readily obtained. Given a randomly selected visible layer v, hidden layer element hjThe probability of a binary state of 1 is:
wherein the function
At this time vi,hjIs an unbiased sample, and this function, like the function in the neural network, is a conditional bernoulli distribution. Since there is also no direct connection between visible layer cells, it is easy to get a sample of visible cells, and the probability of a binary state of 1 at a given visible layer cell is:
and 5: and inputting the segmented Chinese comments into the RBM, wherein each word in the dictionary corresponds to one node of the visible layer. And (5) adopting RBM to perform unsupervised learning.
Step 6: The RBM is trained rapidly and without supervision by a method called contrastive divergence (CD). CD in the RBM performs k steps of Gibbs sampling, i.e. learning uses the CD-k Gibbs algorithm. In the CD-k algorithm, the visible-node state v_0 is initialized with a training sample or a random initial state, and the following alternating sampling is performed, where sampling from p(h | v_0) yields h_0, sampling from p(v | h_0) yields v_1, and so on:

h_0 ~ p(h | v_0), v_1 ~ p(v | h_0) (3-13)
h_1 ~ p(h | v_1), v_2 ~ p(v | h_1) (3-14)
……, v_{k+1} ~ p(v | h_k) (3-15)
Assume S is the training set, and let (v_1, …, v_m) and (h_1, …, h_n) denote the visible and hidden nodes of the RBM respectively. The CD-k Gibbs procedure is implemented as follows, with its parameters described as: Δw_ij is the increment of the weight w_ij, Δa_i is the increment of the offset a_i, and Δb_j is the increment of the offset b_j.
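The alternating sampling and parameter increments above can be sketched in numpy as a CD-1 (k = 1) update. The layer sizes below are illustrative, not the patent's 476-visible/50-hidden configuration, while the initialization (weights ~ N(0, 1), offsets 0, learning rate 0.01, 100 iterations) follows steps 7 and 8:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)
n_vis, n_hid, lr = 6, 3, 0.01              # illustrative sizes; learning rate 0.01
W = rng.normal(0.0, 1.0, (n_vis, n_hid))   # w_ij ~ N(0, 1)
a = np.zeros(n_vis)                        # visible offsets a_i, initialized to 0
b = np.zeros(n_hid)                        # hidden offsets b_j, initialized to 0

def cd1_step(v0):
    """One contrastive-divergence update with a single Gibbs step."""
    global W, a, b
    ph0 = sigmoid(b + v0 @ W)                        # p(h_j = 1 | v0)
    h0 = (rng.random(n_hid) < ph0).astype(float)     # h0 ~ p(h | v0)
    pv1 = sigmoid(a + h0 @ W.T)                      # p(v_i = 1 | h0)
    v1 = (rng.random(n_vis) < pv1).astype(float)     # v1 ~ p(v | h0)
    ph1 = sigmoid(b + v1 @ W)                        # p(h_j = 1 | v1)
    # Increments: positive phase (data) minus negative phase (reconstruction).
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)

v0 = np.array([1, 0, 1, 1, 0, 0], dtype=float)   # one binary comment vector
for _ in range(100):                             # 100 iterations, as in step 8
    cd1_step(v0)
```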
Step 7: The parameters of the restricted Boltzmann machine (RBM) are initialized. As in step 4, the weights W of the RBM are drawn from a normal distribution with mean 0 and standard deviation 1, and the offsets a_i and b_j both have initial values of 0. The learning rate is 0.01.
Step 8: The number of iterations is set to 100. The CD-k Gibbs algorithm finally yields the final values of the weight w_ij between visible node i and hidden node j and of the offsets a_i and b_j, and p(v_i = 1 | h) is calculated with reference to the formula listed in step 4.
Step 9: The word space of each comment is reconstructed according to p(v_i = 1 | h). If the reconstructed value of v_i is less than or equal to 0.01, the visible node v_i is considered unable to represent higher-level structure, and the word corresponding to v_i is discarded. If the reconstructed value of v_i is greater than 0.01, the difference between the reconstructed value and the input value is computed. If the difference for a word is small, the word changed little after reconstruction; the corresponding Chinese word is found and can serve as a feature representation. Conversely, if the difference is large, the word changed greatly after reconstruction and cannot serve as a feature representation of the comment. Accordingly, the differences are sorted in ascending order and the first 30 are selected as feature words. If the number of visible nodes with reconstructed values greater than 0.01 in a comment is less than 30, all words corresponding to those visible nodes are retained as feature words. Feature selection thus reduces the word vector space of the comment.
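The step-9 selection rule can be sketched as follows. The five words and their reconstruction values are hypothetical, and the feature limit is lowered from the patent's 30 to 2 so the example stays small:

```python
import numpy as np

def select_features(v_input, v_recon, words, max_feats=30, thresh=0.01):
    """Keep nodes with reconstruction > thresh, then take the (up to) max_feats
    words whose reconstruction differs least from the input value."""
    keep = [i for i in range(len(words)) if v_recon[i] > thresh]
    ranked = sorted(keep, key=lambda i: abs(v_recon[i] - v_input[i]))
    return [words[i] for i in ranked[:max_feats]]

# Hypothetical 5-word comment: input vector and RBM reconstruction.
words = ["好", "快", "贵", "差", "慢"]
v_in  = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
v_rec = np.array([0.95, 0.90, 0.30, 0.005, 0.40])

print(select_features(v_in, v_rec, words, max_feats=2))  # ['好', '快']
```

Node 3 ("差") is dropped outright because its reconstruction (0.005) is below the 0.01 threshold; the remaining words are ranked by how little they changed.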
Step 10: The number of Gibbs sampling steps K is selected experimentally. The 1706 comment texts are randomly divided into a training set and a test set in a ratio of 3:1. The number of hidden-layer nodes of the RBM is set to 2, represented by 0 and 1, where 0 indicates that the comment's judged emotion is negative, i.e. the user holds a negative attitude toward the product, and 1 indicates that the judged emotion is positive, i.e. the user holds a positive attitude toward the product; the RBM is used as a clusterer and outputs the emotion category of each comment. K is set to 5, 10, 15, 20 and 25 in turn, and as K changes, the RBM's recognition results on product comments are measured by three indicators: precision, recall, and F-measure.
TABLE 1 different recognition results of RBM on product reviews
The experimental results show that the F-measure is highest when K is 10, indicating the best overall performance, so K = 10 is chosen.
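The three indicators used to compare the different K values can be computed as follows; the gold labels and predictions below are hypothetical, not the patent's experimental data:

```python
def prf(y_true, y_pred, positive=1):
    """Precision, recall, and F-measure for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

y_true = [1, 1, 0, 0, 1, 0]   # hypothetical gold labels (1 = positive comment)
y_pred = [1, 0, 0, 1, 1, 0]   # hypothetical RBM outputs
p, r, f = prf(y_true, y_pred)  # p = 2/3, r = 2/3, f = 2/3
```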
Step 11: The dimension-reduced comments are input to a support vector machine (SVM) for emotion classification. The SVM is a maximum-margin linear classifier on the feature space; its learning strategy is to maximize the margin, which is ultimately converted into solving a convex quadratic programming problem. The (dual) objective function of the SVM is as follows:

min_α ½ Σ_{i=1}^l Σ_{j=1}^l α_i α_j y_i y_j K(x_i, x_j) − Σ_{i=1}^l α_i, subject to Σ_{i=1}^l α_i y_i = 0 and 0 ≤ α_i ≤ C, i = 1, …, l.
solving the objective function to obtain:
according to <math><mrow>
<msup>
<mi>α</mi>
<mo>*</mo>
</msup>
<mo>=</mo>
<mrow>
<mo>(</mo>
<msubsup>
<mi>α</mi>
<mn>1</mn>
<mo>*</mo>
</msubsup>
<mo>,</mo>
<mo>...</mo>
<mo>...</mo>
<msubsup>
<mi>α</mi>
<mn>1</mn>
<mo>*</mo>
</msubsup>
<mo>)</mo>
</mrow>
<mo>,</mo>
</mrow></math> Computing <math><mrow>
<msup>
<mi>w</mi>
<mo>*</mo>
</msup>
<mo>=</mo>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>l</mi>
</msubsup>
<msub>
<mi>y</mi>
<mi>i</mi>
</msub>
<msubsup>
<mi>α</mi>
<mi>i</mi>
<mo>*</mo>
</msubsup>
<msub>
<mi>x</mi>
<mi>i</mi>
</msub>
<mo>,</mo>
</mrow></math> Selection of alpha*A component ofSatisfy the requirement of <math><mrow>
<mn>0</mn>
<mo><</mo>
<msubsup>
<mi>α</mi>
<mi>j</mi>
<mo>*</mo>
</msubsup>
<mo><</mo>
<mi>C</mi>
<mo>,</mo>
</mrow></math>
And calculate out according to the above <math><mrow>
<msup>
<mi>b</mi>
<mo>*</mo>
</msup>
<mo>=</mo>
<msub>
<mi>y</mi>
<mi>j</mi>
</msub>
<mo>-</mo>
<msubsup>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>l</mi>
</msubsup>
<msub>
<mi>y</mi>
<mi>i</mi>
</msub>
<msubsup>
<mi>α</mi>
<mi>i</mi>
<mo>*</mo>
</msubsup>
<mi>K</mi>
<mrow>
<mo>(</mo>
<msub>
<mi>x</mi>
<mi>i</mi>
</msub>
<mo>,</mo>
<msub>
<mi>x</mi>
<mi>j</mi>
</msub>
<mo>)</mo>
</mrow>
</mrow></math>
Here α = {α_1, …, α_N}; K(x_i, x_j) is the kernel function over Chinese comments x_i and x_j; y_i is the class label of Chinese comment x_i; and C is a penalty factor, whose value we choose as 0.0001 in the experiment.
In the SVM, for Chinese comments x_1 and x_2 the kernel function we use is the radial basis function (RBF), K(x_1, x_2) = exp(−||x_1 − x_2||² / (2σ²)). The kernel function maps the feature space from a low-dimensional space to a high-dimensional space.
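A minimal sketch of the RBF kernel and the SVM decision rule f(x) = Σ_i y_i α*_i K(x_i, x) + b*; the support vectors, multipliers, bias, and σ below are hypothetical illustration values, not parameters learned from the patent's data:

```python
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0):
    """Radial basis function kernel: K(x1, x2) = exp(-||x1 - x2||^2 / (2 sigma^2))."""
    d = np.asarray(x1, float) - np.asarray(x2, float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))

def svm_decision(x, support_x, support_y, alpha, b, sigma=1.0):
    """Sign of f(x) = sum_i y_i * alpha_i * K(x_i, x) + b."""
    s = sum(y * a * rbf_kernel(sx, x, sigma)
            for sx, y, a in zip(support_x, support_y, alpha))
    return 1 if s + b >= 0 else -1

# Hypothetical support vectors, multipliers alpha_i, and bias b*.
support_x = [[1.0, 0.0], [0.0, 1.0]]
support_y = [1, -1]
alpha = [0.5, 0.5]
b_star = 0.0

print(svm_decision([0.9, 0.1], support_x, support_y, alpha, b_star))  # 1
```

A point near the positive support vector scores positive, since its RBF similarity to that vector dominates the sum.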
Step 12: The emotion analysis method combining the RBM and the SVM is compared with methods using the SVM or the RBM alone, and precision, recall, and F-measure are calculated from the output results and the manually labeled results. The experimental results are shown in the following table:
TABLE 2 identification results of different methods
As seen from Table 2, combining the RBM and SVM greatly improves emotion analysis accuracy. Compared with using the SVM alone, the RBM-SVM combination improves both precision and recall. Compared with using the RBM alone, the combination's precision improves greatly; although recall drops slightly, the F-measure is much higher than with the RBM alone, so the combined RBM-SVM emotion analysis method has good overall performance.
Claims (3)
1. A Chinese text sentiment analysis method based on a computer information processing technology is characterized by comprising the following steps:
1) inputting a Chinese product comment text set, performing Chinese word segmentation using the accurate mode of the Chinese word segmentation tool, determining the part of speech of each word, filtering by part of speech so that only nouns, adjectives, adverbs and conjunctions are kept, obtaining a segmented corpus, calculating from the segmented text the weight of each word in each comment, and generating a comment vector;
2) performing feature selection on the comment vectors by a restricted Boltzmann machine; the restricted Boltzmann machine consists of a visible layer V and a hidden layer H, the node neurons in the visible layer are binary random variables, the nodes of the visible layer are mutually independent, and the nodes of the hidden layer are mutually independent; the connection weight between visible-layer and hidden-layer nodes is W; the comment vectors generated in step 1) are input to the visible layer;
3) training the restricted Boltzmann machine by the contrastive divergence method; relevant parameters of the restricted Boltzmann machine are initialized, and after 100 iterations the conditional probability p(h_j = 1 | v) of hidden-layer node h_j being 1 given the visible layer v, and the conditional probability p(v_i = 1 | h) of visible-layer node v_i being 1 given the hidden layer h, are obtained; wherein

p(h_j = 1 | v) = σ(b_j + Σ_i v_i w_ij);  p(v_i = 1 | h) = σ(a_i + Σ_j h_j w_ij);

the function σ(x) = 1/(1 + e^(−x)); the offsets a_i and b_j all have initial value 0; w_ij is the connection weight between visible-layer node i and hidden-layer node j, and w_ij obeys a normal distribution with mean 0 and standard deviation 1;
4) reconstructing the visible-layer nodes v_i according to the conditional probability p(v_i = 1 | h); if the reconstructed value of v_i is less than or equal to 0.01, the visible node v_i is considered unable to represent higher-level structure and the word corresponding to v_i is discarded; if the reconstructed value of v_i is greater than 0.01, the difference between the reconstructed value and the input value of v_i is computed; the maximum number of emotional features per comment is set to 30, and the differences are sorted in ascending order; the first 30 words v_i have smaller differences than the following words, indicating that v_i changed little after reconstruction, and the Chinese words corresponding to these v_i are taken as feature words; the later words have larger differences than the first 30, indicating that v_i changed greatly after reconstruction, and the corresponding words are treated as non-feature words; if the number of visible nodes with reconstructed values greater than 0.01 in a comment is less than 30, all words corresponding to those visible nodes are retained as feature words; the retained feature words together form the vector space of each comment;
5) inputting the comment vectors newly generated in step 4) into a support vector machine for classification to obtain the emotion category of each comment vector, the emotion category being positive or negative.
2. The Chinese text emotion analysis method based on computer information processing technology according to claim 1, wherein in step 1) the comment vector is generated as follows: a bag-of-words model from natural language processing is adopted; if a word appears in a comment, the weight corresponding to that word in the comment vector is set to 1, otherwise it is set to 0; the length of the comment vector is the size of the dictionary of extracted words.
3. The method of emotion analysis for chinese text based on computer information processing technology as claimed in claim 2, wherein the number of nodes in the visible layer is 476, and the number of nodes in the hidden layer is 50.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510452024.5A CN104965822B (en) | 2015-07-29 | 2015-07-29 | A kind of Chinese text sentiment analysis method based on Computerized Information Processing Tech |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104965822A true CN104965822A (en) | 2015-10-07 |
CN104965822B CN104965822B (en) | 2017-08-25 |
Family
ID=54219858
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510452024.5A Expired - Fee Related CN104965822B (en) | 2015-07-29 | 2015-07-29 | A kind of Chinese text sentiment analysis method based on Computerized Information Processing Tech |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104965822B (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105589941A (en) * | 2015-12-15 | 2016-05-18 | 北京百分点信息科技有限公司 | Emotional information detection method and apparatus for web text |
CN105930368A (en) * | 2016-04-13 | 2016-09-07 | 深圳大学 | Emotion classification method and system |
CN106126502A (en) * | 2016-07-07 | 2016-11-16 | 四川长虹电器股份有限公司 | Emotion classification system and method based on a support vector machine |
CN106446117A (en) * | 2016-09-18 | 2017-02-22 | 西安电子科技大学 | Text analysis method based on a Poisson-gamma belief network |
CN106502394A (en) * | 2016-10-18 | 2017-03-15 | 哈尔滨工业大学深圳研究生院 | Word vector computation method and device based on EEG signals |
CN106547740A (en) * | 2016-11-24 | 2017-03-29 | 四川无声信息技术有限公司 | Text information processing method and device |
CN106844632A (en) * | 2017-01-20 | 2017-06-13 | 清华大学 | Product review sentiment classification method and device based on an improved support vector machine |
CN106919673A (en) * | 2017-02-21 | 2017-07-04 | 浙江工商大学 | Text sentiment analysis system based on deep learning |
CN106933847A (en) * | 2015-12-30 | 2017-07-07 | 北京神州泰岳软件股份有限公司 | Method and apparatus for establishing a data classification model |
CN107193801A (en) * | 2017-05-21 | 2017-09-22 | 北京工业大学 | Short text feature optimization and sentiment analysis method based on a deep belief network |
CN107608956A (en) * | 2017-09-05 | 2018-01-19 | 广东石油化工学院 | Reader emotion distribution prediction algorithm based on CNN-GRNN |
WO2018028077A1 (en) * | 2016-08-11 | 2018-02-15 | 中兴通讯股份有限公司 | Deep learning-based method and device for Chinese semantic analysis |
CN107918487A (en) * | 2017-10-20 | 2018-04-17 | 南京邮电大学 | Method for identifying Chinese emotion words based on electrodermal signals |
WO2018082569A1 (en) * | 2016-11-04 | 2018-05-11 | 华为技术有限公司 | Sequence conversion method and device |
WO2018188567A1 (en) * | 2017-04-13 | 2018-10-18 | 腾讯科技(深圳)有限公司 | Server information pushing method and device, terminal information sending method and device, information pushing system and storage medium |
CN109464147A (en) * | 2019-01-09 | 2019-03-15 | 浙江强脑科技有限公司 | Breath signal detection method, device and computer readable storage medium |
CN109783806A (en) * | 2018-12-21 | 2019-05-21 | 众安信息技术服务有限公司 | Text matching method using semantic parsing structure |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009017464A1 (en) * | 2007-07-31 | 2009-02-05 | Agency For Science, Technology And Research | Relation extraction system |
CN104331706A (en) * | 2014-10-29 | 2015-02-04 | 西安电子科技大学 | Polarimetric SAR image classification based on RBM and SVM |
- 2015-07-29: CN application CN201510452024.5A filed; granted as CN104965822B (status: not active, Expired - Fee Related)
Non-Patent Citations (3)
Title |
---|
Ronan Collobert et al.: "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning", Proceedings of the 25th International Conference on Machine Learning * |
Ren Yuanyuan et al.: "Product Feature Extraction and Sentiment Orientation Determination for Chinese Online Comments", Computer Systems & Applications * |
Zhang Chunxia et al.: "Restricted Boltzmann Machines", Chinese Journal of Engineering Mathematics * |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105589941A (en) * | 2015-12-15 | 2016-05-18 | 北京百分点信息科技有限公司 | Emotional information detection method and apparatus for web text |
CN106933847B (en) * | 2015-12-30 | 2019-12-27 | 北京神州泰岳软件股份有限公司 | Method and device for establishing data classification model |
CN106933847A (en) * | 2015-12-30 | 2017-07-07 | 北京神州泰岳软件股份有限公司 | Method and apparatus for establishing a data classification model |
CN105930368A (en) * | 2016-04-13 | 2016-09-07 | 深圳大学 | Emotion classification method and system |
CN105930368B (en) * | 2016-04-13 | 2019-07-30 | 深圳大学 | Emotion classification method and system |
CN106126502A (en) * | 2016-07-07 | 2016-11-16 | 四川长虹电器股份有限公司 | Emotion classification system and method based on a support vector machine |
CN106126502B (en) * | 2016-07-07 | 2018-10-30 | 四川长虹电器股份有限公司 | Emotion classification system and method based on a support vector machine |
WO2018028077A1 (en) * | 2016-08-11 | 2018-02-15 | 中兴通讯股份有限公司 | Deep learning-based method and device for Chinese semantic analysis |
CN106446117A (en) * | 2016-09-18 | 2017-02-22 | 西安电子科技大学 | Text analysis method based on a Poisson-gamma belief network |
CN106502394A (en) * | 2016-10-18 | 2017-03-15 | 哈尔滨工业大学深圳研究生院 | Word vector computation method and device based on EEG signals |
CN106502394B (en) * | 2016-10-18 | 2019-06-25 | 哈尔滨工业大学深圳研究生院 | Word vector calculation method and device based on EEG signals |
CN108021549B (en) * | 2016-11-04 | 2019-08-13 | 华为技术有限公司 | Sequence conversion method and device |
WO2018082569A1 (en) * | 2016-11-04 | 2018-05-11 | 华为技术有限公司 | Sequence conversion method and device |
CN108021549A (en) * | 2016-11-04 | 2018-05-11 | 华为技术有限公司 | Sequence conversion method and device |
US11132516B2 (en) | 2016-11-04 | 2021-09-28 | Huawei Technologies Co., Ltd. | Sequence translation probability adjustment |
CN106547740A (en) * | 2016-11-24 | 2017-03-29 | 四川无声信息技术有限公司 | Text information processing method and device |
CN106844632A (en) * | 2017-01-20 | 2017-06-13 | 清华大学 | Product review sentiment classification method and device based on an improved support vector machine |
CN106844632B (en) * | 2017-01-20 | 2020-02-21 | 清华大学 | Product comment emotion classification method and device based on improved support vector machine |
CN106919673A (en) * | 2017-02-21 | 2017-07-04 | 浙江工商大学 | Text sentiment analysis system based on deep learning |
CN106919673B (en) * | 2017-02-21 | 2019-08-20 | 浙江工商大学 | Text sentiment analysis system based on deep learning |
US10924572B2 (en) | 2017-04-13 | 2021-02-16 | Tencent Technology (Shenzhen) Company Limited | Information push method and apparatus, information sending method and apparatus, system, and storage medium |
WO2018188567A1 (en) * | 2017-04-13 | 2018-10-18 | 腾讯科技(深圳)有限公司 | Server information pushing method and device, terminal information sending method and device, information pushing system and storage medium |
CN107193801A (en) * | 2017-05-21 | 2017-09-22 | 北京工业大学 | Short text feature optimization and sentiment analysis method based on a deep belief network |
CN107193801B (en) * | 2017-05-21 | 2020-07-03 | 北京工业大学 | Short text feature optimization and emotion analysis method based on deep belief network |
CN107608956A (en) * | 2017-09-05 | 2018-01-19 | 广东石油化工学院 | Reader emotion distribution prediction algorithm based on CNN-GRNN |
CN107608956B (en) * | 2017-09-05 | 2021-02-19 | 广东石油化工学院 | Reader emotion distribution prediction algorithm based on CNN-GRNN |
CN107918487A (en) * | 2017-10-20 | 2018-04-17 | 南京邮电大学 | Method for identifying Chinese emotion words based on electrodermal signals |
CN109783806A (en) * | 2018-12-21 | 2019-05-21 | 众安信息技术服务有限公司 | Text matching method using semantic parsing structure |
CN109783806B (en) * | 2018-12-21 | 2023-05-02 | 众安信息技术服务有限公司 | Text matching method utilizing semantic parsing structure |
CN109464147A (en) * | 2019-01-09 | 2019-03-15 | 浙江强脑科技有限公司 | Breath signal detection method, device and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN104965822B (en) | 2017-08-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104965822B (en) | Chinese text sentiment analysis method based on computer information processing technology | |
Socher et al. | Semi-supervised recursive autoencoders for predicting sentiment distributions | |
CN104217226B (en) | Dialogue act recognition method based on deep neural networks and conditional random fields | |
CN110321563B (en) | Text emotion analysis method based on hybrid supervision model | |
Chang et al. | Research on detection methods for abnormal comments based on Doc2vec | |
US20130179169A1 (en) | Chinese text readability assessing system and method | |
CN110347836B (en) | Sentiment classification method for Chinese-Vietnamese bilingual news incorporating opinion sentence features | |
CN112001187A (en) | Emotion classification system based on Chinese syntax and graph convolution neural network | |
CN112364638B (en) | Personality identification method based on social text | |
CN112256866B (en) | Text fine-grained emotion analysis algorithm based on deep learning | |
CN109101490B (en) | Factual implicit emotion recognition method and system based on fusion feature representation | |
CN110717330A (en) | Word-sentence level short text classification method based on deep learning | |
Azim et al. | Text to emotion extraction using supervised machine learning techniques | |
Sboev et al. | Deep learning network models to categorize texts according to author's gender and to identify text sentiment | |
Alroobaea | Sentiment analysis on amazon product reviews using the recurrent neural network (rnn) | |
Nassiri et al. | Arabic L2 readability assessment: Dimensionality reduction study | |
CN111813939A (en) | Text classification method based on representation enhancement and fusion | |
CN115146031B (en) | Short text stance detection method based on deep learning and auxiliary features | |
CN107491490B (en) | Text emotion classification method based on emotion center | |
CN113190681B (en) | Fine granularity text classification method based on capsule network mask memory attention | |
CN115906824A (en) | Text fine-grained emotion analysis method, system, medium and computing equipment | |
Rustamov et al. | An application of hidden Markov models in subjectivity analysis | |
Imad et al. | Automated Arabic News Classification using the Convolutional Neural Network. | |
CN115269833A (en) | Event information extraction method and system based on deep semantics and multitask learning | |
Ghelani | A review on student automatic grading system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 2017-08-25; Termination date: 2021-07-29 |