CN104794208A - Sentiment classification method and system based on contextual information of microblog text - Google Patents

Sentiment classification method and system based on contextual information of microblog text

Info

Publication number
CN104794208A
Authority
CN
China
Prior art keywords
sentiment
feature
word
microblog text
space vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510201443.1A
Other languages
Chinese (zh)
Inventor
徐华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Wuxi Research Institute of Applied Technologies of Tsinghua University
Original Assignee
Tsinghua University
Wuxi Research Institute of Applied Technologies of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University and Wuxi Research Institute of Applied Technologies of Tsinghua University
Priority to CN201510201443.1A
Publication of CN104794208A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a sentiment classification method and system based on contextual information of microblog text. The method comprises the following steps: sentiment-related features are extracted from the microblog text, and a first feature space vector is established according to the mapping between the extracted features and dimensions, wherein the features comprise relation features between sentiment words and their context, part-of-speech features, and syntactic structure features; dimensionality reduction is performed on the first feature space vector to obtain a second feature space vector, wherein the dimension of the second feature space vector is lower than that of the first feature space vector; and a classifier is trained on the data corresponding to the first feature space vector, so that the microblog text is classified by sentiment and a sentiment classification result is output. The method and system have the advantage of high sentiment classification accuracy.

Description

Sentiment classification method and system based on contextual information of microblog text
Technical field
The present invention relates to the fields of computer application technology and Internet technology, and in particular to a sentiment classification method and system based on contextual information of microblog text.
Background
With the continuing rise of emerging networks represented by Sina and Tencent, more and more people use microblogs to express sentiments and opinions on various topics. This aggregates a large amount of public opinion information covering accidents, diseases, and other hot events, all of which carry a great deal of sentiment and emotion information. The huge user base has further consolidated the microblog's position as a center of online opinion formation, and the microblog can be said to have become an important platform for emotional expression and communication. At present, sentiment classification is an important research topic in the field of natural language processing and has attracted researchers both in China and abroad.
Existing research on sentiment classification falls mainly into dictionary-based methods, machine-learning-based methods, and hybrid methods, but the classification accuracy of these methods is low.
Summary of the invention
An object of the present invention is to solve at least one of the above technical deficiencies.
To this end, one object of the invention is to provide a sentiment classification method based on contextual information of microblog text. The method has the advantage of high sentiment classification accuracy.
Another object of the invention is to provide a sentiment classification system based on contextual information of microblog text.
To achieve these objects, an embodiment of a first aspect of the invention discloses a sentiment classification method based on contextual information of microblog text, comprising the following steps: extracting sentiment-related features from the microblog text, and establishing a first feature space vector according to the mapping between the extracted features and dimensions, wherein the features comprise relation features between sentiment words and their context, part-of-speech features, and syntactic structure features; performing dimensionality reduction on the first feature space vector to obtain a second feature space vector, wherein the dimension of the second feature space vector is lower than that of the first feature space vector; and training a classifier on the data corresponding to the first feature space vector, so as to classify the microblog text by sentiment and output a sentiment classification result.
According to the sentiment classification method based on contextual information of microblog text of the embodiment of the invention, for the problem of classifying the sentiment of microblog text on the Internet, features are extracted from the microblog text using methods such as part-of-speech feature selection, syntactic structure feature selection, selection of relation features between sentiment words and context, feature dimensionality reduction, and a sentiment classification algorithm, and the microblog text is classified by sentiment according to the extracted features. The method has the advantage of high classification accuracy.
In addition, the sentiment classification method based on contextual information of microblog text according to the above embodiment of the invention may also have the following additional technical features:
In some examples, extracting sentiment-related features from the microblog text further comprises: extracting part-of-speech features from the microblog text according to a part-of-speech feature selection method, which specifically comprises segmenting the microblog text with a word segmenter and taking part-of-speech combinations that have a collocation relation in the segmentation result as the part-of-speech features, wherein the part-of-speech combinations express the sentiment contained in the microblog text; extracting syntactic structure features from the microblog text according to a syntactic structure feature selection method, which specifically comprises performing syntactic analysis on the microblog text to construct multiple dependency relations between linguistic units, and selecting the syntactic structure features according to the dependency relations; and extracting part-of-speech features from the microblog text according to a selection method for relation features between sentiment words and context, wherein the relation features between sentiment words and context take the following forms:
(1) <subject/entity, center sentiment word, dependency pair type>;
(2) <conjunction, center sentiment word, dependency pair type>;
(3) <modifier, center sentiment word, dependency pair type>;
(4) <negation word, center sentiment word, dependency pair type>.
In some examples, performing dimensionality reduction on the first feature space vector to obtain the second feature space vector further comprises: selecting high-frequency feature words from the feature words of the first feature space vector according to the chi-square test, based on the following formula:
$$\chi^2 = \sum_{i=1}^{k} \frac{(N_i - n p_i)^2}{n p_i};$$
where $N_i$ is the observed frequency of category $i$, $n$ is the total frequency, and $p_i$ is the expected frequency of category $i$;
and selecting low-frequency feature words from the feature words of the first feature space vector according to the pointwise mutual information (PMI) method, based on the following formula:
$$\mathrm{PMI}(w, c) = \log \frac{p(w, c)}{p(w)\,p(c)};$$
where $p(w, c)$ is the probability that a document contains word $w$ and belongs to category $c$, $p(w)$ is the probability that a document contains word $w$, and $p(c)$ is the probability that a document belongs to category $c$.
In some examples, the classifier is an SVM^perf binary classifier.
In some examples, precision, recall, and F-measure are used as the performance evaluation metrics of the classifier.
An embodiment of a second aspect of the invention discloses a sentiment classification system based on contextual information of microblog text, comprising: a feature extraction module, configured to extract sentiment-related features from the microblog text and to establish a first feature space vector according to the mapping between the extracted features and dimensions, wherein the features comprise relation features between sentiment words and their context, part-of-speech features, and syntactic structure features; a feature dimensionality reduction module, configured to perform dimensionality reduction on the first feature space vector to obtain a second feature space vector, wherein the dimension of the second feature space vector is lower than that of the first feature space vector; and a sentiment classification module, configured to train a classifier on the data corresponding to the first feature space vector, so as to classify the microblog text by sentiment and output a sentiment classification result.
According to the sentiment classification system based on contextual information of microblog text of the embodiment of the invention, for the problem of classifying the sentiment of microblog text on the Internet, features are extracted from the microblog text using methods such as part-of-speech feature selection, syntactic structure feature selection, selection of relation features between sentiment words and context, feature dimensionality reduction, and a sentiment classification algorithm, and the microblog text is classified by sentiment according to the extracted features. The system has the advantage of high classification accuracy.
In addition, the sentiment classification system based on contextual information of microblog text according to the above embodiment of the invention may also have the following additional technical features:
In some examples, the feature extraction module comprises: a part-of-speech feature selection module, configured to extract part-of-speech features from the microblog text according to a part-of-speech feature selection method, which specifically comprises segmenting the microblog text with a word segmenter and taking part-of-speech combinations that have a collocation relation in the segmentation result as the part-of-speech features, wherein the part-of-speech combinations express the sentiment contained in the microblog text; a syntactic structure feature selection module, configured to extract syntactic structure features from the microblog text according to a syntactic structure feature selection method, which specifically comprises performing syntactic analysis on the microblog text to construct multiple dependency relations between linguistic units, and selecting the syntactic structure features according to the dependency relations; and a selection module for relation features between sentiment words and context, configured to extract part-of-speech features from the microblog text according to a selection method for relation features between sentiment words and context, wherein the relation features between sentiment words and context take the following forms:
(1) <subject/entity, center sentiment word, dependency pair type>;
(2) <conjunction, center sentiment word, dependency pair type>;
(3) <modifier, center sentiment word, dependency pair type>;
(4) <negation word, center sentiment word, dependency pair type>.
In some examples, the feature dimensionality reduction module is configured to: select high-frequency feature words from the feature words of the first feature space vector according to the chi-square test, based on the following formula:
$$\chi^2 = \sum_{i=1}^{k} \frac{(N_i - n p_i)^2}{n p_i};$$
where $N_i$ is the observed frequency of category $i$, $n$ is the total frequency, and $p_i$ is the expected frequency of category $i$;
and select low-frequency feature words from the feature words of the first feature space vector according to the pointwise mutual information (PMI) method, based on the following formula:
$$\mathrm{PMI}(w, c) = \log \frac{p(w, c)}{p(w)\,p(c)};$$
where $p(w, c)$ is the probability that a document contains word $w$ and belongs to category $c$, $p(w)$ is the probability that a document contains word $w$, and $p(c)$ is the probability that a document belongs to category $c$.
In some examples, the classifier is an SVM^perf binary classifier.
In some examples, the sentiment classification module uses precision, recall, and F-measure as the performance evaluation metrics of the classifier.
Additional aspects and advantages of the invention will be given in part in the following description, and in part will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is an overall flowchart of the sentiment classification method based on contextual information of microblog text according to an embodiment of the invention;
Fig. 2 is a diagram of the implementation steps of the sentiment classification method based on contextual information of microblog text according to an embodiment of the invention;
Fig. 3 is a structural block diagram of the sentiment classification system based on contextual information of microblog text according to an embodiment of the invention; and
Fig. 4 is an overall framework diagram of the sentiment classification system based on contextual information of microblog text according to this embodiment.
Detailed description
Embodiments of the invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and cannot be construed as limiting it.
In the description of the invention, it should be understood that orientation or position terms such as "center", "longitudinal", "transverse", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", and "outer" are based on the orientations or positions shown in the drawings. They are used only to facilitate and simplify the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be construed as limiting the invention. In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be construed as indicating or implying relative importance.
In the description of the invention, it should be noted that, unless otherwise clearly specified and limited, the terms "installed", "joined", and "connected" should be interpreted broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, or indirect through an intermediary, or it may be an internal connection between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the invention according to the specific situation.
The sentiment classification method and system based on contextual information of microblog text according to embodiments of the invention are described below with reference to the drawings.
Fig. 1 is a flowchart of the sentiment classification method based on contextual information of microblog text according to an embodiment of the invention. Fig. 2 is a diagram of the implementation steps of the sentiment classification method based on contextual information of microblog text according to an embodiment of the invention.
As shown in Fig. 1 and Fig. 2, the sentiment classification method based on contextual information of microblog text according to an embodiment of the invention comprises the following steps:
S101: extract sentiment-related features from the microblog text, and establish a first feature space vector according to the mapping between the extracted features and dimensions, wherein the features comprise relation features between sentiment words and their context, part-of-speech features, and syntactic structure features.
Specifically, part-of-speech features are extracted from the microblog text according to a part-of-speech feature selection method. For example, the microblog text is segmented with a word segmenter, and part-of-speech combinations that have a collocation relation in the segmentation result are taken as the part-of-speech features, wherein the part-of-speech combinations express the sentiment contained in the microblog text.
As a concrete example, the part-of-speech feature is one of the key features for feature extraction from microblog text. In text sentiment mining, for instance, different parts of speech play a role in producing and conveying personal sentiment, and nouns, adjectives, verbs, and adverbs are important sentiment indicators. Based on the part-of-speech pattern between two words, an embodiment of the invention uses a word segmenter to segment 1000 randomly selected microblog texts that carry some sentiment, obtaining 49747 segmentation results. The extracted part-of-speech combinations with a collocation relation show that combinations of an adverb with a verb, and of an adverb with an adjective, can express the sentiment contained in the text. For example, in "Today I played very happily!", the combination of the verb "played" and the adverbial "very happily" expresses a happy sentiment. In addition, nouns and verbs may also convey a sentiment in context: some words express a strong sentiment in some sentences but only a weak sentiment, or none at all, in others.
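The part-of-speech collocation idea above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the text has already been segmented and tagged, the tag names and the `COLLOCATION_PATTERNS` set are hypothetical, and only adjacent word pairs are considered.

```python
from collections import Counter

# Unordered POS combinations treated as sentiment-bearing collocations.
# The tag names are illustrative; a real segmenter has its own tag set.
COLLOCATION_PATTERNS = {
    frozenset(("adverb", "verb")),
    frozenset(("adverb", "adjective")),
}


def pos_pair_features(tagged_tokens):
    """Count adjacent word pairs whose POS combination matches a pattern."""
    features = Counter()
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if frozenset((t1, t2)) in COLLOCATION_PATTERNS:
            features[f"{t1}+{t2}:{w1}_{w2}"] += 1
    return features


# Hand-tagged toy sentence standing in for segmenter output.
tagged = [("I", "pronoun"), ("am", "verb"), ("very", "adverb"), ("happy", "adjective")]
pairs = pos_pair_features(tagged)
```

Each key in `pairs` can then serve as one candidate dimension of the feature space.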
Syntactic structure features are extracted from the microblog text according to a syntactic structure feature selection method. For example, syntactic analysis is performed on the microblog text to construct multiple dependency relations between linguistic units, and the syntactic structure features are selected according to the dependency relations.
As a concrete example, analyzing the sentence structure of the microblog text yields the dependency relations between linguistic units, such as the relations between the grammatical components "subject", "predicate", "object", "attribute", "adverbial", and "complement"; dependency analysis is built on grammar rules. Dependency parsing can be applied to event extraction, query term ranking, machine translation, semantic classification, and so on. An embodiment of the invention uses the LTP platform to construct 24 kinds of dependency relations, such as the coordination relation, the verb-object relation, and the subject-predicate relation, and selects syntactic structure features from them.
It should be noted that not every dependency relation needs to be considered. The "voice structure", for example, does not reflect the sentiment contained in a sentence well; it is merely a grammatical feature. Therefore, when selecting dependency relations, they need to be tied to the sentiment keywords in the sentence. Take the "analogy relation": it is a rhetorical device that attributes a feature of one person or thing to another thing, so once the sentiment-related features of that thing are known, the feature of the person or thing can be judged.
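A minimal sketch of tying dependency relations to sentiment keywords is given below. It assumes the parser output is already available as `(head, dependent, relation)` tuples; the relation labels, the sentiment lexicon, and the `ignored_relations` parameter are all hypothetical and do not reproduce the label set of any particular parser.

```python
def select_dependency_features(arcs, sentiment_words, ignored_relations=frozenset()):
    """Keep only dependency arcs that touch a sentiment keyword.

    arcs: iterable of (head, dependent, relation) tuples.
    ignored_relations: relation types regarded as purely grammatical.
    """
    selected = []
    for head, dependent, relation in arcs:
        if relation in ignored_relations:
            continue
        if head in sentiment_words or dependent in sentiment_words:
            selected.append((head, dependent, relation))
    return selected


# Toy parse; the labels are placeholders for real parser output.
toy_arcs = [
    ("happy", "I", "subject-predicate"),
    ("went", "park", "verb-object"),
    ("happy", "very", "adverbial"),
    ("happy", "was", "voice"),
]
kept = select_dependency_features(toy_arcs, {"happy"}, ignored_relations={"voice"})
```

Here the arc between "went" and "park" is dropped because neither word is a sentiment keyword, and the "voice" arc is dropped because that relation type is excluded.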
Part-of-speech features are extracted from the microblog text according to a selection method for relation features between sentiment words and context, wherein the relation features between sentiment words and context take the following forms:
(1) <subject/entity, center sentiment word, dependency pair type>;
(2) <conjunction, center sentiment word, dependency pair type>;
(3) <modifier, center sentiment word, dependency pair type>;
(4) <negation word, center sentiment word, dependency pair type>.
In other words, most sentiment classification approaches locate the sentiment word feature in the polarity of the sentiment word itself. When parsing a sentence, however, the polarity of the sentiment word alone is insufficient, because the same sentiment word can express different meanings depending on the structure it occupies in the sentence. Analyzing sentiment words according to their contextual relations and dependency syntax in the sentence therefore helps sentiment type detection. The present application mainly builds context-based dependency pairs from forms (1) to (4) above.
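The four triple forms and the mapping from features to dimensions can be sketched as below. This is an assumption-laden illustration: the `word_category` lookup, the category names, and the count-based vector are hypothetical choices, and the dependency arcs are assumed to come from an external parser.

```python
CONTEXT_CATEGORIES = ("subject/entity", "conjunction", "modifier", "negation")


def context_triples(arcs, sentiment_words, word_category):
    """Build <context word, center sentiment word, dependency type> triples.

    arcs: iterable of (head, dependent, relation) tuples.
    word_category: hypothetical lookup from a word to one of CONTEXT_CATEGORIES.
    """
    triples = []
    for head, dependent, relation in arcs:
        for center, other in ((head, dependent), (dependent, head)):
            if center in sentiment_words and word_category.get(other) in CONTEXT_CATEGORIES:
                triples.append((other, center, relation))
    return triples


def to_vector(features, feature_index):
    """Map each feature to a dimension and count its occurrences."""
    for feature in features:
        feature_index.setdefault(feature, len(feature_index))
    vector = [0] * len(feature_index)
    for feature in features:
        vector[feature_index[feature]] += 1
    return vector


toy_arcs = [
    ("happy", "not", "adverbial"),
    ("happy", "I", "subject-predicate"),
    ("happy", "park", "other"),
]
categories = {"not": "negation", "I": "subject/entity"}
triples = context_triples(toy_arcs, {"happy"}, categories)
index = {}
vector = to_vector(triples, index)
```

The `index` dictionary plays the role of the feature-to-dimension mapping: every distinct triple receives its own dimension the first time it is seen.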
S102: perform dimensionality reduction on the first feature space vector to obtain a second feature space vector, wherein the dimension of the second feature space vector is lower than that of the first feature space vector.
In one embodiment of the invention, for example, high-frequency feature words are selected from the feature words of the first feature space vector according to the chi-square test, based on formula 1:
$$\chi^2 = \sum_{i=1}^{k} \frac{(N_i - n p_i)^2}{n p_i} \tag{1}$$
where $N_i$ is the observed frequency of category $i$, $n$ is the total frequency, and $p_i$ is the expected frequency of category $i$;
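Formula 1 can be evaluated directly, as in the following sketch. It assumes that $p_i$ is given as a probability for each category, so that $n p_i$ is the expected count; the function name and the toy numbers are illustrative only.

```python
def chi_square(observed, expected_probs):
    """Chi-square statistic: sum over categories of (N_i - n*p_i)^2 / (n*p_i)."""
    n = sum(observed)  # total frequency
    return sum(
        (n_i - n * p_i) ** 2 / (n * p_i)
        for n_i, p_i in zip(observed, expected_probs)
    )


# A word seen in 40 documents: 30 in category A and 10 in category B,
# while each category is expected to hold half of the occurrences.
score = chi_square([30, 10], [0.5, 0.5])
```

With these numbers the expected count is 20 per category, so each term contributes $(10)^2 / 20 = 5$. A word whose occurrences match the expected split scores 0, which is why higher scores indicate words more strongly tied to a category.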
Low-frequency feature words are selected from the feature words of the first feature space vector according to the pointwise mutual information (PMI) method, based on formula 2:
$$\mathrm{PMI}(w, c) = \log \frac{p(w, c)}{p(w)\,p(c)} \tag{2}$$
where $p(w, c)$ is the probability that a document contains word $w$ and belongs to category $c$, $p(w)$ is the probability that a document contains word $w$, and $p(c)$ is the probability that a document belongs to category $c$.
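Formula 2 can be estimated from document counts, as sketched below. The count-based probability estimates, the natural logarithm, and the handling of a zero joint count are assumptions made for illustration; the source does not specify them.

```python
import math


def pmi(n_wc, n_w, n_c, n_docs):
    """PMI(w, c) = log( p(w, c) / (p(w) * p(c)) ), estimated from counts.

    n_wc: documents that contain w and belong to c
    n_w:  documents that contain w
    n_c:  documents that belong to c
    """
    if n_wc == 0:
        return float("-inf")  # the word never occurs in this category
    p_wc = n_wc / n_docs
    p_w = n_w / n_docs
    p_c = n_c / n_docs
    return math.log(p_wc / (p_w * p_c))


# A rare word found in 10 of 100 documents, all of them in a category
# that covers 20 documents.
rare_word_score = pmi(n_wc=10, n_w=10, n_c=20, n_docs=100)
```

Because the ratio compares the joint probability with the product of the marginals, a rare word that appears almost only in one category still receives a high score, which fits the use of PMI for low-frequency words.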
That is, because the dimensionality of the current feature space is relatively high, the computational complexity is also high. Moreover, many of these dimensions are noise that needs to be filtered out. Embodiments of the invention therefore reduce computational complexity by selecting effective features, and thereby improve the performance of the classification algorithm. The purpose of feature dimensionality reduction is to pick out the feature words most strongly correlated with the categories, which is done in two ways: first, effective high-frequency feature words are selected with the chi-square test, as in formula 1 above; second, effective low-frequency words are selected with the PMI-based method, as in formula 2 above.
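The two-part selection can be sketched as follows. The frequency threshold and the number of features kept from each group are hypothetical parameters, since the source does not state how high-frequency and low-frequency words are separated or how many are retained; the scores are assumed to have been computed beforehand.

```python
def select_features(doc_freq, chi2_score, pmi_score, freq_threshold, top_high, top_low):
    """Rank high-frequency words by chi-square and low-frequency words by PMI."""
    high = [w for w in doc_freq if doc_freq[w] >= freq_threshold]
    low = [w for w in doc_freq if doc_freq[w] < freq_threshold]
    high.sort(key=lambda w: chi2_score[w], reverse=True)
    low.sort(key=lambda w: pmi_score[w], reverse=True)
    return high[:top_high] + low[:top_low]


def reduce_vector(vector, feature_index, kept):
    """Project a first-space vector onto the kept features only."""
    return [vector[feature_index[w]] for w in kept]


doc_freq = {"a": 50, "b": 40, "c": 2, "d": 1}
chi2_score = {"a": 1.0, "b": 9.0, "c": 0.0, "d": 0.0}
pmi_score = {"a": 0.0, "b": 0.0, "c": 0.2, "d": 1.5}
kept = select_features(doc_freq, chi2_score, pmi_score, freq_threshold=10, top_high=1, top_low=1)
reduced = reduce_vector([3, 1, 0, 2], {"a": 0, "b": 1, "c": 2, "d": 3}, kept)
```

The reduced vector has one entry per kept feature, so its dimension is lower than that of the original vector whenever some features are discarded.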
S103: train a classifier on the data corresponding to the first feature space vector, so as to classify the microblog text by sentiment and output a sentiment classification result. In one embodiment of the invention, the classifier is an SVM^perf binary classifier. In addition, precision, recall, and F-measure are used as the performance evaluation metrics of the classifier. That is, the SVM^perf classifier is trained on the data that has passed through the feature selection module and the feature dimensionality reduction module, and its performance is evaluated on a test set using precision, recall, and F-measure. The sentiment classification results are shown in Table 1:
Table 1
Sentiment    Precision (%)    Recall (%)    F-measure (%)
Happiness    79.78            89.06         84.17
Sadness      84.86            82.96         83.90
Surprise     77.14            84.38         80.60
Fear         91.67            55.93         69.47
Anger        81.82            73.97         77.70
Disgust      80.56            69.76         74.77
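The relation between the three metrics can be checked with a short sketch. It assumes that the F value is the balanced F-measure, that is, the harmonic mean of precision and recall; the figures in Table 1 agree with this formula to within rounding.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F %) as listed in Table 1.
table_1 = {
    "happiness": (79.78, 89.06, 84.17),
    "sadness": (84.86, 82.96, 83.90),
    "surprise": (77.14, 84.38, 80.60),
    "fear": (91.67, 55.93, 69.47),
    "anger": (81.82, 73.97, 77.70),
    "disgust": (80.56, 69.76, 74.77),
}
recomputed = {name: f_measure(p, r) for name, (p, r, _) in table_1.items()}
```

The fear category illustrates why both metrics are reported: its precision is the highest in the table, but its low recall pulls its F-measure below that of every other category.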
Specifically, SVM^perf is adopted as the core classification algorithm. It implements the support vector machine (SVM) formulation for optimizing nonlinear multivariate performance measures. It also handles the optimization of the error rate in ordinary binary classification and of ordinal regression. Working with cutting planes through its internal core algorithm (the Cutting-Plane Subspace Pursuit algorithm), SVM^perf can be trained on larger data sets and can also make fast predictions on a prediction data set.
Since SVM^perf is a binary classifier, embodiments of the invention use multiple binary classifiers to perform multi-class classification. First, one binary classifier divides sentences into those with sentiment and those without. For sentences with sentiment, another binary classifier divides them into positive sentiment and negative sentiment. Here "happiness" is a positive sentiment, while "anger", "disgust", "fear", "sadness", and "surprise" are negative sentiments. The sentiments at this level are then classified into multiple classes by multiple binary classifiers.
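The cascade of binary classifiers can be sketched as below. The classifiers are passed in as plain callables, and resolving the negative sentiments by the highest score among one-vs-rest classifiers is an assumption; the source only states that multiple binary classifiers are used at this level. The keyword-based stand-ins are toys, not trained SVM^perf models.

```python
def classify_sentiment(text, has_sentiment, is_positive, negative_scorers):
    """Cascade: sentiment vs. none, then positive vs. negative, then the
    best-scoring negative sentiment."""
    if not has_sentiment(text):
        return "none"
    if is_positive(text):
        return "happiness"  # the only positive sentiment in the source
    scores = {label: scorer(text) for label, scorer in negative_scorers.items()}
    return max(scores, key=scores.get)


# Toy stand-ins for trained binary classifiers.
def toy_has_sentiment(text):
    return "!" in text


def toy_is_positive(text):
    return "great" in text


toy_negative_scorers = {
    "anger": lambda text: text.count("furious"),
    "sadness": lambda text: text.count("miss"),
    "fear": lambda text: text.count("scared"),
}

label = classify_sentiment("I miss you so much!", toy_has_sentiment, toy_is_positive, toy_negative_scorers)
```

Splitting the decision into stages keeps every individual model binary, at the cost that an error in an early stage cannot be corrected by a later one.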
According to the sentiment classification method based on contextual information of microblog text of the embodiment of the invention, for the problem of classifying the sentiment of microblog text on the Internet, features are extracted from the microblog text using methods such as part-of-speech feature selection, syntactic structure feature selection, selection of relation features between sentiment words and context, feature dimensionality reduction, and a sentiment classification algorithm, and the microblog text is classified by sentiment according to the extracted features. The method has the advantage of high classification accuracy.
Further, as shown in Fig. 3, an embodiment of the invention discloses a sentiment classification system 300 based on contextual information of microblog text, comprising: a feature extraction module 310, a feature dimensionality reduction module 320, and a sentiment classification module 330.
The feature extraction module 310 is configured to extract sentiment-related features from the microblog text and to establish a first feature space vector according to the mapping between the extracted features and dimensions, wherein the features comprise relation features between sentiment words and their context, part-of-speech features, and syntactic structure features. The feature dimensionality reduction module 320 is configured to perform dimensionality reduction on the first feature space vector to obtain a second feature space vector, wherein the dimension of the second feature space vector is lower than that of the first feature space vector. The sentiment classification module 330 is configured to train a classifier on the data corresponding to the first feature space vector, so as to classify the microblog text by sentiment and output a sentiment classification result.
In one embodiment of the invention, the feature extraction module 310 comprises a part-of-speech feature selection module, a syntactic structure feature selection module, and a selection module for relation features between sentiment words and context. The part-of-speech feature selection module is configured to extract part-of-speech features from the microblog text according to a part-of-speech feature selection method, which specifically comprises segmenting the microblog text with a word segmenter and taking part-of-speech combinations that have a collocation relation in the segmentation result as the part-of-speech features, wherein the part-of-speech combinations express the sentiment contained in the microblog text. The syntactic structure feature selection module is configured to extract syntactic structure features from the microblog text according to a syntactic structure feature selection method, which specifically comprises performing syntactic analysis on the microblog text to construct multiple dependency relations between linguistic units, and selecting the syntactic structure features according to the dependency relations. The selection module for relation features between sentiment words and context is configured to extract part-of-speech features from the microblog text according to a selection method for relation features between sentiment words and context, wherein the relation features between sentiment words and context take the following forms:
(1) <subject/entity, center sentiment word, dependency pair type>;
(2) <conjunction, center sentiment word, dependency pair type>;
(3) <modifier, center sentiment word, dependency pair type>;
(4) <negation word, center sentiment word, dependency pair type>.
In one embodiment of the invention, the feature dimensionality reduction module 320 is configured to: select high-frequency feature words from the feature words of the first feature space vector according to the chi-square test, based on the following formula:
$$\chi^2 = \sum_{i=1}^{k} \frac{(N_i - n p_i)^2}{n p_i};$$
where $N_i$ is the observed frequency of category $i$, $n$ is the total frequency, and $p_i$ is the expected frequency of category $i$;
and select low-frequency feature words from the feature words of the first feature space vector according to the pointwise mutual information (PMI) method, based on the following formula:
$$\mathrm{PMI}(w, c) = \log \frac{p(w, c)}{p(w)\,p(c)};$$
where $p(w, c)$ is the probability that a document contains word $w$ and belongs to category $c$, $p(w)$ is the probability that a document contains word $w$, and $p(c)$ is the probability that a document belongs to category $c$.
In one embodiment of the invention, the classifier is an SVM^perf binary classifier. Further, the sentiment classification module 330 uses precision, recall, and F-measure as the performance evaluation metrics of the classifier.
As shown in Fig. 4, the whole system can be divided from top to bottom into three main layers: the top layer is the user interface presentation module; the middle layer is the foreground interface module; and the bottom layer consists of the algorithm function modules.
The user interface presentation module mainly provides users of the sentiment classification system with a friendly graphical user interface, so that they can conveniently browse their own sentiment status and that of other people.
The foreground interface module provides the data read/write interface of the whole system, so that the other functional modules can conveniently perform data I/O operations.
The bottom-layer functional modules mainly comprise the following: 1) a data input module, which feeds microblog text data crawled from the Internet into the system; 2) a part-of-speech feature selection algorithm module, which performs part-of-speech analysis on the input microblog text; 3) a syntactic structure feature selection algorithm module, which extracts the syntactic structure features in the microblog text; 4) a selection module for relation features between sentiment words and context, which extracts the relation features between different sentiment words and their context; 5) a feature dimensionality reduction algorithm module, which reduces computational complexity mainly by selecting effective features; and 6) a sentiment classification algorithm module, which mainly uses SVM^perf to perform classification training on the microblog data and to predict results.
The sentiment classification system based on contextual information of microblog text according to the embodiment of the invention addresses the problem of sentiment classification of microblog text on the Internet. It extracts features from microblog text using methods such as part-of-speech feature selection, syntactic structure feature selection, selection of relationship features between sentiment words and context, feature dimensionality reduction and a sentiment classification algorithm, and classifies the sentiment of the microblog text according to the extracted features, with the advantage of high classification accuracy.
It should be noted that the specific implementation of the sentiment classification system based on contextual information of microblog text is similar to that of the sentiment classification method based on contextual information of microblog text according to the embodiment of the invention; refer to the description of the method for details, which is not repeated here in order to reduce redundancy.
Although embodiments of the invention have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the invention; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the invention without departing from its principles and purpose.

Claims (10)

1. A sentiment classification method based on contextual information of microblog text, characterized in that it comprises the following steps:
extracting features related to sentiment from microblog text, and establishing a first feature space vector according to the mapping relation between the extracted features and dimensions, wherein the features comprise a relationship feature between sentiment words and context, a part-of-speech feature and a syntactic structure feature;
performing dimensionality reduction on the first feature space vector to obtain a second feature space vector, wherein the dimension of the second feature space vector is lower than that of the first feature space vector;
training, with a classifier, on the data corresponding to the first feature space vector, so as to perform sentiment classification on the microblog text, and outputting a sentiment classification result.
2. The sentiment classification method based on contextual information of microblog text according to claim 1, characterized in that extracting the features related to sentiment from the microblog text further comprises:
extracting a part-of-speech feature from the microblog text according to a part-of-speech feature selection method, specifically comprising: segmenting the microblog text into words with a word segmenter, and, based on the segmentation result, taking part-of-speech combinations that have a collocation relation as the part-of-speech feature, wherein the part-of-speech combinations express the sentiment contained in the microblog text;
extracting a syntactic structure feature from the microblog text according to a syntactic structure feature selection method, specifically comprising: performing syntactic analysis on the microblog text to construct multiple dependency relations between linguistic unit components, and selecting the syntactic structure feature according to the dependency relations;
extracting a part-of-speech feature from the microblog text according to a selection method for the relationship feature between sentiment words and context, wherein the relationship feature between sentiment words and context has the following relations:
(1) <subject/entity, central sentiment word, dependency relation pair type>;
(2) <conjunction, central sentiment word, dependency relation pair type>;
(3) <modifier, central sentiment word, dependency relation pair type>;
(4) <negation word, central sentiment word, dependency relation pair type>.
3. The sentiment classification method based on contextual information of microblog text according to claim 1, characterized in that performing dimensionality reduction on the first feature space vector to obtain the second feature space vector further comprises:
according to the chi-square test, selecting high-frequency feature words from the feature words in the first feature space vector based on the following formula:
$$ \chi^2 = \sum_{i=1}^{k} \frac{(N_i - n p_i)^2}{n p_i} $$
where $N_i$ is the observed frequency of class i, n is the total frequency, and $p_i$ is the expected frequency of class i;
according to the PMI method, selecting low-frequency feature words from the feature words in the first feature space vector based on the following formula:
$$ \mathrm{PMI}(w, c) = \log \frac{p(w, c)}{p(w)\, p(c)} $$
where p(w, c) is the probability that a document contains word w and belongs to class c, p(w) is the probability that a document contains word w, and p(c) is the probability that a document belongs to class c.
4. The sentiment classification method based on contextual information of microblog text according to claim 1, characterized in that the classifier is an SVMperf binary classifier.
5. The sentiment classification method based on contextual information of microblog text according to any one of claims 1-4, characterized in that precision, recall and F-value are used as the performance evaluation metrics of the classifier.
6. A sentiment classification system based on contextual information of microblog text, characterized in that it comprises:
a feature extraction module, for extracting features related to sentiment from microblog text, and establishing a first feature space vector according to the mapping relation between the extracted features and dimensions, wherein the features comprise a relationship feature between sentiment words and context, a part-of-speech feature and a syntactic structure feature;
a feature dimensionality reduction module, for performing dimensionality reduction on the first feature space vector to obtain a second feature space vector, wherein the dimension of the second feature space vector is lower than that of the first feature space vector;
a sentiment classification module, for training, with a classifier, on the data corresponding to the first feature space vector, so as to perform sentiment classification on the microblog text, and outputting a sentiment classification result.
7. The sentiment classification system based on contextual information of microblog text according to claim 6, characterized in that the feature extraction module comprises:
a part-of-speech feature selection module, for extracting a part-of-speech feature from the microblog text according to a part-of-speech feature selection method, specifically comprising: segmenting the microblog text into words with a word segmenter, and, based on the segmentation result, taking part-of-speech combinations that have a collocation relation as the part-of-speech feature, wherein the part-of-speech combinations express the sentiment contained in the microblog text;
a syntactic structure feature selection module, for extracting a syntactic structure feature from the microblog text according to a syntactic structure feature selection method, specifically comprising: performing syntactic analysis on the microblog text to construct multiple dependency relations between linguistic unit components, and selecting the syntactic structure feature according to the dependency relations;
a module for selecting the relationship feature between sentiment words and context, for extracting a part-of-speech feature from the microblog text according to a selection method for the relationship feature between sentiment words and context, wherein the relationship feature between sentiment words and context has the following relations:
(1) <subject/entity, central sentiment word, dependency relation pair type>;
(2) <conjunction, central sentiment word, dependency relation pair type>;
(3) <modifier, central sentiment word, dependency relation pair type>;
(4) <negation word, central sentiment word, dependency relation pair type>.
8. The sentiment classification system based on contextual information of microblog text according to claim 6, characterized in that the feature dimensionality reduction module is used for:
according to the chi-square test, selecting high-frequency feature words from the feature words in the first feature space vector based on the following formula:
$$ \chi^2 = \sum_{i=1}^{k} \frac{(N_i - n p_i)^2}{n p_i} $$
where $N_i$ is the observed frequency of class i, n is the total frequency, and $p_i$ is the expected frequency of class i;
according to the PMI method, selecting low-frequency feature words from the feature words in the first feature space vector based on the following formula:
$$ \mathrm{PMI}(w, c) = \log \frac{p(w, c)}{p(w)\, p(c)} $$
where p(w, c) is the probability that a document contains word w and belongs to class c, p(w) is the probability that a document contains word w, and p(c) is the probability that a document belongs to class c.
9. The sentiment classification system based on contextual information of microblog text according to claim 6, characterized in that the classifier is an SVMperf binary classifier.
10. The sentiment classification system based on contextual information of microblog text according to any one of claims 6-9, characterized in that the sentiment classification module uses precision, recall and F-value as the performance evaluation metrics of the classifier.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510201443.1A CN104794208A (en) 2015-04-24 2015-04-24 Sentiment classification method and system based on contextual information of microblog text

Publications (1)

Publication Number Publication Date
CN104794208A true CN104794208A (en) 2015-07-22

Family

ID=53559000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510201443.1A Pending CN104794208A (en) 2015-04-24 2015-04-24 Sentiment classification method and system based on contextual information of microblog text

Country Status (1)

Country Link
CN (1) CN104794208A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102279890A (en) * 2011-09-02 2011-12-14 苏州大学 Sentiment word extracting and collecting method based on micro blog
CN103761239A (en) * 2013-12-09 2014-04-30 国家计算机网络与信息安全管理中心 Method for performing emotional tendency classification to microblog by using emoticons
CN103970864A (en) * 2014-05-08 2014-08-06 清华大学 Emotion classification and emotion component analyzing method and system based on microblog texts

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DONGWEN ZHANG ET AL: "Chinese comments sentiment classification based on word2vec and SVMperf", 《EXPERT SYSTEMS WITH APPLICATIONS》 *
KAI GAO ET AL: "emotion classification based on structured information", 《2014 INTERNATIONAL CONFERENCE ON MULTISENSOR FUSION AND INFORMATION INTEGRATION FOR INTELLIGENT SYSTEMS》 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426381B (en) * 2015-08-27 2018-10-26 浙江大学 A kind of music recommendation method based on microblogging mood context
CN105426381A (en) * 2015-08-27 2016-03-23 浙江大学 Music recommendation method based on emotional context of microblog
CN108780660B (en) * 2016-02-29 2023-10-20 皇家飞利浦有限公司 Apparatus, system, and method for classifying cognitive bias in a microblog relative to healthcare-centric evidence
CN108780660A (en) * 2016-02-29 2018-11-09 皇家飞利浦有限公司 The equipment, system and method classified to the cognitive Bias in microblogging relative to the evidence centered on health care
CN105843957A (en) * 2016-04-15 2016-08-10 国家计算机网络与信息安全管理中心 Depth sorting method and system for microblogs
CN107341496A (en) * 2016-05-03 2017-11-10 株式会社理光 A kind of word analysis method and device
CN106294845A (en) * 2016-08-19 2017-01-04 清华大学 The many emotions sorting technique extracted based on weight study and multiple features and device
CN106294845B (en) * 2016-08-19 2019-08-09 清华大学 The susceptible thread classification method and device extracted based on weight study and multiple features
CN106339368A (en) * 2016-08-24 2017-01-18 乐视控股(北京)有限公司 Text emotional tendency acquiring method and device
CN107861936A (en) * 2016-09-28 2018-03-30 平安科技(深圳)有限公司 The polarity probability analysis method and device of sentence
CN106484861A (en) * 2016-10-08 2017-03-08 珠海格力电器股份有限公司 The method and apparatus of pushed information
CN106557463A (en) * 2016-10-31 2017-04-05 东软集团股份有限公司 Sentiment analysis method and device
CN106528538A (en) * 2016-12-07 2017-03-22 竹间智能科技(上海)有限公司 Method and device for intelligent emotion recognition
CN108536868A (en) * 2018-04-24 2018-09-14 北京慧闻科技发展有限公司 The data processing method of short text data and application on social networks
CN108536868B (en) * 2018-04-24 2022-04-15 北京慧闻科技(集团)有限公司 Data processing method and device for short text data on social network
CN109977231A (en) * 2019-04-10 2019-07-05 上海海事大学 A kind of depressive emotion analysis method based on emotion decay factor
CN109977231B (en) * 2019-04-10 2021-04-02 上海海事大学 Depressed mood analysis method based on emotional decay factor
CN111581954A (en) * 2020-05-15 2020-08-25 中国人民解放军国防科技大学 Text event extraction method and device based on grammar dependency information
CN112989033A (en) * 2020-12-03 2021-06-18 昆明理工大学 Microblog emotion classification method based on emotion category description

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150722
