CN104809104A - Method and system for identifying micro-blog textual emotion - Google Patents

Method and system for identifying micro-blog textual emotion

Info

Publication number
CN104809104A
Authority
CN
China
Prior art keywords
microblogging text
text message
microblogging
microblog users
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510236384.1A
Other languages
Chinese (zh)
Inventor
李寿山
黄磊
周国栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201510236384.1A
Publication of CN104809104A
Legal status: Pending

Abstract

The invention provides a method and a system for identifying emotion in micro-blog text. The method comprises the following steps: acquiring first micro-blog text information, which is the original micro-blog text information posted by a micro-blog user; labeling the emotion category of the first micro-blog text information, performing word segmentation on the labeled first micro-blog text information to obtain the corresponding segmented-word set, and obtaining the feature vector of the first micro-blog text information according to the segmented-word set; obtaining a maximum entropy classifier according to the emotion category and the feature vector of the first micro-blog text information; and using the maximum entropy classifier to identify the emotion of a test corpus. The method and system provided by the invention make it possible to judge whether a micro-blog text contains emotion information.

Description

A microblog text emotion recognition method and system
Technical field
The present invention relates to the technical fields of natural language processing and social networks, and more particularly to a microblog text emotion recognition method and system.
Background art
With the rise and development of Web 2.0, the volume of information on the network has grown rapidly, and the Internet has become an important carrier of all kinds of information. Social media such as forums, blogs, and microblogs have become important channels through which the public shares information and emotion. Among them, microblogs give users a freer and more efficient way to express opinions and record moods, and have become one of the most popular Internet applications in China. Emotion analysis helps improve Internet public-opinion monitoring systems, helps enterprises target advertising accurately, and provides early warning of emergencies; in addition, emotion analysis research can support research in other fields such as psychology, sociology, and finance. Against this background, a growing body of research has turned to microblogs, and one important line of this research is emotion recognition in microblog text. At present, however, no feasible method for recognizing emotion in microblog text has been proposed.
In summary, how to provide a microblog text emotion recognition method and system is a problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of this, the object of the present invention is to provide a microblog text emotion recognition method and system for determining whether a microblog text to be classified contains emotion.
To achieve the above object, the present invention provides the following technical solutions:
In one aspect, the present invention provides a microblog text emotion recognition method, comprising:
Obtaining first microblog text information, where the first microblog text information is the original microblog text information posted by a microblog user;
Labeling the emotion category of the first microblog text information, performing word segmentation on the labeled first microblog text information to obtain a segmented-word set corresponding to all of the first microblog text information, and obtaining the feature vector of each piece of first microblog text information according to the segmented-word set;
Obtaining a maximum entropy classifier according to the emotion categories and corresponding feature vectors of the first microblog text information;
Performing emotion recognition on a test corpus using the maximum entropy classifier.
Preferably, before obtaining the first microblog text information, the method further comprises:
Collecting second microblog text information, where the second microblog text information is all of the microblog text information posted by the microblog user.
Preferably, collecting the second microblog text information comprises:
Building a user queue and initializing the user queue;
Selecting any microblog user as a seed user, and adding the seed user, the users the seed user follows, and the seed user's followers to the user queue;
Selecting any microblog user from the user queue, crawling the second microblog text information posted by that microblog user through the application programming interface provided by the microblog service, and adding the users that microblog user follows and that microblog user's followers to the user queue;
Judging whether the number of crawled microblog users has reached a preset value;
When the number of crawled microblog users reaches the preset value, stopping the collection of the second microblog text information.
Preferably, obtaining the first microblog text information comprises:
Obtaining the value of the type field in the application programming interface provided by the microblog service;
When the obtained type field value equals 1, selecting the second microblog text information whose type field value in the application programming interface equals 1, and marking that second microblog text information as the first microblog text information.
Preferably, performing emotion recognition on the test corpus using the maximum entropy classifier comprises:
Classifying the test corpus with the maximum entropy classifier to obtain corresponding classification results;
Compiling statistics on the classification results to obtain a corresponding statistical result;
Judging, according to the statistical result, whether the test corpus contains emotion.
In another aspect, the present invention further provides a microblog text emotion recognition system, comprising:
A first acquisition module, configured to obtain first microblog text information, where the first microblog text information is the original microblog text information posted by a microblog user;
A second acquisition module, configured to label the emotion category of the first microblog text information, perform word segmentation on the labeled first microblog text information to obtain a segmented-word set corresponding to all of the first microblog text information, and obtain the feature vector of each piece of first microblog text information according to the segmented-word set;
A third acquisition module, configured to obtain a maximum entropy classifier according to the emotion categories and corresponding feature vectors of the first microblog text information;
A recognition module, configured to perform emotion recognition on a test corpus using the maximum entropy classifier.
Preferably, the system further comprises:
A collection module, configured to collect second microblog text information, where the second microblog text information is all of the microblog text information posted by the microblog user.
Preferably, the collection module comprises:
A construction unit, configured to build a user queue and initialize the user queue;
A selection unit, configured to select any microblog user as a seed user, and add the seed user, the users the seed user follows, and the seed user's followers to the user queue;
A crawling unit, configured to select any microblog user from the user queue, crawl the second microblog text information posted by that microblog user through the application programming interface provided by the microblog service, and add the users that microblog user follows and that microblog user's followers to the user queue;
A judging unit, configured to judge whether the number of crawled microblog users has reached a preset value;
A collecting unit, configured to stop collecting the second microblog text information when the number of crawled microblog users reaches the preset value.
Preferably, the first acquisition module comprises:
An acquiring unit, configured to obtain the value of the type field in the application programming interface provided by the microblog service;
A screening unit, configured to, when the obtained type field value equals 1, select the second microblog text information whose type field value in the application programming interface equals 1, and mark that second microblog text information as the first microblog text information.
Preferably, the recognition module comprises:
A classification unit, configured to classify the test corpus with the maximum entropy classifier and obtain corresponding classification results;
A statistics unit, configured to compile statistics on the classification results and obtain a corresponding statistical result;
A judging unit, configured to judge, according to the statistical result, whether the test corpus contains emotion.
Compared with the prior art, the advantages of the present invention are as follows:
The present invention provides a microblog text emotion recognition method and system. First, first microblog text information, that is, the original microblog text information posted by a microblog user, is obtained. The emotion category of the first microblog text information is then labeled, and word segmentation is performed on the labeled first microblog text information to obtain the segmented-word set of all the first microblog text information, from which the feature vector of each piece of first microblog text information is obtained. A maximum entropy classifier is obtained according to the emotion categories and corresponding feature vectors of the first microblog text information, and is then used to perform emotion recognition on a test corpus. The method and system provided by the present invention thereby make it possible to judge whether a microblog text contains emotion information.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of a microblog text emotion recognition method provided by an embodiment of the present invention;
Fig. 2 is another flowchart of a microblog text emotion recognition method provided by an embodiment of the present invention;
Fig. 3 is a sub-flowchart of a microblog text emotion recognition method provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a microblog text emotion recognition system provided by an embodiment of the present invention;
Fig. 5 is another schematic structural diagram of a microblog text emotion recognition system provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of a substructure of a microblog text emotion recognition system provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Microblog text emotion recognition means that a computer helps the user quickly obtain, organize, and analyze the emotion information in relevant microblog text, performs the corresponding induction and reasoning, and thereby identifies whether the microblog text contains emotion. The microblog text emotion recognition method and system provided by the embodiments of the present invention are based on Tencent Weibo, but they are not limited to microblog text information and can also be applied to the recognition of other text information.
Referring to Fig. 1, which shows a flowchart of a microblog text emotion recognition method provided by an embodiment of the present invention, the method may comprise the following steps:
Step 101: Obtain first microblog text information.
The first microblog text information is the original microblog text information posted by a microblog user.
Step 102: Label the emotion category of the first microblog text information, perform word segmentation on the labeled first microblog text information to obtain the segmented-word set of all the first microblog text information, and obtain the feature vector of each piece of first microblog text information according to the segmented-word set.
It should be noted that, in the embodiment of the present invention, labeling the first microblog text is the process of classifying the first microblog text information by emotion category. The emotion categories fall into two classes: first microblog text information that contains emotion, and first microblog text information that contains no emotion. Finally, the labeled texts with emotion and those without emotion are stored separately, for example with one microblog post per row.
In general, the emotion categories of the first microblog text information can be labeled manually. To improve the accuracy of manual labeling, 2 or 3 annotators are generally selected for this task; more annotators can also be used, since the more annotators there are, the higher the labeling accuracy.
In addition, the embodiment of the present invention uses the Stanford tool to segment the first microblog text information into words, and all the words obtained from segmentation are assembled into a corresponding segmented-word set. It should be noted that, for a word that is repeated in the segmentation results, only its first occurrence is added to the segmented-word set.
After the segmented-word set is complete, the feature vector corresponding to the first microblog text information stored in each row is obtained in turn according to the segmented-word set.
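The construction of the segmented-word set and the per-post feature vectors can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not say whether features are binary or count-based, so binary presence features are an assumption here, and the function names and sample posts are hypothetical. The posts are assumed to be already segmented into words.

```python
from typing import Dict, List


def build_vocabulary(tokenized_posts: List[List[str]]) -> Dict[str, int]:
    """Map each distinct word to an index; repeated words are added only once."""
    vocabulary: Dict[str, int] = {}
    for tokens in tokenized_posts:
        for word in tokens:
            if word not in vocabulary:
                vocabulary[word] = len(vocabulary)
    return vocabulary


def to_feature_vector(tokens: List[str], vocabulary: Dict[str, int]) -> List[int]:
    """Binary bag-of-words vector (assumption): 1 if the word occurs in the post."""
    vector = [0] * len(vocabulary)
    for word in tokens:
        index = vocabulary.get(word)
        if index is not None:  # words outside the vocabulary are ignored
            vector[index] = 1
    return vector


# Hypothetical, already segmented posts, one post per row.
posts = [["今天", "天气", "好"], ["我", "好", "开心"]]
vocab = build_vocabulary(posts)
vectors = [to_feature_vector(tokens, vocab) for tokens in posts]
```

Each row of `vectors` has one position per word in the segmented-word set, so all posts share the same feature space.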
Step 103: Obtain a maximum entropy classifier according to the emotion categories and corresponding feature vectors of the first microblog text information.
Because the preprocessed first microblog text information includes not only the emotion category but also the corresponding segmented words, the feature vector of the first microblog text information under the corresponding emotion category can be obtained from the segmented words, and the maximum entropy classifier is then trained on the emotion categories and feature vectors.
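A rough sketch of training such a classifier from labeled feature vectors is given below. The patent does not name a training algorithm, so stochastic gradient ascent on the conditional log-likelihood is an assumption made for illustration; the function names, hyperparameters, and toy data are likewise hypothetical.

```python
import math
from typing import List, Sequence


def predict_proba(weights: List[List[float]], x: Sequence[int]) -> List[float]:
    """Return p(y|x) for every label y: exponentiated scores divided by Z(x)."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    top = max(scores)  # subtract the maximum score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)  # normalization factor
    return [e / z for e in exps]


def train_maxent(samples: List[Sequence[int]], labels: List[int],
                 num_labels: int = 2, epochs: int = 200,
                 rate: float = 0.5) -> List[List[float]]:
    """Fit one weight per (label, feature) pair by stochastic gradient ascent."""
    dim = len(samples[0])
    weights = [[0.0] * dim for _ in range(num_labels)]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            probs = predict_proba(weights, x)
            for label in range(num_labels):
                # Gradient: observed indicator minus predicted probability.
                error = (1.0 if label == y else 0.0) - probs[label]
                for j, x_j in enumerate(x):
                    weights[label][j] += rate * error * x_j
    return weights


# Toy data (hypothetical): label 1 = contains emotion, label 0 = no emotion.
X = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1], [0, 1, 0, 1]]
y = [1, 1, 0, 0]
w = train_maxent(X, y)
predictions = [max(range(2), key=lambda c: predict_proba(w, x)[c]) for x in X]
```

In practice a library implementation of multinomial logistic regression, which is equivalent to a maximum entropy classifier over such features, would normally be preferred to a hand-written loop.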
Step 104: Perform emotion recognition on the test corpus using the maximum entropy classifier.
It should be noted that, before the maximum entropy classifier is used to perform emotion recognition on the test corpus, the test corpus must be labeled and converted to feature vectors using the method described in steps 101 to 102, so that the classification performance of the maximum entropy classifier can be verified.
The embodiment of the present invention provides a microblog text emotion recognition method. First, first microblog text information, that is, the original microblog text information posted by a microblog user, is obtained. The emotion category of the first microblog text information is then labeled, and word segmentation is performed on the labeled first microblog text information to obtain the segmented-word set of all the first microblog text information, from which the feature vector of each piece of first microblog text information is obtained. A maximum entropy classifier is obtained according to the emotion categories and corresponding feature vectors of the first microblog text information, and is then used to perform emotion recognition on a test corpus. The method provided by the embodiment of the present invention thereby makes it possible to judge whether a microblog text contains emotion information.
Referring to Fig. 2, which shows another flowchart of a microblog text emotion recognition method provided by an embodiment of the present invention, the method may comprise the following steps:
Step 201: Collect second microblog text information.
The second microblog text information is all of the microblog text information posted by the microblog user.
Step 202: Obtain first microblog text information.
The first microblog text information is the original microblog text information posted by the microblog user.
The second microblog text information collected in step 201 includes every type of microblog post published by the user, such as original posts, reposts, private messages, replies, empty replies, mentions, and comments. The microblog text emotion recognition method provided by the embodiment of the present invention, however, uses only the original microblog text information posted by microblog users, namely the first microblog text information, because the emotion information in the first microblog text information better reflects the mood of the poster. The first microblog text information therefore needs to be extracted from the second microblog text information. It can be obtained by using the value of the type field provided in the API (Application Programming Interface) to select only posts of the original type, which can be implemented as follows:
(1) Obtain the value of the type field in the application programming interface provided by the microblog service;
(2) When the obtained type field value equals 1, select the microblog text information whose type field value in the application programming interface equals 1, and mark that microblog text information as the first microblog text information.
It should be noted that selecting only original posts according to the type field value provided in the API is a preferred approach, but not the only one.
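The two screening steps above amount to a simple filter on the type field. The sketch below assumes each crawled post is available as a dictionary with a `type` key; that record shape, the `text` key, and the non-1 type values in the sample data are hypothetical placeholders, since the text only states that a type value of 1 marks an original post.

```python
from typing import Dict, List

ORIGINAL_POST_TYPE = 1  # per the text, type == 1 marks an original post


def select_original_posts(posts: List[Dict]) -> List[Dict]:
    """Keep only the records whose `type` field equals 1."""
    return [post for post in posts if post.get("type") == ORIGINAL_POST_TYPE]


# Hypothetical records; only the meaning of type == 1 comes from the text.
second_texts = [
    {"type": 1, "text": "今天天气真好！"},
    {"type": 2, "text": "转发微博"},
    {"type": 7, "text": "说得对"},
]
first_texts = select_original_posts(second_texts)
```

Records with a missing `type` key are dropped rather than raising an error, which is a design choice of this sketch.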
Step 203: Label the emotion category of the first microblog text information, perform word segmentation on the labeled first microblog text information to obtain the segmented-word set of all the first microblog text information, and obtain the feature vector of each piece of first microblog text information according to the segmented-word set.
In the embodiment of the present invention, the first microblog text information can be labeled manually. For example, the first microblog text "Today is fine!" expresses no emotion, so it can be labeled as containing no emotion. After the first microblog text information has been labeled, the texts labeled as containing emotion and those labeled as containing no emotion are stored separately, one microblog post per row, thereby forming the training corpus, that is, the feature vectors of the first microblog text information.
It should be noted that preprocessing the labeled first microblog text information includes word segmentation with the Stanford tool.
It can be understood that, in the embodiment of the present invention, the first microblog text information can be labeled manually or by any other method that achieves the same labeling effect.
Step 204: Obtain a maximum entropy classifier according to the emotion categories and corresponding feature vectors of the first microblog text information.
The maximum entropy model is the theoretical basis of the maximum entropy classifier. Its basic idea is to build a model from all known factors while making no assumptions about unknown factors; that is, to find a probability distribution that satisfies all known facts and is not influenced by any unknown factor. The details are as follows:
Let $x$ be a feature vector and $y$ the output class of a sample, and let $p(y \mid x)$ be the probability that the sample is predicted to belong to a given class. The maximum entropy model requires that $p(y \mid x)$, subject to certain constraints, maximize the entropy defined below, where $\tilde{p}(x)$ is the empirical distribution of $x$ in the sample; that is, it selects the most uniform model under the constraint set:
$$H(p) = -\sum_{x,y} \tilde{p}(x)\, p(y \mid x) \log p(y \mid x)$$
Here $H(p)$ is written in place of $H(Y \mid X)$ to emphasize the dependence on the probability distribution $p$; the conditional entropy $H(Y \mid X)$ is a mathematical measure of the uniformity of the conditional probability $p(y \mid x)$. For any given constraint set $C$, the task is to find, among all models satisfying $C$, the model $p^*$ that maximizes $H(p)$:
$$p^* = \arg\max_{p \in C} H(p)$$
where $p$ ranges over the statistical models that satisfy the constraint set $C$.
With the weight of feature $f_i$ denoted by the parameter $\lambda_i$, the final probability output of the maximum entropy model is:
$$p_\lambda(y \mid x) = \frac{1}{Z_\lambda(x)} \exp\Big(\sum_i \lambda_i f_i(x, y)\Big)$$
where $Z_\lambda(x) = \sum_y \exp\big(\sum_i \lambda_i f_i(x, y)\big)$ is called the normalization factor.
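As an illustrative note that is not part of the original disclosure: in the two-class setting used here (contains emotion versus no emotion), the output formula reduces to a logistic function of the difference between the two class scores. The symbol $s_y$ is introduced only for this note and stands for the score of class $y$.

```latex
s_y = \sum_i \lambda_i f_i(x, y), \qquad
p_\lambda(y = 1 \mid x)
  = \frac{e^{s_1}}{e^{s_0} + e^{s_1}}
  = \frac{1}{1 + e^{-(s_1 - s_0)}}
```

The model therefore assigns class 1 a probability above 0.5 exactly when $s_1 > s_0$.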
Step 205: Perform emotion recognition on the test corpus using the maximum entropy classifier.
The process of performing emotion recognition on the test corpus with the maximum entropy classifier is as follows:
(1) Classify the test corpus with the maximum entropy classifier and obtain corresponding classification results;
(2) Compile statistics on the classification results and obtain a corresponding statistical result;
(3) Judge, according to the statistical result, whether the test corpus contains emotion.
It should be noted that what the maximum entropy classifier yields for the test corpus is, for each word of the segmented test corpus, the probability of that word under the emotion class and the probability of that word under the no-emotion class. Adding up the probabilities obtained by each word under the emotion class therefore gives the probability that the test corpus contains emotion; in the same way, the probability that the test corpus contains no emotion can be obtained. Comparing the probability of emotion with the probability of no emotion then determines whether the test corpus contains emotion.
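The summing and comparison described above can be sketched as follows. The per-word probability table, the function name, and the handling of unseen words and ties are assumptions made for illustration; the text specifies only that per-word probabilities are summed for each class and the two totals compared.

```python
from typing import Dict, List, Tuple


def classify_by_word_scores(
    tokens: List[str],
    word_probs: Dict[str, Tuple[float, float]],
) -> Tuple[str, float, float]:
    """Sum per-word (emotion, no-emotion) probabilities and compare the totals."""
    emotion_total = 0.0
    no_emotion_total = 0.0
    for word in tokens:
        # Assumption: words without a score contribute nothing to either class.
        p_emotion, p_no_emotion = word_probs.get(word, (0.0, 0.0))
        emotion_total += p_emotion
        no_emotion_total += p_no_emotion
    # Assumption: a tie is resolved as "no_emotion".
    label = "emotion" if emotion_total > no_emotion_total else "no_emotion"
    return label, emotion_total, no_emotion_total


# Hypothetical per-word probabilities as (emotion, no emotion).
word_probs = {"开心": (0.9, 0.1), "今天": (0.4, 0.6), "天气": (0.3, 0.7)}
label_a, emotion_a, neutral_a = classify_by_word_scores(["今天", "开心"], word_probs)
label_b, emotion_b, neutral_b = classify_by_word_scores(["今天", "天气"], word_probs)
```

The totals are sums rather than normalized probabilities, so only their relative size is meaningful in this sketch.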
In the embodiment of the present invention, the accuracy of emotion judgment on microblog text using the provided microblog text emotion recognition method is 0.80. The training corpus contains 1200 samples with emotion and 1200 samples without emotion, and the test corpus contains 200 samples with emotion and 200 samples without emotion.
Referring to Fig. 3, which shows a sub-flowchart of a microblog text emotion recognition method provided by an embodiment of the present invention, collecting the second microblog text information may comprise the following steps:
Step 301: Build a user queue and initialize the user queue.
It should be noted that the purpose of building the user queue and initializing it is to ensure that the queue starts out empty, containing no microblog users.
Step 302: Select any microblog user as a seed user, and add the seed user, the users the seed user follows, and the seed user's followers to the user queue.
The purpose of selecting an arbitrary microblog user as the seed user is to add the seed user itself, together with the followed users and followers related to the seed user, to the user queue.
Step 303: Select any microblog user from the user queue, crawl the second microblog text information posted by that microblog user through the application programming interface provided by the microblog service, and add the users that microblog user follows and that microblog user's followers to the user queue.
After the seed user has been stored in the user queue, a microblog user is selected arbitrarily from the user queue and the second microblog text information posted by that user is obtained; at the same time, the users that user follows and that user's followers are added to the user queue. This is repeated until the number of selected microblog users reaches the preset value.
Step 304: Judge whether the number of crawled microblog users has reached the preset value.
Step 305: When the number of crawled microblog users reaches the preset value, stop collecting the second microblog text information.
It can be understood that the preset value is the predetermined number of microblog users to be selected.
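Steps 301 to 305 can be sketched as a queue-driven crawl. The two fetch callbacks stand in for calls to the microblog service's API and are hypothetical, as is the in-memory graph used to exercise the function. Skipping users that were already queued is an added assumption that the text does not state; it prevents the same user from being crawled twice.

```python
import random
from typing import Callable, Dict, Iterable, List, Optional, Set


def collect_posts(
    seed_user: str,
    fetch_posts: Callable[[str], List[str]],
    fetch_neighbors: Callable[[str], Iterable[str]],
    max_users: int,
    rng: Optional[random.Random] = None,
) -> Dict[str, List[str]]:
    """Crawl posts user by user until `max_users` users have been crawled."""
    rng = rng if rng is not None else random.Random()
    queue: List[str] = []  # step 301: the queue starts out empty
    seen: Set[str] = set()

    def enqueue(users: Iterable[str]) -> None:
        for user in users:
            if user not in seen:  # assumption: never queue a user twice
                seen.add(user)
                queue.append(user)

    # Step 302: queue the seed user plus the users it follows and its followers.
    enqueue([seed_user])
    enqueue(fetch_neighbors(seed_user))

    collected: Dict[str, List[str]] = {}
    # Steps 304 and 305: stop once the preset number of users is reached.
    while queue and len(collected) < max_users:
        # Step 303: pick an arbitrary queued user and crawl that user's posts.
        user = queue.pop(rng.randrange(len(queue)))
        collected[user] = fetch_posts(user)
        enqueue(fetch_neighbors(user))
    return collected


# Hypothetical in-memory stand-ins for the API: follow graph and posts.
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
posts = {user: [f"post by {user}"] for user in graph}
result = collect_posts("a", posts.__getitem__, graph.__getitem__,
                       max_users=3, rng=random.Random(0))
```

The loop also ends if the queue runs dry before the preset value is reached, which can happen on a small or disconnected follow graph.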
Corresponding to the above method embodiments, an embodiment of the present invention also provides a microblog text emotion recognition system. Referring to Fig. 4, which shows a schematic structural diagram of a microblog text emotion recognition system provided by an embodiment of the present invention, the system may comprise: a first acquisition module 11, a second acquisition module 12, a training module 13, and a recognition module 14, wherein:
The first acquisition module 11 is configured to obtain first microblog text information.
The first microblog text information is the original microblog text information posted by a microblog user.
Preferably, the first acquisition module 11 may further comprise an acquiring unit 21 and a screening unit 22, wherein:
The acquiring unit 21 is configured to obtain the value of the type field in the application programming interface provided by the microblog service;
The screening unit 22 is configured to, when the obtained type field value equals 1, select the microblog text information whose type field value in the application programming interface equals 1, and mark that microblog text information as the first microblog text information.
The second acquisition module 12 is configured to label the emotion category of the first microblog text information, perform word segmentation on the labeled first microblog text information to obtain the segmented-word set of all the first microblog text information, and obtain the feature vector of each piece of first microblog text information according to the segmented-word set.
The training module 13 is configured to obtain a maximum entropy classifier according to the emotion categories and corresponding feature vectors of the first microblog text information.
The recognition module 14 is configured to perform emotion recognition on a test corpus using the maximum entropy classifier.
Preferably, the recognition module 14 may further comprise a classification unit 31, a statistics unit 32, and a judging unit 33, wherein:
The classification unit 31 is configured to classify the test corpus with the maximum entropy classifier and obtain corresponding classification results;
The statistics unit 32 is configured to compile statistics on the classification results and obtain a corresponding statistical result;
The judging unit 33 is configured to judge, according to the statistical result, whether the test corpus contains emotion.
The embodiment of the present invention provides a microblog text emotion recognition system. First, first microblog text information, that is, the original microblog text information posted by a microblog user, is obtained. The emotion category of the first microblog text information is then labeled, and word segmentation is performed on the labeled first microblog text information to obtain the segmented-word set of all the first microblog text information, from which the feature vector of each piece of first microblog text information is obtained. A maximum entropy classifier is obtained according to the emotion categories and corresponding feature vectors of the first microblog text information, and is then used to perform emotion recognition on a test corpus. The system provided by the embodiment of the present invention thereby makes it possible to judge whether a microblog text contains emotion information.
Please refer to Fig. 5, which shows another structural schematic diagram of the microblog text emotion recognition system provided by an embodiment of the present invention. On the basis of Fig. 4, the system may further comprise a collection module 10, wherein:
Collection module 10, configured to collect the second microblog text information.
The second microblog text information is all the microblog text information posted by microblog users.
Preferably, please refer to Fig. 6, which shows a sub-structure schematic diagram of the microblog text emotion recognition system provided by an embodiment of the present invention, wherein collection module 10 may further comprise a construction unit 41, a selection unit 42, a crawling unit 43, a judging unit 44 and a collection unit 45, wherein:
Construction unit 41, configured to build a user queue and perform an initialization operation on the user queue;
Selection unit 42, configured to select any microblog user as a seed user, and add the seed user, the users the seed user follows, and the seed user's fans (followers) to the user queue;
Crawling unit 43, configured to select an arbitrary microblog user from the user queue, crawl the second microblog text information posted by that microblog user through the application programming interface provided by the microblog, and add the users that microblog user follows and that microblog user's fans to the user queue;
Judging unit 44, configured to judge whether the number of crawled microblog users has reached a preset value;
Collection unit 45, configured to stop collecting the second microblog text information when the number of crawled microblog users reaches the preset value.
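The queue-driven collection performed by units 41 to 45 can be sketched as below. The microblog application programming interface is not specified in the source, so `fetch_posts` and `fetch_neighbors` are hypothetical callables backed here by a small in-memory graph; the `seen` set that prevents re-queuing a user is also an added assumption.

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory stand-ins for the microblog API.
GRAPH: Dict[str, List[str]] = {
    "a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": ["d"],
}
POSTS: Dict[str, List[str]] = {user: [f"post by {user}"] for user in GRAPH}


def fetch_neighbors(user: str) -> List[str]:
    """Followed users plus fans of `user` (merged in this toy graph)."""
    return GRAPH.get(user, [])


def fetch_posts(user: str) -> List[str]:
    """All posts by `user`, i.e. the second microblog text information."""
    return POSTS.get(user, [])


def collect_posts(
    seed: str,
    get_posts: Callable[[str], List[str]],
    get_neighbors: Callable[[str], List[str]],
    max_users: int,
    rng: Optional[random.Random] = None,
) -> Dict[str, List[str]]:
    rng = rng or random.Random(0)
    queue: List[str] = []          # construction unit: empty user queue
    seen = set()                   # assumption: never queue a user twice

    def enqueue(user: str) -> None:
        if user not in seen:
            seen.add(user)
            queue.append(user)

    enqueue(seed)                  # selection unit: seed user ...
    for neighbor in get_neighbors(seed):
        enqueue(neighbor)          # ... plus followed users and fans

    crawled: Dict[str, List[str]] = {}
    # judging + collection units: stop once the preset user count is reached
    while queue and len(crawled) < max_users:
        user = queue.pop(rng.randrange(len(queue)))  # crawling unit: arbitrary pick
        crawled[user] = get_posts(user)
        for neighbor in get_neighbors(user):
            enqueue(neighbor)
    return crawled


result = collect_posts("a", fetch_posts, fetch_neighbors, max_users=3)
print(sorted(result))
```

A real crawler would also need rate limiting and error handling around the API calls, which the sketch leaves out.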
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise" and "include", and any other variants thereof, are intended to cover a non-exclusive inclusion, so that a process, method, article or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises the element.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A microblog text emotion recognition method, characterized by comprising:
obtaining first microblog text information, the first microblog text information being the original microblog text information posted by microblog users;
labeling the emotion category of the first microblog text information, performing a word segmentation operation on the labeled first microblog text information to obtain the word segmentation set of all the first microblog text information, and obtaining the feature vector of each piece of the first microblog text information according to the word segmentation set;
obtaining a maximum entropy classifier according to the emotion categories and the corresponding feature vectors of the first microblog text information;
performing emotion recognition on a test corpus using the maximum entropy classifier.
2. The method according to claim 1, characterized in that, before obtaining the first microblog text information, the method further comprises:
collecting second microblog text information, the second microblog text information being all the microblog text information posted by the microblog users.
3. The method according to claim 2, characterized in that collecting the second microblog text information comprises:
building a user queue and performing an initialization operation on the user queue;
selecting any microblog user as a seed user, and adding the seed user, the users the seed user follows, and the seed user's fans (followers) to the user queue;
selecting an arbitrary microblog user from the user queue, crawling the second microblog text information posted by that microblog user through the application programming interface provided by the microblog, and adding the users that microblog user follows and that microblog user's fans to the user queue;
judging whether the number of crawled microblog users has reached a preset value;
stopping the collection of the second microblog text information when the number of crawled microblog users reaches the preset value.
4. The method according to claim 3, characterized in that obtaining the first microblog text information comprises:
obtaining the type field value in the application programming interface provided by the microblog;
when the obtained type field value equals 1, screening out the second microblog text information whose type field value in the application programming interface equals 1, and marking that second microblog text information as the first microblog text information.
5. The method according to claim 4, characterized in that performing emotion recognition on the test corpus using the maximum entropy classifier comprises:
classifying the test corpus using the maximum entropy classifier and obtaining the corresponding classification results;
compiling statistics on the classification results and obtaining the corresponding statistical results;
judging, according to the statistical results, whether the test corpus contains emotion.
6. A microblog text emotion recognition system, characterized by comprising:
a first acquisition module, configured to obtain first microblog text information, the first microblog text information being the original microblog text information posted by microblog users;
a second acquisition module, configured to label the emotion category of the first microblog text information, perform a word segmentation operation on the labeled first microblog text information to obtain the word segmentation set of all the first microblog text information, and obtain the feature vector of each piece of the first microblog text information according to the word segmentation set;
a third acquisition module, configured to obtain a maximum entropy classifier according to the emotion categories and the corresponding feature vectors of the first microblog text information;
an identification module, configured to perform emotion recognition on a test corpus using the maximum entropy classifier.
7. The system according to claim 6, characterized in that the system further comprises:
a collection module, configured to collect second microblog text information, the second microblog text information being all the microblog text information posted by the microblog users.
8. The system according to claim 7, characterized in that the collection module comprises:
a construction unit, configured to build a user queue and perform an initialization operation on the user queue;
a selection unit, configured to select any microblog user as a seed user, and add the seed user, the users the seed user follows, and the seed user's fans (followers) to the user queue;
a crawling unit, configured to select an arbitrary microblog user from the user queue, crawl the second microblog text information posted by that microblog user through the application programming interface provided by the microblog, and add the users that microblog user follows and that microblog user's fans to the user queue;
a judging unit, configured to judge whether the number of crawled microblog users has reached a preset value;
a collection unit, configured to stop collecting the second microblog text information when the number of crawled microblog users reaches the preset value.
9. The system according to claim 8, characterized in that the first acquisition module comprises:
an acquiring unit, configured to obtain the type field value in the application programming interface provided by the microblog;
a screening unit, configured to, when the obtained type field value equals 1, screen out the second microblog text information whose type field value in the application programming interface equals 1, and mark that second microblog text information as the first microblog text information.
10. The system according to claim 9, characterized in that the identification module comprises:
a classification unit, configured to classify the test corpus using the maximum entropy classifier and obtain the corresponding classification results;
a statistics unit, configured to compile statistics on the classification results and obtain the corresponding statistical results;
a judging unit, configured to judge, according to the statistical results, whether the test corpus contains emotion.
CN201510236384.1A 2015-05-11 2015-05-11 Method and system for identifying micro-blog textual emotion Pending CN104809104A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510236384.1A CN104809104A (en) 2015-05-11 2015-05-11 Method and system for identifying micro-blog textual emotion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510236384.1A CN104809104A (en) 2015-05-11 2015-05-11 Method and system for identifying micro-blog textual emotion

Publications (1)

Publication Number Publication Date
CN104809104A true CN104809104A (en) 2015-07-29

Family

ID=53693935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510236384.1A Pending CN104809104A (en) 2015-05-11 2015-05-11 Method and system for identifying micro-blog textual emotion

Country Status (1)

Country Link
CN (1) CN104809104A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104199981A (en) * 2014-09-24 2014-12-10 苏州大学 Method and system for classifying persons and mechanisms based on microblog texts
CN104598648A (en) * 2015-02-26 2015-05-06 苏州大学 Interactive gender identification method and device for microblog user

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
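He Yue et al. (何跃等): "Research on emotion recognition and classification of Chinese microblogs" ("中文微博的情绪识别与分类研究"), 《情报杂志》 *
Pang Lei et al. (庞磊等): "A Chinese microblog sentiment classification method based on emotion knowledge" ("基于情绪知识的中文微博情感分类方法"), 《计算机工程》 *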

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205043A (en) * 2015-08-26 2015-12-30 苏州大学张家港工业技术研究院 Classification method and system of emotions of news readers
CN106484861A (en) * 2016-10-08 2017-03-08 珠海格力电器股份有限公司 The method and apparatus of pushed information
CN106598944A (en) * 2016-11-25 2017-04-26 中国民航大学 Civil aviation security public opinion emotion analysis method
CN106598944B (en) * 2016-11-25 2019-03-19 中国民航大学 A kind of civil aviaton's security public sentiment sentiment analysis method
CN109508380A (en) * 2018-03-25 2019-03-22 哈尔滨工程大学 A kind of method that combination user structure similarity carries out microblog emotional analysis
CN109508380B (en) * 2018-03-25 2021-07-16 哈尔滨工程大学 Method for analyzing microblog emotion by combining user structure similarity
CN109918556A (en) * 2019-03-08 2019-06-21 北京工业大学 A kind of comprehensive microblog users social networks and microblogging text feature depressive emotion recognition methods
CN109918556B (en) * 2019-03-08 2021-06-25 北京工业大学 Method for identifying depressed mood by integrating social relationship and text features of microblog users
CN112052869A (en) * 2020-07-14 2020-12-08 北京工业大学 User psychological state identification method and system

Similar Documents

Publication Publication Date Title
CN104809104A (en) Method and system for identifying micro-blog textual emotion
CN106951925B (en) Data processing method, device, server and system
Alberto et al. Tubespam: Comment spam filtering on youtube
CN103793484B (en) The fraud identifying system based on machine learning in classification information website
CN103399891B (en) Method for automatic recommendation of network content, device and system
CN105447505B (en) A kind of multi-level important email detection method
CN104364781B (en) System and method for calculating classification ratio
US20140280173A1 (en) System and method for real-time dynamic measurement of best-estimate quality levels while reviewing classified or enriched data
CN104298679A (en) Application service recommendation method and device
CN109299258A (en) A kind of public sentiment event detecting method, device and equipment
CN108959329B (en) Text classification method, device, medium and equipment
CN106651574A (en) Personal credit assessment method and apparatus
CN102629904A (en) Detection and determination method of network navy
CN108052505A (en) Text emotion analysis method and device, storage medium, terminal
CN110610193A (en) Method and device for processing labeled data
CN104951542A (en) Method and device for recognizing class of social contact short texts and method and device for training classification models
CN108021651A (en) Network public opinion risk assessment method and device
CN104794241A (en) News classification method and system based on emotion tendentiousness
CN104598648A (en) Interactive gender identification method and device for microblog user
CN108734159A (en) The detection method and system of sensitive information in a kind of image
CN108733791A (en) network event detection method
CN105677925B (en) Database user data processing method and device
CN105609116A (en) Speech emotional dimensions region automatic recognition method
Reddy Fake profile identification using machine learning
CN111079029A (en) Sensitive account detection method, storage medium and computer equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20150729