CN103744838A - Chinese emotional abstract system and Chinese emotional abstract method for measuring mainstream emotional information - Google Patents

Publication number
CN103744838A
Authority
CN
China
Prior art keywords
sentence
emotion
phrase
evaluation object
array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410034395.7A
Other languages
Chinese (zh)
Other versions
CN103744838B (en)
Inventor
陈国龙
廖祥文
潘敏
郭文忠
魏晶晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN201410034395.7A
Publication of CN103744838A
Application granted
Publication of CN103744838B
Legal status: Active
Anticipated expiration

Landscapes

  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a Chinese emotion digest (emotional abstract) system and method for measuring mainstream emotion information. The system comprises four modules. A comment data preprocessing module extracts the evaluation objects contained in each sentence of the comment data together with their corresponding evaluation phrases, converts each evaluation object and its evaluation phrase into a unit consisting of the evaluation object and its corresponding emotion strength grade, builds an evaluation object data structure for each evaluation object, and converts the comment data into a sentence set. A unit emotion information measurement module calculates the emotion information amount of each unit. A sentence emotion information measurement module calculates the emotion information amount of each sentence. An emotion digest generation module sorts all sentences by their emotion information amount and selects the top k sentences to form the final emotion digest. The system and method help extract, from comment data, an emotion digest that contains the mainstream emotion information, with high accuracy and a wide range of application.

Description

A Chinese emotion digest system and method for measuring mainstream emotion information
Technical field
The present invention relates to the technical field of emotion digests for the product domain, and more specifically to a Chinese emotion digest system and method for measuring mainstream emotion information. It is suited to summarizing the comment data of a product so that users can quickly grasp the key information about that product.
Background
An emotion digest aims to extract emotion information with a clear tendency. Measuring emotion information is the key step in producing an emotion digest. The prior art includes several methods for measuring the emotion information in a digest. Most of them, however, measure emotion information from the evaluation object, the evaluation word and the polarity alone, which is not sufficient to express the emotion intensity of that information. The reason is that two sentences may share the same evaluation object, and the evaluation words attached to it may have the same polarity, yet if the polar intensity differs, the strength of the opinion the reviewer expresses also differs.
At the same time, an emotion digest should cover as many product attributes and opinions about them as possible, and the sentences of the digest should contain as little redundant information as possible; this property is called diversity. Several methods address diversity in text digests. Wan et al. propose a method based on manifold-ranking: the manifold-ranking algorithm first computes the relevance between each sentence and the query, the top-ranked sentence is added to the summary, and a penalty function then measures the overlap between the remaining sentences and the digest sentences. Fukumoto et al. replace the K-means algorithm with spectral clustering to reduce dimensionality and noise, which makes the classification more accurate and thereby improves the accuracy of the digest. Yan et al. recast diversity as the similarity between two word distributions and measure it with the Kullback-Leibler divergence. These methods, however, are mostly designed for traditional document digests, whereas in an emotion digest for the product domain users care more about the opinions reviewers express on product attributes.
Therefore, to address these two problems, the invention introduces polar intensity and combines it with emotion elements such as the evaluation object and the evaluation phrase, analyzes how these elements affect emotion strength and the diversity of emotion information, and proposes corresponding solutions to improve the precision of Chinese emotion digests.
Summary of the invention
The object of the present invention is to provide a Chinese emotion digest system and method for measuring mainstream emotion information. The system and method help extract, from comment data, an emotion digest that contains the mainstream emotion information, with high accuracy and a wide range of application.
To achieve the above object, the technical solution of the present invention is a Chinese emotion digest system for measuring mainstream emotion information, the system comprising:
A comment data preprocessing module, configured to extract, from each sentence of the comment data, every evaluation object and its corresponding evaluation phrase, and to convert each such pair into a unit consisting of the evaluation object and its corresponding emotion strength grade, the emotion strength grade being calculated from the corresponding evaluation phrase. The module builds an evaluation object data structure for each evaluation object, containing: the evaluation object; the evaluation phrase set, i.e. the set of all evaluation phrases that correspond to this evaluation object in the comment data; the number of times the evaluation object occurs; and a first, a second, a third and a fourth array. The elements of the first, second and third arrays correspond one-to-one to the elements of the evaluation phrase set: each element of the first array is the number of times the corresponding evaluation phrase occurs in the comment data, each element of the second array is the number of times the corresponding evaluation phrase co-occurs with this evaluation object in the comment data, and each element of the third array is the emotion strength grade of the corresponding evaluation phrase. The fourth array contains n elements, which represent the emotion information amount between this evaluation object and each of the n classes of emotion strength grade. The module also converts the comment data into a sentence set in which each element corresponds to one sentence of the comment data and contains: the position of the sentence in the comment data, the content of the sentence, the category of the sentence, the set of all units the sentence contains, and the emotion information amount of the sentence;
A unit emotion information measurement module, configured to calculate the emotion information amount of each unit. It takes as input the evaluation object data structures built by the comment data preprocessing module. For each evaluation object, it classifies the evaluation phrases by emotion strength grade, then calculates the emotion information amount between the evaluation object and each class of evaluation phrases, which gives the emotion information amount between the evaluation object and the n classes of emotion strength grade, and finally yields the emotion information amount of every unit;
A sentence emotion information measurement module, configured to calculate the emotion information amount of each sentence. It takes as input the sentence set and the evaluation object data structures as processed by the unit emotion information measurement module. It first applies a clustering algorithm to all sentences so that sentences with similar content fall into the same cluster, which gives the category of each sentence. The emotion information amount of each category is calculated from the units the category contains; the degree of association between a sentence and a category is likewise calculated from the units that the sentence and the category contain; and the degree of association between two sentences is calculated from the distance between the units they contain. Finally, the emotion information amount of each sentence is obtained by iteration; and
An emotion digest generation module, configured to generate the emotion digest. It takes as input the sentence set as processed by the sentence emotion information measurement module, sorts all sentences by their emotion information amount, and selects the top k sentences to form the final emotion digest.
Further, the comment data preprocessing module comprises a parser and an extraction module. The parser parses the comment data, splitting it into sentences and analyzing their syntactic structure. The extraction module processes the parser output with a rule-based unsupervised method, extracts each evaluation object and its corresponding evaluation phrase to form <evaluation object, evaluation phrase> pairs, then uses an emotion strength grade module to calculate the emotion strength grade of each evaluation phrase, converts each <evaluation object, evaluation phrase> pair into a unit consisting of the evaluation object and its corresponding emotion strength grade, and builds the evaluation object data structures and the sentence set.
Further, the emotion strength grade module calculates the emotion strength grade of an evaluation phrase as follows. An evaluation phrase consists of an evaluation word and modifying adverbs. The polar intensity of the evaluation word is obtained from a sentiment dictionary, and evaluation rules are formed according to the relation between the evaluation word and the modifying adverbs. The polar intensity of the evaluation phrase is calculated according to these rules and then discretized into n emotion strength grades, which gives the emotion strength grade of the evaluation phrase.
The present invention also provides a Chinese emotion digest method for measuring mainstream emotion information, the method comprising the following steps:
Step (1): the comment data preprocessing module extracts, from each sentence of the comment data, every evaluation object and its corresponding evaluation phrase, and converts each such pair into a unit consisting of the evaluation object and its corresponding emotion strength grade, the emotion strength grade being calculated from the corresponding evaluation phrase. The module builds an evaluation object data structure for each evaluation object, containing: the evaluation object; the evaluation phrase set, i.e. the set of all evaluation phrases that correspond to this evaluation object in the comment data; the number of times the evaluation object occurs; and a first, a second, a third and a fourth array. The elements of the first, second and third arrays correspond one-to-one to the elements of the evaluation phrase set: each element of the first array is the number of times the corresponding evaluation phrase occurs in the comment data, each element of the second array is the number of times the corresponding evaluation phrase co-occurs with this evaluation object in the comment data, and each element of the third array is the emotion strength grade of the corresponding evaluation phrase. The fourth array contains n elements, which represent the emotion information amount between this evaluation object and each of the n classes of emotion strength grade. The module also converts the comment data into a sentence set in which each element corresponds to one sentence of the comment data and contains: the position of the sentence in the comment data, the content of the sentence, the category of the sentence, the set of all units the sentence contains, and the emotion information amount of the sentence;
Step (2): the unit emotion information measurement module receives the evaluation object data structures built by the comment data preprocessing module and calculates the emotion information amount of each unit. For each evaluation object, it classifies the evaluation phrases by emotion strength grade, then calculates the emotion information amount between the evaluation object and each class of evaluation phrases, which gives the emotion information amount between the evaluation object and the n classes of emotion strength grade, and finally yields the emotion information amount of every unit;
Step (3): the sentence emotion information measurement module receives the sentence set and the evaluation object data structures as processed by the unit emotion information measurement module, and calculates the emotion information amount of each sentence. It first applies a clustering algorithm to all sentences so that sentences with similar content fall into the same cluster, which gives the category of each sentence. The emotion information amount of each category is calculated from the units the category contains; the degree of association between a sentence and a category is likewise calculated from the units that the sentence and the category contain; and the degree of association between two sentences is calculated from the distance between the units they contain. Finally, the emotion information amount of each sentence is obtained by iteration;
Step (4): the emotion digest generation module receives the sentence set as processed by the sentence emotion information measurement module, sorts all sentences by their emotion information amount, and selects the top k sentences to form the final emotion digest.
Further, in step (1), the emotion strength grade is calculated from an evaluation phrase as follows. An evaluation phrase consists of an evaluation word and modifying adverbs. The polar intensity of the evaluation word is obtained from a sentiment dictionary, and evaluation rules are formed according to the relation between the evaluation word and the modifying adverbs. The polar intensity of the evaluation phrase is calculated according to these rules and then discretized into n emotion strength grades, which gives the emotion strength grade of the evaluation phrase.
The beneficial effect of the invention is a Chinese emotion digest system and method for the product domain that can efficiently and accurately extract, from comment data, an emotion digest containing the mainstream emotion information. The digest meets three requirements: mainstream (the opinion on each product attribute in the digest is the one most reviewers agree with), diversity (the digest covers as many product attributes and opinions about them as possible), and low redundancy (the sentences of the digest share as little redundant information as possible). The system and method work well in use and have strong practicality and broad application prospects.
Brief description of the drawings
Fig. 1 is a structural diagram of the system of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawing and specific embodiments.
As shown in Fig. 1, the Chinese emotion digest system of the present invention for measuring mainstream emotion information comprises a comment data preprocessing module, a unit emotion information measurement module, a sentence emotion information measurement module and an emotion digest generation module.
The comment data preprocessing module extracts, from each sentence of the comment data, every evaluation object and its corresponding evaluation phrase, and converts each such pair into a unit consisting of the evaluation object and its corresponding emotion strength grade, the emotion strength grade being calculated from the corresponding evaluation phrase. The module builds an evaluation object data structure for each evaluation object, containing: the evaluation object; the evaluation phrase set, i.e. the set of all evaluation phrases that correspond to this evaluation object in the comment data; the number of times the evaluation object occurs; and a first, a second, a third and a fourth array. The elements of the first, second and third arrays correspond one-to-one to the elements of the evaluation phrase set: each element of the first array is the number of times the corresponding evaluation phrase occurs in the comment data, each element of the second array is the number of times the corresponding evaluation phrase co-occurs with this evaluation object in the comment data, and each element of the third array is the emotion strength grade of the corresponding evaluation phrase. The fourth array contains 5 elements, which represent the emotion information amount between this evaluation object and each of the 5 classes of emotion strength grade. The module also converts the comment data into a sentence set in which each element corresponds to one sentence of the comment data and contains: the position of the sentence in the comment data, the content of the sentence, the category of the sentence, the set of all units the sentence contains, and the emotion information amount of the sentence. In the evaluation object data structure, the evaluation object together with each element of the third array forms one unit. The fourth array records the emotion information amount between the evaluation object and each class of emotion strength grade (i.e. each unit).
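The two record types described above can be sketched in Python as follows. This is only an illustrative layout under stated assumptions: the class and field names (`EvaluationObject`, `Sentence`, `phrase_counts`, `grade_info` and so on) are hypothetical and do not appear in the patent, and representing a unit as an (evaluation object, grade) tuple is one possible choice.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

N_GRADES = 5  # the embodiment uses 5 emotion strength grades

# Assumption: a unit is modelled as (evaluation object, emotion strength grade).
Unit = Tuple[str, int]


@dataclass
class EvaluationObject:
    name: str                                                 # the evaluation object
    count: int = 0                                            # times the object occurs
    phrases: List[str] = field(default_factory=list)          # evaluation phrase set
    phrase_counts: List[int] = field(default_factory=list)    # first array
    cooccur_counts: List[int] = field(default_factory=list)   # second array
    phrase_grades: List[int] = field(default_factory=list)    # third array
    grade_info: List[float] = field(                          # fourth array
        default_factory=lambda: [0.0] * N_GRADES)

    def add_phrase(self, phrase: str, grade: int,
                   phrase_count: int, cooccur_count: int) -> None:
        """Append one phrase, keeping the first three arrays aligned with the set."""
        self.phrases.append(phrase)
        self.phrase_counts.append(phrase_count)
        self.cooccur_counts.append(cooccur_count)
        self.phrase_grades.append(grade)

    def units(self) -> List[Unit]:
        """Distinct units formed by the object and each grade in the third array."""
        return sorted({(self.name, g) for g in self.phrase_grades})


@dataclass
class Sentence:
    position: int                                     # position in the comment data
    content: str                                      # text of the sentence
    category: int = -1                                # cluster label, -1 = unassigned
    units: List[Unit] = field(default_factory=list)   # units the sentence contains
    senscore: float = 0.0                             # emotion information amount
```

Keeping the first three arrays index-aligned with the phrase set mirrors the one-to-one correspondence the description requires; a dictionary keyed by phrase would be an equally reasonable alternative.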
The comment data preprocessing module comprises a parser and an extraction module. The parser parses the comment data, splitting it into sentences and analyzing their syntactic structure. The extraction module processes the parser output with a rule-based unsupervised method, extracts each evaluation object and its corresponding evaluation phrase to form <evaluation object, evaluation phrase> pairs, then uses the emotion strength grade module to calculate the emotion strength grade of each evaluation phrase, converts each <evaluation object, evaluation phrase> pair into a unit consisting of the evaluation object and its corresponding emotion strength grade, and builds the evaluation object data structures and the sentence set.
The emotion strength grade module calculates the emotion strength grade of an evaluation phrase as follows. An evaluation phrase consists of an evaluation word and modifying adverbs. Modifying adverbs are divided into degree adverbs and negative adverbs, which can increase, decrease or reverse the polar intensity of the evaluation word. The polar intensity of the evaluation word is obtained from an existing sentiment dictionary (SentiWordNet version 1.0), and evaluation rules are formed according to the relation between the evaluation word and the modifying adverbs. The polar intensity of the evaluation phrase is then calculated according to these rules; it lies in the range [-1, 1]. To express the semantics more precisely, the polar intensity of the evaluation phrase is discretized into 5 emotion strength grades, which gives the emotion strength grade of the evaluation phrase.
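The grading step can be illustrated with a minimal Python sketch. The patent does not spell out its evaluation rules or its discretization boundaries, so everything specific here is an assumption: a degree adverb is modelled as a multiplier, a negative adverb as a sign flip, the result is clamped to [-1, 1], and the grades are equal-width bins. The function names `phrase_polarity` and `strength_grade` are hypothetical.

```python
def phrase_polarity(word_polarity: float, degree: float = 1.0,
                    negated: bool = False) -> float:
    """Combine an evaluation word's polar intensity with its modifying adverbs.

    Assumed rule: a degree adverb scales the intensity (degree > 1 increases it,
    degree < 1 decreases it) and a negative adverb reverses the sign.
    """
    polarity = word_polarity * degree
    if negated:
        polarity = -polarity
    # Keep the result inside the [-1, 1] range stated in the description.
    return max(-1.0, min(1.0, polarity))


def strength_grade(polarity: float, n_grades: int = 5) -> int:
    """Discretize a polar intensity in [-1, 1] into grades 1..n_grades.

    Assumed binning: equal-width intervals, grade 1 = most negative,
    grade n_grades = most positive.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    width = 2.0 / n_grades
    index = int((polarity + 1.0) / width)
    # polarity == 1.0 would land one past the last bin, so cap the index.
    return min(index, n_grades - 1) + 1
```

Under these assumptions, a word with intensity 0.5 preceded by a degree adverb weighted 1.5 yields 0.75, which falls into the highest of the 5 grades, while the same word under negation yields -0.5.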
The unit emotion information measurement module calculates the emotion information amount of each unit. It takes as input the evaluation object data structures built by the comment data preprocessing module. For each evaluation object, it classifies the evaluation phrases by emotion strength grade, then calculates the emotion information amount between the evaluation object and each class of evaluation phrases, which gives the emotion information amount between the evaluation object and the 5 classes of emotion strength grade, and finally yields the emotion information amount of every unit.
The following describes in more detail how the unit emotion information measurement module calculates the emotion information amount of a unit. The main idea is to express the degree of association between an evaluation object and an emotion strength grade with pointwise mutual information (PMI). The higher the PMI value, the stronger the association between the two; the lower the PMI value, the weaker the association. The degree of association between the two is then used as the emotion information amount.
In the comment data, let t be an evaluation object and E its corresponding evaluation phrase set. E is divided by emotion strength grade into m evaluation phrase subsets, i.e. m classes. If the PMI between evaluation object t and the k-th evaluation phrase subset is large, then t is strongly associated with that subset, and the emotion strength grade of the k-th class is the emotion intensity that most reviewers in the corpus express toward t.
Concrete steps:
1. Traverse the comment data and count the number of occurrences Targetnum of each evaluation object, the number of occurrences Phrasesnum of each evaluation phrase, and the number of occurrences Tpnum of each <evaluation object, evaluation phrase> pair;
2. Initialize each I(t_i, P_j) = 0;
3. for (each evaluation object t_i)
   Calculate the probability of the evaluation object in the comment data:
   [formula image not reproduced in this text];
   for (each evaluation phrase e_j in the evaluation phrase set corresponding to t_i)
      Calculate the probability of the evaluation phrase in the comment data:
      [formula image not reproduced in this text];
      Calculate the joint probability of the evaluation object and the evaluation phrase in the comment data:
      [formula image not reproduced in this text];
      Calculate the emotion strength grade:
      [formula image not reproduced in this text];
      Calculate the emotion information amount between the evaluation object and the emotion strength grade:
      [formula image not reproduced in this text];
4. Calculate the emotion information amount of each unit.
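Because the formula images are not reproduced in this text, the steps above can only be sketched under assumptions. The Python sketch below uses the standard definition of pointwise mutual information, I(x, y) = log2(P(x, y) / (P(x) P(y))), estimates each probability as a relative frequency over the observed <evaluation object, evaluation phrase> pairs, and aggregates phrases of the same grade into one class. The function name `unit_information` and its arguments are hypothetical; the patent's exact estimators may differ.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def unit_information(pairs: List[Tuple[str, str]],
                     grade_of: Dict[str, int]) -> Dict[Tuple[str, int], float]:
    """PMI between each evaluation object and each emotion strength grade.

    pairs    : every observed (evaluation object, evaluation phrase) occurrence
    grade_of : emotion strength grade of each evaluation phrase
    Returns a mapping from unit (object, grade) to its PMI value.
    Units that never co-occur are omitted (their PMI would be minus infinity).
    """
    total = len(pairs)
    object_count = Counter(obj for obj, _ in pairs)
    grade_count = Counter(grade_of[phrase] for _, phrase in pairs)
    joint_count = Counter((obj, grade_of[phrase]) for obj, phrase in pairs)

    info = {}
    for (obj, grade), count in joint_count.items():
        p_joint = count / total                # P(object, grade)
        p_object = object_count[obj] / total   # P(object)
        p_grade = grade_count[grade] / total   # P(grade)
        info[(obj, grade)] = math.log2(p_joint / (p_object * p_grade))
    return info
```

For example, with the pairs ("screen", "very good") twice, ("screen", "bad") once and ("battery", "bad") once, where "very good" has grade 5 and "bad" has grade 2, the unit ("screen", 5) scores higher than ("screen", 2), which matches the intuition that most reviewers rate the screen at grade 5.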
The sentence emotion information measurement module calculates the emotion information amount of each sentence. It rests on two assumptions. First, the emotion information amount of a sentence depends on the units it contains: the more emotion information its units carry, the more the sentence carries. Second, the emotion information amount of a sentence depends not only on the sentences associated with it but also on the category it belongs to: if a sentence is associated with many sentences of high emotion information amount, and the emotion information amount of its category is also high, then the sentence carries a high amount of mainstream emotion information. The module takes as input the sentence set and the evaluation object data structures as processed by the unit emotion information measurement module. It first applies a clustering algorithm to all sentences so that sentences with similar content fall into the same cluster, which gives the category of each sentence. The emotion information amount of each category is calculated from the units the category contains; the degree of association between a sentence and a category is likewise calculated from the units that the sentence and the category contain; and the degree of association between two sentences is calculated from the distance between the units they contain. Finally, the emotion information amount of each sentence is obtained by iteration.
The following describes in more detail how the sentence emotion information measurement module measures the emotion information amount of a sentence. It relies on the two assumptions above. Assumption 1: the emotion information amount of a sentence depends on the units it contains; the more emotion information its units carry, the more the sentence carries. Assumption 2: the emotion information amount of a sentence depends not only on the sentences associated with it but also on the category it belongs to. If a sentence is associated with many sentences of high emotion information amount, and the emotion information amount of its category is also high, then the sentence carries a high amount of mainstream emotion information. The module first applies a clustering algorithm to all sentences so that sentences with similar content are gathered into the same class, which gives the category of each sentence. The emotion information amount of each category is calculated from the units it contains, the degree of association between a sentence and a category is likewise calculated from the units they contain, and the degree of association between two sentences is calculated from the distance between the units they contain. Finally, the emotion information amount of each sentence is obtained by iteration.
Concrete steps:
1. Cluster the comment data to obtain the class set {c_1, ..., c_k} and the mapping c_i = clus(s_i);
2. For each class, calculate the ratio of the emotion information amount of the units the class contains to that of the units the comment data contains:
   [formula image not reproduced in this text]
3. For each sentence, calculate the ratio of the emotion information amount of the units the sentence contains to that of the units its class contains:
   [formula image not reproduced in this text]
4. // Build the incidence matrix M between sentences
   for (each sentence s_i in the comment data)
   {
      for (each sentence s_j in the comment data)
         // calculate the similarity between sentence s_i and sentence s_j; the count function gives the number of times t_x occurs in the comment data
      if (the sum of all elements in row i of the matrix is not 0) normalize row i;
   }
5. // Calculate the emotion information amount (senscore value) of each sentence
   Set the initial senscore value of each sentence to 1;
   do {
      Obtain the emotion information amount of each sentence from its associated sentences (those with a non-zero value in the incidence matrix) and its corresponding class;
   } while (the senscore value of any sentence still changes)
6. Obtain the senscore values of all sentences.
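The iteration in steps 4 to 6 can be sketched in Python, with the caveat that the similarity and update formulas are not reproduced in this text. The sketch therefore substitutes stand-ins and labels them as such: sentence similarity is the Jaccard overlap of the two sentences' unit sets, the per-sentence prior (standing in for the class ratio of step 2 combined with the sentence ratio of step 3) is passed in by the caller, and the update is a damped, PageRank-style propagation chosen because it is guaranteed to converge. The names `sentence_scores`, `unit_overlap`, `priors` and `damping` are hypothetical.

```python
from typing import List, Sequence, Tuple

Unit = Tuple[str, int]  # assumed representation: (evaluation object, grade)


def unit_overlap(units_a: Sequence[Unit], units_b: Sequence[Unit]) -> float:
    """Assumed similarity: Jaccard overlap between two sentences' unit sets."""
    a, b = set(units_a), set(units_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def row_normalize(matrix: List[List[float]]) -> List[List[float]]:
    """Normalize each row to sum to 1; rows that sum to 0 are left unchanged."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([v / total for v in row] if total != 0 else list(row))
    return result


def sentence_scores(sent_units: List[Sequence[Unit]], priors: List[float],
                    damping: float = 0.85, tol: float = 1e-8,
                    max_iter: int = 1000) -> List[float]:
    """Iteratively compute a senscore value for every sentence.

    sent_units : the units contained in each sentence
    priors     : per-sentence weight from its class (assumed to be supplied,
                 e.g. class ratio times sentence ratio)
    """
    n = len(sent_units)
    if n == 0:
        return []
    # Step 4: incidence matrix between sentences, row-normalized.
    m = [[unit_overlap(sent_units[i], sent_units[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    m = row_normalize(m)
    # Step 5: start every senscore at 1 and iterate until nothing changes.
    scores = [1.0] * n
    for _ in range(max_iter):
        updated = [(1 - damping) * priors[i]
                   + damping * sum(m[j][i] * scores[j] for j in range(n))
                   for i in range(n)]
        if max(abs(u - s) for u, s in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores
```

The damping factor below 1 is what makes the loop terminate; without it, a pure propagation over a row-normalized matrix is not guaranteed to settle. A sentence that shares no unit with any other sentence ends up with only its class-derived share of the score.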
The emotion digest generation module takes as input the sentence set as processed by the sentence emotion information measurement module, sorts all sentences by their emotion information amount, and selects the top k sentences to form the final emotion digest.
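The selection step is a plain sort followed by a cut at k, as in this short Python sketch. The function name `build_digest` is hypothetical, and returning the sentences in rank order (rather than in their original order in the comment data) is an assumption, since the description does not say how the selected sentences are arranged.

```python
from typing import List


def build_digest(sentences: List[str], senscores: List[float], k: int) -> List[str]:
    """Return the k sentences with the highest emotion information amount."""
    if len(sentences) != len(senscores):
        raise ValueError("each sentence needs exactly one senscore value")
    # Sort indices by score, highest first; ties keep their original order.
    order = sorted(range(len(sentences)), key=lambda i: senscores[i], reverse=True)
    return [sentences[i] for i in order[:k]]
```

If k exceeds the number of sentences, the slice simply returns every sentence in rank order.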
Correspondingly, the present invention also provides a Chinese emotion digest method for measuring mainstream emotion information, the method comprising the following steps:
Step (1): the comment data preprocessing module extracts, from each sentence of the comment data, every evaluation object and its corresponding evaluation phrase, and converts each such pair into a unit consisting of the evaluation object and its corresponding emotion strength grade, the emotion strength grade being calculated from the corresponding evaluation phrase. The module builds an evaluation object data structure for each evaluation object, containing: the evaluation object; the evaluation phrase set, i.e. the set of all evaluation phrases that correspond to this evaluation object in the comment data; the number of times the evaluation object occurs; and a first, a second, a third and a fourth array. The elements of the first, second and third arrays correspond one-to-one to the elements of the evaluation phrase set: each element of the first array is the number of times the corresponding evaluation phrase occurs in the comment data, each element of the second array is the number of times the corresponding evaluation phrase co-occurs with this evaluation object in the comment data, and each element of the third array is the emotion strength grade of the corresponding evaluation phrase. The fourth array contains 5 elements, which represent the emotion information amount between this evaluation object and each of the 5 classes of emotion strength grade. The module also converts the comment data into a sentence set in which each element corresponds to one sentence of the comment data and contains: the position of the sentence in the comment data, the content of the sentence, the category of the sentence, the set of all units the sentence contains, and the emotion information amount of the sentence.
In step (1), the emotion strength grade is calculated from an evaluation phrase as follows. An evaluation phrase consists of an evaluation word and modifying adverbs. Modifying adverbs are divided into degree adverbs and negative adverbs, which can increase, decrease or reverse the polar intensity of the evaluation word. The polar intensity of the evaluation word is obtained from an existing sentiment dictionary (SentiWordNet version 1.0), and evaluation rules are formed according to the relation between the evaluation word and the modifying adverbs. The polar intensity of the evaluation phrase is then calculated according to these rules; it lies in the range [-1, 1]. To express the semantics more precisely, the polar intensity of the evaluation phrase is discretized into 5 emotion strength grades, which gives the emotion strength grade of the evaluation phrase.
Step (2) unit emotion information metric module receives the evaluation object data structure that comment data pretreatment module builds, calculate the emotion information amount of each unit: for each evaluation object, according to emotion strength grade difference, to evaluating phrase, classify, then calculate evaluation object and each class and evaluate the emotion information amount of phrase, obtain the emotion information amount between evaluation object and 5 class emotion strength grades, finally obtain the emotion information amount of whole units.
Step (3): The sentence emotional information measurement module receives the sentence set and the evaluation object data structure as processed by the unit emotional information measurement module, and calculates the amount of emotional information of each sentence. First, a clustering algorithm is applied to all sentences so that sentences with similar content are grouped together, giving the category of each sentence. The amount of emotional information of each category is calculated from the units the category contains; the degree of association between a sentence and a category is likewise calculated from the units that the sentence and the category contain; and the degree of association between sentences is calculated from the distance between the units the sentences contain. Finally, the amount of emotional information of each sentence is obtained iteratively.
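The final iterative step can be approximated by a damped random walk over a sentence graph, in the spirit of cluster-based link analysis. The sketch below is a simplification under stated assumptions: sentence association is taken as the Jaccard overlap of unit sets, the clustering stage is omitted, and any category weighting is assumed to be folded into the `prior` values by the caller. The function names, damping factor and tolerance are hypothetical.

```python
def unit_affinity(units_a, units_b):
    """Assumed association between two sentences: Jaccard overlap of their units."""
    a, b = set(units_a), set(units_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def iterate_sentence_scores(affinity, prior, damping=0.85, tol=1e-8, max_iter=200):
    """Iteratively score sentences from pairwise affinities and a prior.

    `affinity[i][j]` is the association between sentences i and j;
    `prior[i]` is a positive starting weight, e.g. the summed unit information.
    """
    n = len(prior)
    # Row-normalize so each sentence distributes its score over its neighbours.
    rows = []
    for row in affinity:
        total = sum(row)
        rows.append([v / total for v in row] if total > 0 else [1.0 / n] * n)
    prior_total = sum(prior)
    base = [p / prior_total for p in prior]
    scores = base[:]
    for _ in range(max_iter):
        new = [
            (1 - damping) * base[j]
            + damping * sum(scores[i] * rows[i][j] for i in range(n))
            for j in range(n)
        ]
        if max(abs(x - y) for x, y in zip(new, scores)) < tol:
            return new
        scores = new
    return scores
```

With three sentences where only the middle one is linked to both others, the middle sentence ends up with the largest score, which is the behaviour one would expect of a measure that favours mainstream content.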
Step (4): The emotional abstract generation module receives the sentence set as processed by the sentence emotional information measurement module, sorts all sentences by their amount of emotional information, and selects the first k sentences to form the final emotional abstract.
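Step (4) amounts to a sort followed by a top-k cut. A minimal sketch follows, assuming each sentence is a dictionary with hypothetical `content` and `info` keys:

```python
def emotional_abstract(sentences, k):
    """Return the contents of the k sentences with the most emotional information."""
    ranked = sorted(sentences, key=lambda s: s["info"], reverse=True)
    return [s["content"] for s in ranked[:k]]
```

The description only says that the first k sentences are selected; whether the selected sentences are afterwards restored to their original order is not specified, so this sketch leaves them in ranked order.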

Claims (5)

1. A Chinese emotional abstract system for measuring mainstream emotional information, characterized in that the system comprises:
a comment data preprocessing module, configured to extract each evaluation object contained in each sentence of the comment data together with its corresponding evaluation phrase, and to convert them into a unit consisting of the evaluation object and its corresponding emotional intensity grade, the emotional intensity grade being calculated from the corresponding evaluation phrase; to build an evaluation object data structure corresponding to each evaluation object, the evaluation object data structure containing the following information: the evaluation object, the set of all evaluation phrases corresponding to this evaluation object in the comment data, namely the evaluation phrase set, the number of times this evaluation object occurs, a first array, a second array, a third array and a fourth array, wherein the elements of the first, second and third arrays correspond one-to-one with the elements of the evaluation phrase set, each element of the first array represents the number of times the corresponding evaluation phrase occurs in the comment data, each element of the second array represents the number of times the corresponding evaluation phrase co-occurs with this evaluation object in the comment data, each element of the third array represents the emotional intensity grade of the corresponding evaluation phrase, and the fourth array contains n elements representing the amount of emotional information between this evaluation object and the n classes of emotional intensity grade; and to convert the comment data into a sentence set, each element of which corresponds to one sentence of the comment data and contains the following information: the position of the sentence in the comment data, the content of the sentence, the category of the sentence, the set of all units the sentence contains, and the amount of emotional information of the sentence;
a unit emotional information measurement module, configured to calculate the amount of emotional information of each unit: taking as input the evaluation object data structure built by the comment data preprocessing module, for each evaluation object, the evaluation phrases are classified by emotional intensity grade, the amount of emotional information between the evaluation object and each class of evaluation phrases is then calculated, giving the amount of emotional information between the evaluation object and the n classes of emotional intensity grade, and finally the amount of emotional information of all units;
a sentence emotional information measurement module, configured to calculate the amount of emotional information of each sentence: taking as input the sentence set and the evaluation object data structure as processed by the unit emotional information measurement module, a clustering algorithm is first applied to all sentences so that sentences with similar content are grouped together, giving the category of each sentence; the amount of emotional information of each category is calculated from the units the category contains, the degree of association between a sentence and a category is likewise calculated from the units that the sentence and the category contain, the degree of association between sentences is calculated from the distance between the units the sentences contain, and finally the amount of emotional information of each sentence is obtained iteratively; and
an emotional abstract generation module, configured to generate the emotional abstract: taking as input the sentence set as processed by the sentence emotional information measurement module, all sentences are sorted by their amount of emotional information, and the first k sentences are selected to form the final emotional abstract.
2. The Chinese emotional abstract system for measuring mainstream emotional information according to claim 1, characterized in that the comment data preprocessing module comprises a parser and an extraction module; the parser parses the comment data by splitting it into sentences and analyzing their syntactic structure; the extraction module processes the parser's output with a rule-based unsupervised method, extracts the evaluation objects and their corresponding evaluation phrases to form <evaluation object, evaluation phrase> pairs, then uses an emotional intensity grade module to calculate the emotional intensity grade corresponding to each evaluation phrase, converts each <evaluation object, evaluation phrase> pair into a unit consisting of the evaluation object and its corresponding emotional intensity grade, and builds the evaluation object data structure and the sentence set.
3. The Chinese emotional abstract system for measuring mainstream emotional information according to claim 2, characterized in that the emotional intensity grade module calculates the emotional intensity grade of an evaluation phrase as follows: the evaluation phrase consists of an evaluation word and modifying adverbs; the polarity intensity of the evaluation word is obtained from a sentiment dictionary; evaluation rules are formed according to the relation between the evaluation word and the modifying adverbs; the polarity intensity of the evaluation phrase is then calculated according to the evaluation rules and discretized into n emotional intensity grades, which gives the emotional intensity grade of the evaluation phrase.
4. A Chinese emotional abstract method for measuring mainstream emotional information, characterized in that the method comprises the following steps:
Step (1): a comment data preprocessing module extracts each evaluation object contained in each sentence of the comment data together with its corresponding evaluation phrase, and converts them into a unit consisting of the evaluation object and its corresponding emotional intensity grade, the emotional intensity grade being calculated from the corresponding evaluation phrase; an evaluation object data structure corresponding to each evaluation object is built, containing the following information: the evaluation object, the set of all evaluation phrases corresponding to this evaluation object in the comment data, the number of times this evaluation object occurs, a first array, a second array, a third array and a fourth array, wherein the elements of the first, second and third arrays correspond one-to-one with the elements of the evaluation phrase set, each element of the first array represents the number of times the corresponding evaluation phrase occurs in the comment data, each element of the second array represents the number of times the corresponding evaluation phrase co-occurs with this evaluation object in the comment data, each element of the third array represents the emotional intensity grade of the corresponding evaluation phrase, and the fourth array contains n elements representing the amount of emotional information between this evaluation object and the n classes of emotional intensity grade; the comment data is converted into a sentence set, each element of which corresponds to one sentence of the comment data and contains the following information: the position of the sentence in the comment data, the content of the sentence, the category of the sentence, the set of all units the sentence contains, and the amount of emotional information of the sentence;
Step (2): a unit emotional information measurement module receives the evaluation object data structure built by the comment data preprocessing module and calculates the amount of emotional information of each unit: for each evaluation object, the evaluation phrases are classified by emotional intensity grade, the amount of emotional information between the evaluation object and each class of evaluation phrases is then calculated, giving the amount of emotional information between the evaluation object and the n classes of emotional intensity grade, and finally the amount of emotional information of all units;
Step (3): a sentence emotional information measurement module receives the sentence set and the evaluation object data structure as processed by the unit emotional information measurement module, and calculates the amount of emotional information of each sentence: a clustering algorithm is first applied to all sentences so that sentences with similar content are grouped together, giving the category of each sentence; the amount of emotional information of each category is calculated from the units the category contains, the degree of association between a sentence and a category is likewise calculated from the units that the sentence and the category contain, the degree of association between sentences is calculated from the distance between the units the sentences contain, and finally the amount of emotional information of each sentence is obtained iteratively;
Step (4): an emotional abstract generation module receives the sentence set as processed by the sentence emotional information measurement module, sorts all sentences by their amount of emotional information, and selects the first k sentences to form the final emotional abstract.
5. The Chinese emotional abstract method for measuring mainstream emotional information according to claim 4, characterized in that, in step (1), the emotional intensity grade is calculated from an evaluation phrase as follows: the evaluation phrase consists of an evaluation word and modifying adverbs; the polarity intensity of the evaluation word is obtained from a sentiment dictionary; evaluation rules are formed according to the relation between the evaluation word and the modifying adverbs; the polarity intensity of the evaluation phrase is then calculated according to the evaluation rules and discretized into n emotional intensity grades, which gives the emotional intensity grade of the evaluation phrase.
CN201410034395.7A 2014-01-24 2014-01-24 Chinese emotional abstract system and method for measuring mainstream emotional information Active CN103744838B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410034395.7A CN103744838B (en) 2014-01-24 2014-01-24 Chinese emotional abstract system and method for measuring mainstream emotional information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410034395.7A CN103744838B (en) 2014-01-24 2014-01-24 Chinese emotional abstract system and method for measuring mainstream emotional information

Publications (2)

Publication Number Publication Date
CN103744838A true CN103744838A (en) 2014-04-23
CN103744838B CN103744838B (en) 2016-09-07

Family

ID=50501856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410034395.7A Active CN103744838B (en) 2014-01-24 2014-01-24 Chinese emotional abstract system and method for measuring mainstream emotional information

Country Status (1)

Country Link
CN (1) CN103744838B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003281161A (en) * 2002-03-19 2003-10-03 Seiko Epson Corp Information classification method, information classification device, program and record medium
CN103064971A (en) * 2013-01-05 2013-04-24 南京邮电大学 Scoring and Chinese sentiment analysis based review spam detection method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XIAOJUN WAN 等: "Multi-Document Summarization Using Cluster-Based Link Analysis", 《PROCEEDINGS OF THE 31ST ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL》, 20 July 2008 (2008-07-20), pages 299 - 306, XP058244115, DOI: doi:10.1145/1390334.1390386 *
张晓甜 等: "基于树结构模式挖掘的非监督中文短语结构句法分析", 《中国计算语言学研究前沿进展(2009-2011)》, 20 August 2011 (2011-08-20), pages 106 - 111 *
潘敏 等: "基于极性强度度量中文情感文摘中的情感信息", 《集美大学学报(自然科学版)》, vol. 18, no. 6, 25 November 2013 (2013-11-25), pages 461 - 466 *
郑敏洁 等: "中文句子评价对象抽取的特征分析研究", 《福州大学学报(自然科学版)》, vol. 40, no. 5, 9 October 2012 (2012-10-09), pages 584 - 590 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912644A (en) * 2016-04-08 2016-08-31 国家计算机网络与信息安全管理中心 Network review generation type abstract method
CN107767195A (en) * 2016-08-16 2018-03-06 阿里巴巴集团控股有限公司 The display systems and displaying of description information, generation method and electronic equipment
CN106372058A (en) * 2016-08-29 2017-02-01 中译语通科技(北京)有限公司 Short text emotion factor extraction method and device based on deep learning
CN106372058B (en) * 2016-08-29 2019-10-15 中译语通科技股份有限公司 A kind of short text Emotional Factors abstracting method and device based on deep learning
CN108268439A (en) * 2016-12-30 2018-07-10 北京国双科技有限公司 The processing method and processing device of text emotion
CN108268439B (en) * 2016-12-30 2021-09-07 北京国双科技有限公司 Text emotion processing method and device
CN107967260A (en) * 2017-12-07 2018-04-27 东软集团股份有限公司 A kind of data processing method, equipment, system and computer program product
CN107967260B (en) * 2017-12-07 2021-09-14 东软集团股份有限公司 Data processing method, device, system and computer readable medium
CN110110193A (en) * 2019-04-24 2019-08-09 北京百炼智能科技有限公司 A kind of information processing method, device and computer readable storage medium

Also Published As

Publication number Publication date
CN103744838B (en) 2016-09-07

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant