CN103995853A

CN103995853A - Multi-language emotional data processing and classifying method and system based on key sentences

Info

Publication number: CN103995853A
Application number: CN201410198519.5A
Authority: CN
Inventors: 程学旗; 林政; 张瑾; 谭松波; 徐学可
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2014-05-12
Filing date: 2014-05-12
Publication date: 2014-08-20

Abstract

The invention discloses a multi-language emotional data processing and classifying method and system based on key sentences. The method includes the steps that first, an emotional dictionary data packet is automatically extracted from an unlabelled emotional data set, and the polarity of emotional words is finally judged through a K nearest neighbor algorithm and a voting rule; second, the extracted emotional dictionary data packet is used for calculating the score of the emotion attribute, then, the position attribute and the key word attribute are comprehensively considered, and a plurality of emotional key sentences are extracted for each text; third, the extracted emotional key sentences are directly applied to supervised emotional data classification and unsupervised emotional data classification. Therefore, the double-difficulty problem caused by language migration and emotional data analysis in the multi-language translation process can be solved, and emotional data analysis accuracy can be improved.

Description

Multilingual emotion data processing and classifying method and system based on key sentences

Technical Field

The invention relates to text emotion data analysis, in particular to a multilingual emotion data processing and classifying method and system based on key sentences.

Background

With the continuous emergence of network communication platforms such as forums, blogs, comment sections and microblogs, people are increasingly accustomed to publishing subjective comments on the internet to express their views and opinions on daily events, products, policies and the like. Meanwhile, as globalization accelerates, the information resources available on the network are becoming increasingly multilingual. Emotion classification is a classification task that divides texts into positive and negative according to the emotion polarity they express; multilingual emotion classification refers to classifying the emotion of texts in other (target) languages with the help of a source language. Multilingual emotion classification aims to study the viewpoints, opinions and attitudes contained in multilingual emotion texts using minimal resources. It not only helps people make reasonable purchasing decisions by referring to the evaluations of commodities by users worldwide, but also helps them follow public opinion on the networks of countries around the world in a more timely manner.

At present, multilingual emotion data analysis mainly faces two difficult problems: language migration, and emotion data analysis in the cross-language translation process.

For language migration, the following two methods are mainly adopted:

Performing cross-language emotion data classifier migration by means of a statistical machine translation system. The labeled source-language data set can be translated into the target language, and a classifier is then trained on the translated training corpus to judge the test set; alternatively, the target-language test set can be translated into the source language and the classifier trained on the source language applied to it directly. However, methods based on machine translation lose accuracy in cross-language emotion analysis. On the one hand, a machine translation system produces only a single translation for each input, and that translation is not necessarily correct; on the other hand, machine translation systems rely on their training sets and perform poorly when the domain of the target language differs significantly from the training set.

Performing cross-language emotion data classifier migration by means of a bilingual dictionary. In supervised learning, an emotion data classifier can be learned in the source language, and a bilingual dictionary is then used to translate the feature space into the target language; in unsupervised learning, the emotion dictionary in the source language can be translated into the target language through a bilingual dictionary. However, most work based on bilingual dictionaries does not consider the contextual dependency of emotional words when selecting translated words. In addition, the polarity (support or opposition) of emotion words is domain-dependent, and the same word can show different polarities for different entities, so a general emotion dictionary tends to perform poorly in a specific domain.

For emotion data analysis, the following three methods are mainly used:

In the supervised learning method, the emotion tendency analysis of a text can be regarded as a text classification process, and the text tendency is judged by means of machine learning methods such as naive Bayes, maximum entropy and support vector machines. On top of the machine learning method, feature fusion or feature reduction can be carried out to further improve emotion data classification performance.

In the unsupervised learning method, emotion data analysis is performed without any labeled data. The classical approach is as follows: first, part-of-speech tagging is carried out on a text and some collocations of adjectives and adverbs are selected according to predefined rules; then, for each collocation, the difference between its mutual information with each of a pair of opposite-polarity emotional words, such as "excellent" and "poor", is calculated; finally, the mutual information differences of all collocations in a text are summed to judge its emotional category.

In the semi-supervised learning method, a large amount of unlabelled data is combined with a small amount of labeled data. The semi-supervised learning can reduce the dependence of the supervised learning on the labeled samples, can obtain better performance than the unsupervised learning, and is a compromise method.

However, the conventional emotion analysis method does not solve the problem of interference of emotion ambiguity in the comment text on emotion data classification. The emotion data classification is somewhat similar to the plain text classification, but more complex than the plain text classification. In topic-based text classification, because word usage is different between texts with different topics, the domain relevance of words enables the texts with different topics to be well distinguished. However, the emotion data classification is much less accurate than the topic-based text classification, which is mainly caused by the complex emotion expression and the large amount of emotional ambiguity in the emotion text. In addition, in an article, objective sentences and subjective sentences may be interlaced, or a subjective sentence has more than two emotions at the same time, so text emotion data classification is a very complicated task. Here, taking a book review on a network as an example:

"many people say this is a sad, overflowing story, perhaps it is this comment that I have not been courage to read seriously. Though people who fall into a popular set are refused to shake and are extremely easy to deepen, the people are willing to see a beautiful large-volume ending in emotion, and the communication is so fragile and impatient in display.

… … this book, I read it in one sitting, and I like it very much. "

The author uses a large number of negative words, such as "sadness" and "fragility", to describe the feelings before reading, but at the end of the article the author states with a very positive attitude that he likes the book. In this example, the polarity of the entire text is positive, but it is easily judged as negative because of the large number of negative words. When the polarity of the whole article is judged, the sentences in the article contribute to the emotion to different degrees; if the key sentences that express the emotion are distinguished from the sentences that describe details, text emotion data classification performance can be improved.

In summary, the following three problems mainly exist in multilingual emotional orientation analysis:

(1) multi-language emotion analysis over-depends on external resources

Most multilingual emotion analysis techniques rely on machine translation or bilingual dictionaries. Without a machine translation system or a compiled bilingual dictionary, the multilingual emotion analysis work is difficult.

(2) Multilingual emotion analysis is susceptible to interference from emotional ambiguities

In an article, objective sentences and subjective sentences may be interlaced, or a subjective sentence has more than two emotions, so text emotion data classification is a very complicated task.

(3) Multilingual emotion analysis performance is unsatisfactory

Emotional expressions in different languages vary widely, and there is a loss of information when a model derived from the original space is converted to the target language space. For example, machine translation systems only generate unique solutions, and methods based on machine translation lose the accuracy of cross-language emotion analysis.

Disclosure of Invention

In order to solve the above problems, the present invention provides a method and a system for classifying language-independent multi-language emotion data, so as to solve the problem of dual difficulties of language migration and emotion analysis in the cross-language translation process. The method has less resource dependence, can be easily transplanted to a multi-language scene, and can grasp the most main view of an author through a key sentence extraction module so as to improve the accuracy of multi-language emotion data classification.

In order to achieve the purpose, the invention provides a multilingual emotion data processing and classifying method based on key sentences, which is characterized by comprising the following steps of:

step 1, automatically extracting a part of emotion dictionary data packet from an unlabeled emotion data set, and finally judging the polarity of emotion words through a K neighbor algorithm and a voting rule;

step 2, calculating scores of emotion attributes by using the extracted emotion dictionary data packets, then comprehensively considering position attributes and keyword attributes, and automatically extracting a plurality of emotion key sentences for each text;

step 3, directly applying the extracted emotion key sentences to supervised emotion data classification and unsupervised emotion data classification.

The invention discloses a multilingual emotion data processing and classifying method based on key sentences, which is characterized in that the step 1 comprises the following steps:

step 21, taking Chinese as an example, extracting emotional words XX from the whole data set according to pattern matching of 'very XX' and 'very XX';

step 22, taking mutual information as similarity measurement, and assigning an emotion polarity for each emotion word according to a K neighbor algorithm;

and step 23, optimizing the designated emotion polarity through a voting principle.

The invention discloses a multilingual emotion data processing and classifying method based on key sentences, which is characterized in that the step 2 comprises the following steps:

step 31, calculating the emotion score of each sentence according to the extracted emotion dictionary data packet;

step 32, calculating the position score of each sentence according to Gaussian distribution;

step 33, calculating the keyword score of each sentence according to the keyword list;

and step 34, carrying out weighted summation on the emotion scores, the position scores and the keyword scores, and determining the N sentences with the highest scores as emotion key sentences.

The invention discloses a multilingual emotion data processing and classifying method based on key sentences, which is characterized in that the step 3 comprises the following steps:

unsupervised sentiment data classification: each text is replaced by a plurality of emotion key sentences, and then the polarity of each text is judged on the key sentences by using extracted emotion dictionary data packets;

supervised emotion data classification: the most confident samples are selected from the unlabeled samples as a labeled set according to the scores of the positive-class and negative-class emotion words respectively, an emotion data classifier is then trained, and finally the polarity of each article is judged on its key sentences.

The invention also relates to a multilingual emotion data processing and classifying system based on the key sentences, which is characterized by comprising the following steps:

the polarity judgment module is used for automatically extracting a part of emotion dictionary data packet from the unlabeled emotion data set and finally judging the polarity of the emotion words through a K neighbor algorithm and a voting rule;

the key sentence extraction module is used for calculating the score of the emotion attribute through the extracted emotion dictionary data packet, then comprehensively considering the position attribute and the keyword attribute and automatically extracting a plurality of emotion key sentences for each text;

and the emotion data classification module is used for directly applying the extracted emotion key sentences to supervised emotion data classification and unsupervised emotion data classification.

The multilingual emotion data processing and classifying system based on key sentences is characterized in that the polarity judgment module comprises:

the emotion word extraction module is used for extracting the emotion words XX from the whole data set according to pattern matching of 'very XX' and 'very XX' by taking Chinese as an example;

the polarity endowing module is used for taking mutual information as similarity measurement and appointing an emotion polarity for each emotion word according to a K neighbor algorithm;

and the polarity optimization module is used for optimizing the designated emotion polarity through a voting principle.

The multilingual emotion data processing and classifying system based on key sentences is characterized in that the key sentence extraction module comprises:

the emotion score calculation module is used for calculating the emotion score of each sentence according to the extracted emotion dictionary data packet;

the position score calculating module is used for calculating the position score of each sentence according to Gaussian distribution;

the keyword score calculation module is used for calculating the keyword score of each sentence according to the keyword list;

and the key sentence determining module is used for carrying out weighted summation on the emotion scores, the position scores and the keyword scores and determining the N sentences with the highest scores as the emotion key sentences.

The invention discloses a multilingual emotion data processing and classifying system based on key sentences, which is characterized in that an emotion data classifying module comprises:

the unsupervised emotion data classification module is used for replacing each text with a plurality of emotion key sentences and judging the polarity of each text on the key sentences by using extracted emotion dictionary data packets;

and the supervised emotion data classification module is used for selecting the most confident samples from the unlabeled samples as a labeled set according to the scores of the positive-class and negative-class emotion words respectively, then training an emotion data classifier, and finally judging the polarity of each article on its key sentences.

The beneficial effects of the invention are as follows. The multilingual orientation analysis method provided by the invention is language-independent: it requires neither a machine translation system nor a large-scale bilingual dictionary data packet, learns the emotion data classifier directly on the target language, and depends on few resources. Moreover, the invention addresses the problem that emotion data classification is easily disturbed by emotional ambiguity: the key sentence extraction module captures the author's main viewpoint and ignores unimportant viewpoints, thereby improving emotion data classification performance. The invention outperforms other unsupervised methods. Applying the extracted emotion dictionary data packet to the key sentences classifies better than applying it to the full text, so emotion data classification based on key sentences is more accurate than classification based on the full text, which demonstrates the effectiveness of the key sentence extraction algorithm provided by the invention.

Drawings

FIG. 1 is a schematic diagram of the process of the present invention;

FIG. 2 is a graph of a standard Gaussian distribution.

Detailed Description

The invention relates to a multilingual emotion data processing and classifying method based on key sentences, which comprises the following steps:

step 1, automatically extracting an emotion dictionary data packet (two-tuple data such as "good, positive class" and "poor, negative class") from an unlabeled emotion corpus database. The polarity (positive or negative) of the emotional words is determined by the K-nearest-neighbor algorithm and voting rules. In the voting rule, the invention also introduces a suspension mechanism to prevent the polarity from being over-corrected;

step 2, calculating scores of emotion attributes by using the extracted emotion dictionary data packets, then comprehensively considering position attributes and keyword attributes, and automatically extracting a plurality of emotion key sentences for each text as a representative of each text;

step 3, directly applying the extracted emotion key sentences to supervised emotion data classification and unsupervised emotion data classification to obtain the emotion polarity of each text.

Taking the book review above as an example, the emotion key sentence extraction module obtains the key sentence "this book, I read it in one sitting, and I like it very much", which stands in for the overall viewpoint of the whole review. Then, by querying the previously acquired emotion dictionary data packet, it is found that the key sentence contains the emotion word "like" and that the polarity of "like" is positive, so the emotion polarity of the book review is determined to be positive.

The step 1 comprises the following steps:

first, taking Chinese as an example, the emotion words XX are extracted from the entire data set according to pattern matching "very XX" and "very XX".

And secondly, taking mutual information as similarity measurement, and assigning an emotion polarity for each emotion word according to a K-nearest neighbor algorithm.

And finally, optimizing the designated emotion polarity through a voting principle.
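The pattern-matching step can be illustrated with a minimal sketch. The source applies the patterns to Chinese; here the English adverb "very" is used purely as a stand-in, and the function name `extract_candidates` and its `adverbs` parameter are hypothetical, not taken from the patent.

```python
import re
from collections import Counter


def extract_candidates(texts, adverbs=("very",)):
    """Count the words that directly follow a degree adverb.

    `adverbs` is an illustrative assumption: the patent's patterns are
    Chinese degree adverbs, which are not recoverable from the translation.
    """
    alternation = "|".join(re.escape(a) for a in adverbs)
    pattern = re.compile(r"\b(?:%s)\s+(\w+)" % alternation, re.IGNORECASE)
    counts = Counter()
    for text in texts:
        # findall returns the captured word XX for each "very XX" match
        counts.update(word.lower() for word in pattern.findall(text))
    return counts


candidates = extract_candidates([
    "This book is very good.",
    "Very sad, but a very good ending.",
])
```

The resulting counter holds candidate emotion words only; their polarity still has to be assigned by the K-nearest-neighbor and voting steps.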

The step 2 comprises the following steps:

first, an emotion score is calculated for each sentence from the extracted emotion dictionary data packet.

Next, a position score of each sentence is calculated from the gaussian distribution.

Again, a keyword score is calculated for each sentence from the keyword list.

And finally, carrying out weighted summation on the emotion scores, the position scores and the keyword scores, and determining the N sentences with the highest scores as emotion key sentences.

The step 3 comprises the following steps:

unsupervised sentiment data classification: each text is replaced by a plurality of emotion key sentences, and then the polarity of each text is judged on the key sentences by using the extracted emotion dictionary data packet.

Supervised emotion data classification: the most confident samples are selected from the unlabeled data set as the labeled set according to the scores of the positive-class and negative-class emotion words respectively, an emotion data classifier is then trained, and finally the polarity of each article is judged on its key sentences.

To demonstrate the effectiveness of the proposed method, experiments were carried out on multi-domain (books, movies, music) review data sets in multiple languages (English, French, German).
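The unsupervised variant can be sketched as a lexicon lookup restricted to the key sentences. This is a minimal illustration, assuming tokenized sentences and a lexicon that maps words to +1 or -1; the function name and the "undecided" fallback for ties are assumptions, not part of the patent.

```python
def classify_by_key_sentences(key_sentences, lexicon):
    """Sum lexicon polarities over the key sentences only.

    key_sentences: list of token lists; lexicon: dict word -> +1 / -1.
    Words missing from the lexicon contribute 0 (an assumption).
    """
    total = sum(lexicon.get(token, 0) for sent in key_sentences for token in sent)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "undecided"


lexicon = {"like": 1, "sad": -1, "fragile": -1}
full_text = [["a", "sad", "story"], ["so", "fragile"], ["i", "like", "this", "book"]]
key_only = [full_text[-1]]

full_label = classify_by_key_sentences(full_text, lexicon)
key_label = classify_by_key_sentences(key_only, lexicon)
```

On this toy version of the book review, the full text is dominated by negative words while the key sentence alone yields the positive label.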

In order to verify the validity of the voting rule, the emotion dictionary correctness before and after applying the voting rule was manually checked, and the result is shown in table 1.

TABLE 1 Polarity determination accuracy of English emotional words

English	Before voting	After voting
Books	0.6931	0.8053
Film	0.7263	0.7835
Music	0.7512	0.7708
Average	0.7235	0.7865

As can be seen from Table 1, after the voting rule is applied, the accuracy of the English emotion dictionary data packet is improved by 6.3 percentage points on average. For general emotional words, the voting rule enables the polarity judgment accuracy to be higher through minority obeying majority, and for domain-dependent emotional words, the suspension mechanism can prevent the emotional polarity from being excessively corrected.
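The averages and the 6.3-point improvement can be checked directly from the per-domain figures in Table 1. The short calculation below uses only the numbers reported in the table; the variable names are illustrative.

```python
# Per-domain accuracies from Table 1 (books, film, music)
before = [0.6931, 0.7263, 0.7512]
after = [0.8053, 0.7835, 0.7708]

before_avg = sum(before) / len(before)
after_avg = sum(after) / len(after)

# Improvement expressed in percentage points
gain_points = (after_avg - before_avg) * 100
```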

In order to verify the effectiveness of the key sentence extraction algorithm, the emotion data classification method based on the key sentences is compared with other reference methods respectively, and experiments are carried out on data sets of different languages, and the results are shown in tables 2-4.

TABLE 2 English sentiment data classification accuracy

TABLE 3 French sentiment data Classification accuracy

TABLE 4 German Emotion data Classification accuracy

From Tables 2-4, it can be seen that the method of the present invention is superior to other unsupervised methods across multiple languages and multiple domains. Applying the extracted emotion dictionary data packet to the key sentences classifies better than applying it to the full text, so emotion data classification based on key sentences is more accurate than classification based on the full text, which shows that the key sentence extraction algorithm provided by the invention is effective. The core idea of the invention is to analyze the tendency of texts in a completely unknown language with the least resources (prior knowledge), to learn the emotion data classifier automatically on the target-language data set, and to capture the author's most important viewpoint through the key sentence extraction module while ignoring the interference of unimportant viewpoints.

FIG. 1 is a flowchart of a sentiment data classification method. As shown in fig. 1, the method includes:

step 1, automatically extracting a part of emotion dictionary data packets from an unlabeled emotion data set, and finally judging the polarity (positive type or negative type) of each emotion word through a K neighbor algorithm and a voting rule.

In the polarity judgment based on the K-nearest-neighbor algorithm, the emotion polarity of a word is determined by the polarities of the K words most closely associated with it, that is, the K words with which its similarity is greatest. The similarity between two words, similarity (o_{i}, o_{j}), is measured by mutual information:

similarity (o_{i}, o_{j}) = \log \frac{p (o_{i}, o_{j})}{p (o_{i}) p (o_{j})};

wherein o_{i} and o_{j} respectively represent two different emotional words, and p is probability.
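A minimal sketch of the mutual-information similarity and a K-nearest-neighbor vote is given below. It assumes that probabilities are estimated from document-level co-occurrence counts and that a small set of words with known polarity is available to vote; neither assumption is stated in the source, and the names `pmi_table` and `knn_polarity` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Build similarity(a, b) = log(p(a, b) / (p(a) p(b))) from token lists."""
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = set(doc)  # count each word once per document
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))

    def similarity(a, b):
        joint = pair_counts[frozenset((a, b))]
        if joint == 0:
            return float("-inf")  # never co-occur: least similar
        p_ab = joint / n_docs
        return math.log(p_ab / ((word_counts[a] / n_docs) * (word_counts[b] / n_docs)))

    return similarity


def knn_polarity(word, labeled, similarity, k=3):
    """Assign +1 / -1 by majority vote of the k most similar labeled words."""
    neighbours = sorted(labeled, key=lambda w: similarity(word, w), reverse=True)[:k]
    vote = sum(labeled[w] for w in neighbours)
    return 1 if vote >= 0 else -1  # ties default to positive (an assumption)
```

With `sim = pmi_table(docs)`, calling `knn_polarity("great", {"good": 1, "bad": -1}, sim, k=1)` assigns "great" the polarity of whichever labeled word it co-occurs with most strongly.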

In order to further optimize the polarity judgment result based on the K-nearest-neighbor algorithm, the polarity is judged a second time by a voting rule. In the voting rule, a suspension mechanism is introduced to make use of the results generated separately in the three domains. One domain is selected as the main domain and the other two as auxiliary domains, and the voting rules are as follows:

(1) If the polarity results of the emotion word generated in the three domains are consistent, the polarity is determined.

(2) If there is one auxiliary domain that generates emotion words with the same polarity as the main domain, the polarity is determined.

(3) If the emotion words generated by the two auxiliary domains have the same polarity and the result generated by the main domain is different, the polarity is suspended.

The suspension mechanism is introduced to prevent the polarity of an emotional word from being over-corrected, because emotional word polarity is domain-dependent. For example, "big" may be positive in the hotel domain and negative in the electronics domain, so the main-domain result can still be credible even though it differs from the results of the other domains. For a suspended emotion word, the polarity is finally assigned by comparing the emotion word score of the main domain with the sum of the emotion scores of the two auxiliary domains.

Step 2, comprehensively considering the emotion attributes, the position attributes and the keyword attributes, and automatically extracting the emotion key sentences.
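The three voting rules and the suspension mechanism can be sketched as follows. Polarities are assumed to be binary (+1 or -1). The source does not define the per-domain scores used to resolve a suspended word, so the comparison of absolute score magnitudes below is an illustrative assumption, as is the function name `vote`.

```python
def vote(main, aux1, aux2, main_score=0.0, aux_scores=(0.0, 0.0)):
    """Return (polarity, was_suspended) for one emotion word.

    main, aux1, aux2: +1 / -1 polarities from the main and auxiliary domains.
    main_score, aux_scores: hypothetical confidence scores per domain.
    """
    # Rules (1) and (2): at least one auxiliary domain agrees with the main one.
    if main == aux1 or main == aux2:
        return main, False
    # Rule (3): both auxiliary domains agree with each other but not with
    # the main domain, so the polarity is suspended and resolved by scores.
    if abs(main_score) >= abs(aux_scores[0]) + abs(aux_scores[1]):
        return main, True  # main-domain evidence outweighs the auxiliaries
    return aux1, True
```

Keeping the main-domain polarity when its score dominates is what lets a domain-dependent word such as "big" retain its domain-specific polarity.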

Given an article, the scores of 3 attributes are calculated for each sentence respectively, then weighted summation is carried out, and the sentence with the highest score is selected as the emotion key sentence.

An arbitrary text d consists of a series of sentences, d = \{s_{1}, s_{2}, \ldots, s_{m}\}, where m represents the number of sentences, and each sentence s_{i} consists of a series of words, s_{i} = \{w_{i1}, w_{i2}, \ldots, w_{in}\}, where n represents the number of words. The final score for each sentence can be expressed as a weighted sum of 3 attributes:

f(s_{i}) = \lambda_{1} f\_sentiment(s_{i}) + \lambda_{2} f\_position(s_{i}) + \lambda_{3} f\_keyword(s_{i});

where \lambda_{1}, \lambda_{2} and \lambda_{3} are the weights of the attributes, obtained by maximizing the precision of the classifier; f_sentiment(s_{i}) is the emotion score of sentence s_{i}, f_position(s_{i}) is the position score of sentence s_{i}, and f_keyword(s_{i}) is the keyword score of sentence s_{i}.
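The weighted combination and the selection of the top N sentences can be sketched as below. The three attribute scores are passed in as precomputed lists, and the default weights of 1.0 are placeholders only; the source obtains the weights by maximizing classifier precision. The function name is hypothetical.

```python
def top_key_sentences(sentiment, position, keyword, weights=(1.0, 1.0, 1.0), n=1):
    """Return the indices of the n highest-scoring sentences, in text order.

    sentiment, position, keyword: per-sentence attribute scores (same length).
    weights: (lambda1, lambda2, lambda3); placeholder values, not tuned.
    """
    l1, l2, l3 = weights
    totals = [l1 * a + l2 * b + l3 * c
              for a, b, c in zip(sentiment, position, keyword)]
    ranked = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    return sorted(ranked[:n])


# Three sentences: neutral opening, negative middle, positive summarizing close
picked = top_key_sentences(
    sentiment=[0.0, -1.0, 1.0],
    position=[-0.1, -0.3, -0.1],
    keyword=[0, 0, 1],
    n=2,
)
```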

Emotional characteristics: the emotion key sentence mainly expresses the overall view or preference of an author, and the view and the preference are usually embodied by emotion words. The emotion attribute examines whether a sentence carries emotional color and measures the emotional importance of the sentence. The emotion score function f_sentiment(s) is as follows:

f\_sentiment(s) = \frac{\sum_{t \in s} opinion\_lexicon(t)}{\sum_{t \in s} |opinion\_lexicon(t)|};

wherein opinion_lexicon(t) not only identifies whether the word t in the sentence s is an emotion word, but also marks the polarity of the emotion word. If t is a commendatory (positive) word, then opinion_lexicon(t) = 1; if t is a derogatory (negative) word, then opinion_lexicon(t) = -1. As can be seen from the formula, the score is high only when the emotional words in a sentence share the same polarity, and it is lower if emotional words of different polarities appear together.

Position characteristics: in order to effectively extract the main viewpoints from internet user comments, the ending part of the article deserves particular emphasis. The present invention regards the beginning and ending sentences of an article as important. The position attribute ensures that sentences at the beginning and the end of the article receive higher key sentence scores than sentences in the middle, and the position score function f_position(s) is defined as follows:
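The emotion score can be computed directly from the formula above. In this minimal sketch, words that are not in the lexicon are treated as contributing 0, and a sentence with no emotion words scores 0.0 to avoid division by zero; both choices are assumptions, since the source does not cover these cases.

```python
def f_sentiment(sentence, lexicon):
    """Signed polarity sum divided by the sum of absolute polarities.

    sentence: list of tokens; lexicon: dict word -> +1 / -1.
    """
    values = [lexicon[t] for t in sentence if t in lexicon]
    denominator = sum(abs(v) for v in values)
    if denominator == 0:
        return 0.0  # no emotion words in the sentence (assumed convention)
    return sum(values) / denominator


lexicon = {"like": 1, "good": 1, "sad": -1, "fragile": -1}
same_polarity = f_sentiment(["i", "like", "this", "good", "book"], lexicon)
mixed_polarity = f_sentiment(["sad", "but", "good"], lexicon)
```

Note that under the formula as written, a sentence containing only negative words scores -1.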

f\_position(s) = -\frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(s - \mu)^{2}}{2\sigma^{2}}}, 1 \le s \le len;

The position scoring function is in fact a negated Gaussian probability density function, where μ is the mean, σ is the standard deviation, and len is the text length (i.e., the number of sentences in a text). The curve of f_position(s) opens upward: the abscissa is the position of a sentence, ranging from 1 to len, and the ordinate is the score of the sentence at that position as an emotion key sentence. With the negated form of the Gaussian distribution (used here purely as a curve shape in the mathematical sense, with no other connection to the invention), a sentence in the middle of the article lies at the lowest point of the curve, so a middle sentence scores lower as an emotion key sentence while sentences at the beginning and the end score higher. The standard Gaussian distribution is shown in FIG. 2.
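A minimal sketch of the position score follows. The source does not give values for μ and σ; centering the curve on the middle sentence, μ = (len + 1) / 2, and using σ = len / 4 are illustrative assumptions, as are the parameter names.

```python
import math


def f_position(s, length, mu=None, sigma=None):
    """Negated Gaussian density evaluated at sentence position s (1-based).

    mu and sigma default to assumed values: the middle of the text and a
    quarter of its length. The patent does not specify them.
    """
    if mu is None:
        mu = (length + 1) / 2
    if sigma is None:
        sigma = max(length / 4, 1e-9)
    density = math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    return -density


scores = [f_position(s, 5) for s in range(1, 6)]
```

For a five-sentence text, every score is negative, the middle sentence receives the lowest score, and the first and last sentences receive the highest.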

Keyword characteristics: emotion key sentences often contain summarizing words or phrases, such as "in summary", and such summarizing keywords provide good heuristic information for extracting emotion key sentences. The invention carries out word frequency statistics on the last sentence of every text in the corpus and sorts the result to obtain a keyword list. If keywords from this list appear in a sentence, the probability that the sentence is a key sentence is higher. The keyword score function f_keyword(s) is therefore defined as follows:

f\_keyword(s) = \sum_{j=1}^{n} keyword(w_{j});

wherein,

keyword(w_{j}) = \begin{cases} 1, & w_{j} \in keywords \\ 0, & w_{j} \notin keywords \end{cases},

and w_{j} are the words or phrases that make up the sentence.
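Building the keyword list and scoring a sentence against it can be sketched as follows. The cutoff `top_k` for how many frequent words to keep is a hypothetical parameter; the source only says that the word frequencies are sorted to obtain the list.

```python
from collections import Counter


def build_keyword_list(corpus, top_k=10):
    """Collect the most frequent words in the last sentence of every text.

    corpus: list of texts, each a list of sentences, each a list of tokens.
    top_k: assumed cutoff, not specified in the source.
    """
    counts = Counter(word for text in corpus if text for word in text[-1])
    return {word for word, _ in counts.most_common(top_k)}


def f_keyword(sentence, keywords):
    """Count the tokens of a sentence that appear in the keyword list."""
    return sum(1 for word in sentence if word in keywords)


corpus = [
    [["a", "long", "story"], ["overall", "good"]],
    [["slow", "start"], ["overall", "bad"]],
]
keywords = build_keyword_list(corpus, top_k=1)
```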

The process of choosing training samples from unlabeled samples refers to the following equation:

POS = \frac{T^{P} - T^{N}}{T^{P} + T^{N}}, NEG = \frac{T^{N} - T^{P}}{T^{P} + T^{N}};

wherein T^{P} represents the number of positive-class emotion words in a text, T^{N} represents the number of negative-class emotion words in a text, POS represents the positive class, and NEG represents the negative class. The invention considers that the greater the difference between the numbers of positive-class and negative-class emotion words in a text, the more definite the emotional tendency of the text. To overcome the influence of text length on T^{P} and T^{N}, the difference is normalized by the denominator.
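The POS and NEG scores and the choice of the most confident samples can be sketched as below. The number of samples kept per class (`per_class`) and the convention of scoring a text with no emotion words as 0 are assumptions; the function names are hypothetical.

```python
def pos_neg_scores(tokens, lexicon):
    """Return (POS, NEG) as defined by the formula above."""
    t_p = sum(1 for t in tokens if lexicon.get(t) == 1)
    t_n = sum(1 for t in tokens if lexicon.get(t) == -1)
    total = t_p + t_n
    if total == 0:
        return 0.0, 0.0  # no emotion words: no evidence (assumed convention)
    pos = (t_p - t_n) / total
    return pos, -pos


def select_confident(texts, lexicon, per_class=1):
    """Pick indices of the texts with the highest POS and highest NEG scores."""
    scored = sorted(
        ((pos_neg_scores(t, lexicon)[0], i) for i, t in enumerate(texts)),
        reverse=True,
    )
    positives = [i for score, i in scored[:per_class] if score > 0]
    negatives = [i for score, i in scored[::-1][:per_class] if score < 0]
    return positives, negatives


lexicon = {"good": 1, "bad": -1}
texts = [["good", "good", "bad"], ["bad"], ["good"], ["meh"]]
positives, negatives = select_confident(texts, lexicon)
```

The selected indices would then serve as the automatically labeled training set for the supervised classifier.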

Claims

1. A multilingual emotion data processing and classifying method based on key sentences is characterized by comprising the following steps:

2. The multi-lingual emotion data processing and classification method based on key sentences of claim 1, wherein step 1 comprises:

3. The multi-lingual emotion data processing classification method based on key sentences of claim 1, wherein step 2 includes:

and step 34, carrying out weighted summation on the emotion scores, the position scores and the keyword scores, and determining the N sentences with the highest scores as emotion key sentences.

4. The multi-lingual emotion data processing classification method based on key sentences as recited in claim 1, wherein step 3 comprises:

5. A multilingual emotion data processing and classifying system based on key sentences is characterized by comprising the following components:

6. The keyword sentence-based multilingual emotion data processing and classification system of claim 5, wherein the polarity determination module comprises:

7. The multi-lingual emotion data processing classification system based on key sentences of claim 5, wherein the key sentence extraction module includes:

and the key sentence determining module is used for carrying out weighted summation on the emotion scores, the position scores and the keyword scores and determining the N sentences with the highest scores as the emotion key sentences.

8. The multi-lingual emotion data processing classification system based on key sentences of claim 5, wherein the emotion data classification module includes: