CN111159410A - Text emotion classification method, system and device and storage medium

Info

Publication number: CN111159410A
Application number: CN201911410177.8A
Authority: CN
Inventors: 寇永娴; 占太雄; 陈惠芳; 黄娇燕; 余嘉昇
Original assignee: GRG Banking Equipment Co Ltd; GRG Banking IT Co Ltd
Current assignee: GRG Banking Equipment Co Ltd; GRG Banking IT Co Ltd
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2020-05-15

Abstract

The invention discloses a text emotion classification method, system, device and storage medium. The method comprises the following steps: preprocessing the text; performing statistical calculation on the preprocessed text to obtain a text vector; performing feature selection on the text vector using a chi-square statistical method and extracting feature vectors; performing weight calculation on the feature vectors to obtain the weight of each feature vector; and classifying the text with a support vector machine in combination with the weights of the feature vectors. The system comprises a preprocessing module, a statistical module, a feature module, a weight module and a classification module. The device comprises a memory and a processor for executing the text emotion classification method. The method and device can improve the accuracy of text classification and can be widely applied in the field of text classification.

Description

Text emotion classification method, system and device and storage medium

Technical Field

The invention relates to the field of text classification, in particular to a text emotion classification method, a text emotion classification system, a text emotion classification device and a storage medium.

Background

Emotion classification, also known as tendency analysis, is a task in the field of natural language processing: the process of analyzing, processing, summarizing and reasoning about subjective text that carries emotional color. It can analyze an author's emotional preference and viewpoint toward a specific subject in a text, and is used for predicting film box office and stock trends, analyzing public opinion, improving services and products, understanding user experience, and the like. At present, the main research methods for text emotion classification are dictionary-based and corpus-based: information is mined from the corpus or dictionary and the emotional tendency of words is recognized, so that statistical data are obtained and the polarity of the words is judged. However, neither method can distinguish the part of speech of new words, and because no judgment is made at the semantic level, the accuracy of the resulting classification is low.

Disclosure of Invention

In order to solve the above technical problems, an object of the present invention is to provide a method, a system, a device and a storage medium for text emotion classification, which can improve the accuracy of text classification.

The first technical scheme adopted by the invention is as follows: a text emotion classification method comprises the following steps:

preprocessing the text;

carrying out statistical calculation on the preprocessed text to obtain a text vector;

selecting the features of the text vectors by adopting a chi-square statistical method, and extracting the feature vectors;

carrying out weight calculation on the feature vectors to obtain the weight of each feature vector;

and combining the weights of the feature vectors, and classifying the texts based on a support vector machine.

Further, the step of preprocessing the text specifically includes:

obtaining a text, filtering illegal characters of the text and performing word segmentation processing on the text;

and removing irrelevant words and counting word frequency to obtain the preprocessed text.

Further, the feature selection of the text vector by using the chi-square statistical method specifically adopts the following formula:

$$\chi^2(t_i, C_j) = \frac{N\,(AD - BC)^2}{(A + C)(B + D)(A + B)(C + D)}$$

where t_i is a feature item, C_j is a category, N is the total number of texts, A is the number of texts that contain t_i and belong to C_j, B is the number of texts that contain t_i but do not belong to C_j, C is the number of texts that belong to C_j but do not contain t_i, and D is the number of texts that neither belong to C_j nor contain t_i.

Further, the weight calculation of the feature vectors to obtain the weight of each feature vector specifically adopts the following formula:

where w_ij denotes the weight, tf_ij denotes the number of occurrences of t_i in the text, and n_i denotes the number of texts containing t_i.

Further, the performing weight calculation on the feature vectors to obtain the weight of each feature vector further includes performing normalization processing on the weight, specifically using the following formula:

where M denotes the number of vectors.

Further, the step of selecting features of the text vector by using a chi-square statistical method to extract the feature vector specifically includes:

scoring the feature items of the text vector and sequencing the feature items according to the scoring size;

and obtaining text feature items according to a preset quantity, and extracting feature vectors of the text by adopting a chi-square statistical method.

Further, the irrelevant words include stop words, pronouns, quantifiers, auxiliary words, conjunctions, and vocabularies.

The second technical scheme adopted by the invention is as follows: a text sentiment classification system comprising:

the preprocessing module is used for preprocessing the text;

the statistical module is used for carrying out statistical calculation on the preprocessed text to obtain a text vector;

the characteristic module is used for selecting the characteristics of the text vectors by adopting a chi-square statistical method and extracting the characteristic vectors;

the weighting module is used for carrying out weighting calculation on the feature vectors to obtain the weight of each feature vector;

and the classification module is used for classifying the texts based on the support vector machine by combining the weight of each feature vector.

The third technical scheme adopted by the invention is as follows: a text emotion classification apparatus comprising:

at least one processor;

at least one memory for storing at least one program;

the at least one program, when executed by the at least one processor, causes the at least one processor to implement the text emotion classification method described above.

The fourth technical scheme adopted by the invention is as follows: a storage medium storing processor-executable instructions which, when executed by a processor, are used to implement the text emotion classification method described above.

The method, system, device and storage medium have the following beneficial effects: the text is represented in vector form; emotion classification is realized by extracting features of the text and calculating weights for the extracted features; and the vector space model of the text, combined with the feature weights, is input to a support vector machine for classification, which improves the accuracy of text emotion classification.

Drawings

FIG. 1 is a flowchart of the steps of a method for classifying text emotion according to the present invention;

FIG. 2 is a block diagram of a text emotion classification system according to the present invention.

Detailed Description

The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.

For example, for comments on products, an enterprise can directly extract the comment texts of all users and apply the method to perform emotion classification on a large number of comment texts, so that the enterprise can quickly learn whether the users approve of the products.

As shown in FIG. 1, the invention provides a text emotion classification method, which comprises the following steps:

S101, preprocessing the text;

Specifically, the purpose of text preprocessing is to extract the main content from a text corpus in a standardized manner and remove information irrelevant to text emotion classification. The main operations include filtering illegal characters, performing word segmentation, removing stop words and the like; after word segmentation, the words can be labeled with emotion.

S102, carrying out statistical calculation on the preprocessed text to obtain a text vector;

Specifically, text is unstructured data composed of a large number of characters, and a computer cannot directly process character data. The content of ordinary text therefore needs to be converted into a data form that the computer can read and understand, that is, the text must be formally represented.

S103, selecting features of the text vectors by adopting a chi-square statistical method, and extracting the feature vectors;

S104, carrying out weight calculation on the feature vectors to obtain the weight of each feature vector;

and S105, classifying the texts based on a support vector machine by combining the weight of each feature vector.

Specifically, weight calculation on the feature vectors assigns each feature item a weight according to its contribution to classification. Classification is then performed mainly with a support vector machine, a binary classification model that seeks a hyperplane separating the samples. The separation principle is margin maximization, which is ultimately converted into a convex quadratic programming problem and solved.
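The description states that the support vector machine is solved as a convex quadratic programming problem. The sketch below swaps in a simpler technique, subgradient descent on the regularized hinge loss of a linear SVM, which targets the same maximum-margin objective without a QP solver. The function names, the hyperparameters (lam, lr, epochs) and the toy data are assumptions for illustration only and are not taken from the patent.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Train a linear SVM by subgradient descent on the hinge loss.

    X: list of feature-weight vectors (lists of floats).
    y: list of labels, each +1 or -1.
    Returns the weight vector w and the bias b of the hyperplane.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Functional margin of this sample under the current hyperplane.
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Shrink w: this is the gradient step for the L2 regularizer,
            # which is what pushes the solution toward a large margin.
            w = [wj - lr * lam * wj for wj in w]
            if margin < 1:
                # Sample is inside the margin or misclassified:
                # move the hyperplane toward classifying it correctly.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy data: the first feature is high for positive texts,
# the second feature is high for negative texts.
X_train = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
y_train = [1, 1, -1, -1]
w, b = train_linear_svm(X_train, y_train)
```

Because the model is binary, a scheme such as one-versus-rest would be needed if more than two emotion categories were used; the text does not say how that case is handled.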

Further, as a preferred embodiment of the method, the step of preprocessing the text specifically includes:

Specifically, the text data from which illegal characters have been filtered is segmented: long sentences are split into words, and the words can then be labeled with emotion.
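The preprocessing steps (illegal-character filtering, word segmentation, irrelevant-word removal and word-frequency counting) can be sketched as follows. This is a minimal illustration under stated assumptions: a whitespace split stands in for a real word segmenter (Chinese text would need a dedicated one), "illegal characters" are taken to mean anything other than word characters and whitespace, and the stop-word list is hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would load a curated one.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "this", "i"}


def preprocess(text, stop_words=STOP_WORDS):
    """Return a Counter of word frequencies for the cleaned text."""
    # Assumption: every character that is not a word character or
    # whitespace counts as illegal and is replaced by a space.
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    # Whitespace splitting stands in for a real word segmenter.
    words = cleaned.split()
    # Remove irrelevant words, then count the frequency of what remains.
    kept = [word for word in words if word not in stop_words]
    return Counter(kept)


frequencies = preprocess("This phone is great, great!!")
```

In this example `frequencies` holds a count of 2 for "great" and 1 for "phone"; the stop words and punctuation are gone.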

Further, as a preferred embodiment of the method, the following formula is specifically adopted for feature selection of the text vector by using a chi-square statistical method:

Specifically, the algorithm uses a chi-square statistical method for feature selection. The chi-square statistic measures the degree of correlation between a feature t_i and a document class C_j: the higher the statistic, the more information the feature carries and the stronger its correlation with the class.
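As a minimal sketch, the chi-square score of one feature/class pair can be computed from the four contingency counts A, B, C and D. The sketch assumes the standard form N(AD - BC)^2 / ((A + C)(B + D)(A + B)(C + D)) with N = A + B + C + D; the function names and the representation of each text as a set of words are illustrative choices, not taken from the patent.

```python
def contingency_counts(docs, labels, term, category):
    """Count texts by whether they contain `term` and belong to `category`.

    docs: list of sets of words, one set per text.
    labels: list of category labels, aligned with docs.
    Returns (A, B, C, D) as defined for the chi-square statistic.
    """
    A = B = C = D = 0
    for words, label in zip(docs, labels):
        has_term = term in words
        in_class = label == category
        if has_term and in_class:
            A += 1      # contains the term, belongs to the class
        elif has_term:
            B += 1      # contains the term, other class
        elif in_class:
            C += 1      # belongs to the class, term absent
        else:
            D += 1      # other class, term absent
    return A, B, C, D


def chi_square(A, B, C, D):
    """Chi-square statistic of a feature/class pair from its counts."""
    N = A + B + C + D
    denominator = (A + C) * (B + D) * (A + B) * (C + D)
    if denominator == 0:
        # Degenerate table (an empty row or column): no evidence of correlation.
        return 0.0
    return N * (A * D - B * C) ** 2 / denominator


docs = [{"good", "phone"}, {"good"}, {"bad"}, {"bad", "phone"}]
labels = ["pos", "pos", "neg", "neg"]
score_good = chi_square(*contingency_counts(docs, labels, "good", "pos"))
score_phone = chi_square(*contingency_counts(docs, labels, "phone", "pos"))
```

Here "good" appears only in positive texts and scores 4.0, while "phone" is spread evenly over both classes and scores 0.0, which matches the statement that a higher statistic means a stronger correlation with the class.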

Further, as a preferred embodiment of the method, the weight calculation of the feature vectors is performed to obtain the weight of each feature vector by using the following formula:

Specifically, the feature selection process selects the feature vectors that best represent the text content, but these features influence text classification to different degrees. The selected features therefore need to be weighted: features with strong category-distinguishing capability are given larger weights and features with weak category-distinguishing capability are given smaller weights, which effectively suppresses noise.
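The weighting formula itself is not reproduced in this text. The symbols defined for it (tf_ij, n_i and the total number of texts N) match the common TF-IDF scheme, so the sketch below assumes w_ij = tf_ij * log(N / n_i). The patent's actual formula may differ, for example in the logarithm base or in a smoothing term, and the function names are hypothetical.

```python
import math


def tfidf_weight(tf_ij, n_i, N):
    """Assumed TF-IDF weight: term frequency times log inverse document frequency.

    tf_ij: number of occurrences of feature t_i in the text.
    n_i:   number of texts containing t_i.
    N:     total number of texts.
    """
    if n_i <= 0 or N <= 0:
        raise ValueError("n_i and N must be positive")
    return tf_ij * math.log(N / n_i)


def weight_text(term_counts, doc_freq, N):
    """Weight every selected feature of one text.

    term_counts: dict mapping feature -> occurrences in this text.
    doc_freq:    dict mapping feature -> number of texts containing it.
    """
    return {
        term: tfidf_weight(tf, doc_freq[term], N)
        for term, tf in term_counts.items()
    }


weights = weight_text({"great": 2, "phone": 1}, {"great": 10, "phone": 100}, 100)
```

Under this assumption a feature that occurs in every text ("phone" above) receives weight 0, while a rarer feature that occurs often in the current text receives a large weight, which is the noise-suppressing behaviour the paragraph describes.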

Further, as a preferred embodiment of the method, the calculating the weight of the feature vector to obtain the weight of each feature vector further includes normalizing the weight, specifically using the following formula:

where M denotes the number of vectors.

Specifically, in order to eliminate the influence of the text length on the feature weight, the weight of the feature is normalized.
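The normalization formula is likewise not reproduced in this text. A common way to remove the effect of text length is cosine (L2) normalization, dividing each weight by the square root of the sum of squares of the M weights of the same text. The sketch below assumes that form; it is an illustration, not the patent's confirmed formula.

```python
import math


def normalize_weights(weights):
    """Scale a text's feature weights to unit Euclidean length.

    weights: list of the M feature weights of one text.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0:
        # A text with no weighted features stays an all-zero vector.
        return [0.0 for _ in weights]
    return [w / norm for w in weights]


short_text = normalize_weights([3.0, 4.0])
long_text = normalize_weights([30.0, 40.0])
```

Both calls yield the same vector, so a long text whose raw counts are ten times higher is treated the same as a short text with the same proportions.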

Further, as a preferred embodiment of the method, the step of selecting the feature of the text vector by using a chi-square statistical method to extract the feature vector specifically includes:

Specifically, the number of features may reach tens of thousands of dimensions, which not only lengthens the computation time but also greatly reduces classification accuracy. Feature selection picks a small subset of features from the original high-dimensional feature set as the classification features of the classifier: each feature is scored by a constructed evaluation function, the feature vectors are sorted in descending order of score, and finally a certain number of features are selected as the classification feature set.
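The score, sort and truncate procedure can be sketched in a few lines. The function name, the dictionary of scores and the value of k are hypothetical; in the method described here the scores would come from the chi-square evaluation function and k would be the preset number of features.

```python
def select_top_k(scores, k):
    """Keep the k highest-scoring features.

    scores: dict mapping feature -> evaluation score.
    k:      preset number of features to keep.
    Returns the selected features in descending order of score.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [feature for feature, _ in ranked[:k]]


feature_scores = {"good": 4.0, "slow": 1.5, "phone": 0.0, "bad": 3.0}
selected = select_top_k(feature_scores, 2)
```

With these scores the selected feature set is "good" followed by "bad"; the low-scoring "slow" and "phone" are discarded, reducing four dimensions to two.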

Further, as a preferred embodiment of the method, the irrelevant words include stop words, pronouns, quantifiers, auxiliary words, conjunctions and vocabularies.

Specifically, the type of the irrelevant word can be set according to needs, and options such as prepositions, pure numbers and the like can be added.

The specific embodiment of the invention is as follows:

A comment text of a user is obtained, illegal characters are filtered out, word segmentation is performed, and irrelevant words are removed to obtain the main text data. The number of times each word appears in the text is counted, and the words are labeled with emotion. Combining the preprocessing result, the word frequency information and the emotion labels, feature selection is performed on the text using the chi-square statistical method: the features are scored, the feature vectors are sorted in descending order of score, and features are selected according to a preset number. Weights are calculated for the selected features and then normalized. Finally, the text is represented as a vector space model and, in combination with the normalized feature weight vectors, a support vector machine classifier classifies large batches of texts.
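The vector space model representation mentioned in this embodiment can be illustrated as follows: each text becomes a fixed-length vector with one position per vocabulary word. This is a minimal sketch using raw counts; the function names are hypothetical, and in the embodiment the counts would be replaced by the normalized feature weights of the selected features.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Assign each distinct word an index in order of first appearance."""
    vocabulary = {}
    for doc in tokenized_docs:
        for word in doc:
            if word not in vocabulary:
                vocabulary[word] = len(vocabulary)
    return vocabulary


def to_vector(doc, vocabulary):
    """Represent one tokenized text as a vector of word counts."""
    counts = Counter(doc)
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        # Words outside the vocabulary are ignored.
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector


corpus = [["good", "phone", "good"], ["bad", "phone"]]
vocab = build_vocabulary(corpus)
vectors = [to_vector(doc, vocab) for doc in corpus]
```

For this corpus the vocabulary order is good, phone, bad, and the two texts map to [2, 1, 0] and [0, 1, 1].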

As shown in FIG. 2, a text emotion classification system includes:

the preprocessing module is used for preprocessing the text;

As a further preferred embodiment of the present system, the preprocessing module further includes:

the word segmentation submodule is used for acquiring the text, filtering illegal characters of the text and carrying out word segmentation processing on the text;

the removing submodule is used for removing irrelevant words and counting word frequency to obtain a preprocessed text;

As a further preferred embodiment of the present system, the feature module further comprises:

the sorting submodule is used for grading the feature items of the text vector and sorting the feature items according to the grading size;

and the extraction submodule is used for obtaining the text feature items according to the preset quantity and extracting the feature vector of the text by adopting a chi-square statistical method.

The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
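One possible way to organize the five modules in software is as interchangeable callables chained in the order of the method steps. The class name, field names and stand-in functions below are hypothetical and only illustrate the data flow between modules; they are not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SentimentPipeline:
    """Chains the five modules of the system in method-step order."""
    preprocess: Callable[[Any], Any]       # preprocessing module
    vectorize: Callable[[Any], Any]        # statistical module
    select_features: Callable[[Any], Any]  # feature module
    weight: Callable[[Any], Any]           # weight module
    classify: Callable[[Any], Any]         # classification module

    def run(self, text):
        """Pass one text through all five modules and return the class."""
        words = self.preprocess(text)
        vector = self.vectorize(words)
        features = self.select_features(vector)
        weighted = self.weight(features)
        return self.classify(weighted)


# Trivial stand-ins for each module, used only to show the wiring.
pipeline = SentimentPipeline(
    preprocess=lambda text: text.lower().split(),
    vectorize=lambda words: {w: words.count(w) for w in words},
    select_features=lambda vec: {w: c for w, c in vec.items() if w in {"good", "bad"}},
    weight=lambda feats: feats,
    classify=lambda feats: "positive" if feats.get("good", 0) >= feats.get("bad", 0) else "negative",
)
result = pipeline.run("Good good bad phone")
```

Keeping each module behind a plain callable lets any one of them be replaced, for example the classifier, without touching the others.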

A text emotion classification device, comprising:

at least one processor;

at least one memory for storing at least one program;

The contents in the above method embodiments are all applicable to the present apparatus embodiment, the functions specifically implemented by the present apparatus embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present apparatus embodiment are also the same as those achieved by the above method embodiments.

A storage medium storing processor-executable instructions which, when executed by a processor, are used to implement the text emotion classification method described above.

The contents in the above method embodiments are all applicable to the present storage medium embodiment, the functions specifically implemented by the present storage medium embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present storage medium embodiment are also the same as those achieved by the above method embodiments.

While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A text emotion classification method is characterized by comprising the following steps:

preprocessing the text;

2. The method for classifying emotion of text according to claim 1, wherein the step of preprocessing the text specifically includes:

3. The method for classifying emotion of text according to claim 1, wherein the feature selection of the text vector by using the chi-square statistical method specifically uses the following formula:

4. The method of claim 3, wherein the weight calculation of the feature vectors is performed to obtain the weight of each feature vector by using the following formula:

5. The method of classifying text emotions according to claim 4, wherein the calculating the weights of the feature vectors to obtain the weights of the feature vectors further comprises normalizing the weights, specifically using the following formula:

where M denotes the number of vectors.

6. The method for classifying emotion of text according to claim 1, wherein said step of extracting feature vectors by selecting features of text vectors using chi-square statistical method specifically comprises:

7. The method for classifying emotion of text according to claim 1, wherein: the irrelevant words comprise stop words, pronouns, quantifiers, auxiliary words, conjunctions and vocabularies.

8. A text sentiment classification system, comprising:

the preprocessing module is used for preprocessing the text;

9. A text emotion classification device, characterized by comprising:

at least one processor;

at least one memory for storing at least one program;

the at least one program, when executed by the at least one processor, causes the at least one processor to implement the text emotion classification method as claimed in any one of claims 1 to 7.

10. A storage medium storing processor-executable instructions, characterized in that the processor-executable instructions, when executed by a processor, are used to implement the text emotion classification method as claimed in any one of claims 1 to 7.