CN113672731A - Emotion analysis method, device and equipment based on domain information and storage medium - Google Patents

Emotion analysis method, device and equipment based on domain information and storage medium

Info

Publication number
CN113672731A
Authority
CN
China
Prior art keywords
emotion
information
text
model
domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110881327.4A
Other languages
Chinese (zh)
Other versions
CN113672731B (en)
Inventor
张佳旭
王宇琪
郝保
曹家
刘莹
鲁县华
罗引
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Wenge Technology Co ltd
Original Assignee
Beijing Zhongke Wenge Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Wenge Technology Co ltd filed Critical Beijing Zhongke Wenge Technology Co ltd
Priority to CN202110881327.4A priority Critical patent/CN113672731B/en
Publication of CN113672731A publication Critical patent/CN113672731A/en
Application granted granted Critical
Publication of CN113672731B publication Critical patent/CN113672731B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/353 Clustering; Classification into predefined classes
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Embodiments of the disclosure relate to an emotion analysis method based on domain information, comprising the following steps: preprocessing the text information to be analyzed and acquiring its domain information; and inputting the text information into an emotion classification model pre-fused with domain information to acquire the emotion category of the text. The emotion classification model fused with domain information comprises a global emotion semantic model, a local emotion semantic model for each domain, and an emotion fusion strategy corresponding to each domain. A local emotion probability value is acquired from the local emotion semantic model matched to the domain information; a global emotion probability value is acquired from the global emotion semantic model; the local and global emotion probability values are fused according to the emotion fusion strategy matched to the domain information to obtain a fusion result; and the emotion category is acquired from the fusion result. Analyzing text emotion in this way yields a better classification effect and greatly improves the accuracy of the analysis result.

Description

Emotion analysis method, device and equipment based on domain information and storage medium
Technical Field
The application belongs to the technical field of natural language processing, and particularly relates to a method, a device, equipment and a storage medium for emotion analysis based on domain information.
Background
With the development of the internet and the popularity of social networking and online shopping, users leave large amounts of text data on network platforms. A large portion of this text carries a subjective tendency, expressing the user's emotion toward a particular entity, an event, or the users themselves. Emotion analysis automatically mines and analyzes the emotional states in massive amounts of text, and is widely applied in fields such as public opinion analysis, advertisement targeting, and dialogue robot design.
One existing emotion analysis approach extracts semantic information from the text and performs emotion classification with a machine learning method; a typical pipeline extracts tf-idf features from the text and then identifies the emotion class with a machine learning classifier. However, the words in a sentence are not merely piled up: different syntax can produce completely different emotional expressions, so simple statistical features give an unsatisfactory emotion classification effect. Another approach applies deep learning, performing emotion analysis with a convolutional neural network or a recurrent neural network.
Both kinds of methods misjudge emotion for texts from different contexts. For example, the emotion of a text containing the word "fire" may be judged as negative, yet "fire" carries negative emotion in the public-safety domain and positive emotion (in the sense of "popular") in the fast-moving consumer goods domain. Text emotion analysis results obtained by existing methods are therefore prone to emotion misjudgment, and their classification effect and accuracy are poor.
Disclosure of Invention
Technical problem to be solved
In view of the above disadvantages and shortcomings of the prior art, the present application provides a method, an apparatus, a device and a storage medium for emotion analysis based on domain information.
(II) technical scheme
In order to achieve the purpose, the technical scheme is as follows:
in a first aspect, the present application provides an emotion analysis method based on domain information, including:
preprocessing the text information to be analyzed, and acquiring the field information of the preprocessed text information;
inputting the text information to be analyzed into an emotion classification model pre-fused with domain information, and acquiring the emotion category of the text information, wherein the emotion classification model fused with domain information comprises a global emotion semantic model, a local emotion semantic model for each domain, and an emotion fusion strategy corresponding to each domain;
acquiring a local emotion probability value of the text information to be analyzed based on a local emotion semantic model matched with the field information;
acquiring a global emotion probability value of the text information to be analyzed based on a global emotion semantic model;
fusing the local emotion probability value and the global emotion probability value based on an emotion fusion strategy matched with the field information to obtain a fusion result;
and acquiring the emotion type of the text information based on the fusion result.
Optionally, the preprocessing the text information to be analyzed includes:
removing irregular information in the text information, and performing word segmentation processing;
searching index information of each word in the data after word segmentation processing based on a pre-established dictionary, and acquiring a word vector matrix formed by the index information of the words aiming at each sentence; the words in the pre-established dictionary consist of verbs and adjectives;
judging whether the dimensionality of the word vector matrix conforms to a preset sample length max_length;
and if not, processing the matrix according to the sample length rule to acquire a word vector matrix consistent with the sample length max_length.
Optionally, the obtaining of the domain information of the preprocessed text information includes:
inputting the preprocessed text information into a trained domain text classification model to obtain the domain information;
the field text classification model is constructed and trained on the basis of a text convolutional neural network (TextCNN).
Optionally, before the preprocessed text information is input into the trained domain text classification model and the domain information is obtained, the method further includes:
the method comprises the steps of constructing a field text classification model by adopting a text convolution neural network (TextCNN), wherein the width of a convolution kernel used in the TextCNN is consistent with the dimension of a word vector in a word vector matrix, the heights of the convolution kernels used in the TextCNN are respectively 2,3 and 4, and a pooling layer in the TextCNN adopts maximum pooling operation and average pooling operation to pool each feature vector after the convolution layer is processed into a value.
Optionally, inputting the preprocessed text information into a trained domain text classification model to obtain the domain information, where the obtaining includes:
mapping each word, in the embedding layer, to a word vector of preset length Embedding_size, obtaining matrices of shape max_length × Embedding_size;
extracting the n-gram phrase semantic features of the matrix with convolution kernels of different sizes, where n = 2, 3, 4;
sequentially performing maximum pooling and average pooling on the n-gram phrase semantic features, and splicing the pooled values into the pooled n-gram semantic feature;
and inputting the semantic features of the pooled n-gram into the softmax layer to obtain the domain information.
Optionally, before the text information to be analyzed is input to the emotion classification model pre-fused with domain information, the method further includes:
constructing an emotion classification model fused with domain information;
constructing the global emotion semantic model in the emotion classification model fused with domain information based on a bidirectional transformer encoding model (BERT), for obtaining the global emotion probability value;
constructing a local emotion semantic model of each field in an emotion classification model fused with field information based on a soft interval support vector machine algorithm, and obtaining a local emotion probability value;
and performing weighted fusion on the local emotion probability value and the global emotion probability value to serve as an emotion fusion strategy of each field in the emotion classification model fused with the domain information.
Optionally, the fusion result comprises a negative emotion probability value and/or a positive emotion probability value;
based on the fusion result, obtaining the emotion classification of the text information, including:
when the negative emotion probability value is greater than 0.5, judging that the semantic emotion of the text information is negative, and otherwise positive; or
when the positive emotion probability value is greater than 0.5, judging that the semantic emotion of the text information is positive, and otherwise negative.
Optionally, before the text information to be analyzed is input to the emotion classification model pre-fused with domain information, the method further includes:
training the emotion classification model fused with the domain information, wherein the steps comprise:
respectively training the local emotion semantic model by taking a domain text data set of a target domain as a training set aiming at different target domains;
training the global emotion semantic model by taking a text data set of all target fields as a training set;
and respectively carrying out weighted fusion on the trained local emotion semantic model and the global emotion semantic model aiming at different target fields, and determining the global emotion semantic model weight and the local emotion semantic model weight under different target fields through model verification.
Optionally, training the local emotion semantic model by using a domain text data set of a target domain as a training set, including:
acquiring a field text data set of a target field as a training sample;
calculating the tf-idf value of each word in the field text data set to generate a tf-idf vector of a training sample;
and modeling the tf-idf vector by using a soft interval support vector machine algorithm to obtain a local emotion semantic model.
In a second aspect, the present application provides an emotion analysis apparatus based on domain information, the apparatus including:
the domain information acquisition module is used for preprocessing the text information to be analyzed and acquiring the domain information of the preprocessed text information;
the emotion classification module is used for inputting the text information to be analyzed into an emotion classification model pre-fused with domain information to acquire the emotion category of the text information, wherein the emotion classification model fused with domain information comprises a global emotion semantic model, a local emotion semantic model for each domain, and an emotion fusion strategy corresponding to each domain;
acquiring a local emotion probability value of the text information to be analyzed based on a local emotion semantic model matched with the field information;
acquiring a global emotion probability value of the text information to be analyzed based on a global emotion semantic model;
fusing the local emotion probability value and the global emotion probability value based on an emotion fusion strategy matched with the field information to obtain a fusion result;
and acquiring the emotion type of the text information based on the fusion result.
In a third aspect, the present application provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the domain-information-based emotion analysis method of any one of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the domain-information-based emotion analysis method of any one of the first aspect.
(III) advantageous effects
The technical solution provided by the application can have the following beneficial effects: the text information to be analyzed is first preprocessed and its domain information acquired; the text information is then input into an emotion classification model pre-fused with domain information to acquire its emotion category. The emotion classification model fused with domain information comprises a global emotion semantic model, a local emotion semantic model for each domain, and an emotion fusion strategy corresponding to each domain. By fusing domain emotion semantics with global emotion semantics in this way, the accuracy of the emotion analysis algorithm is greatly improved and the classification effect is better.
Drawings
The application is described with the aid of the following figures:
FIG. 1 is a schematic flowchart of a method for emotion analysis based on domain information according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a process for generating a word vector matrix according to an embodiment of the present application;
FIG. 3 is a diagram illustrating a structure of a text convolutional neural network model in an embodiment of the present application;
FIG. 4 is a schematic diagram of an emotion classification model training process in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an emotion analyzing apparatus based on domain information according to another embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
Detailed Description
For the purpose of better explaining the present invention and to facilitate understanding, the present invention will be described in detail by way of specific embodiments with reference to the accompanying drawings. It is to be understood that the following specific examples are illustrative of the invention only and are not to be construed as limiting the invention. In addition, it should be noted that, in the case of no conflict, the embodiments and features in the embodiments in the present application may be combined with each other; for convenience of description, only portions related to the invention are shown in the drawings.
Because the emotion semantics of the text are influenced by the field information and the emotions expressed by the text in different fields are different, the method integrates the field information to carry out emotion analysis on the text, and provides an emotion analysis method based on the field information. The present invention is described in detail below by way of examples.
Example one
Fig. 1 is a schematic flowchart of a method for emotion analysis based on domain information in an embodiment of the present application. This embodiment is applicable to classifying text data; the method may be executed by an emotion analysis device, which may be implemented in software and/or hardware. As shown in Fig. 1, the method includes:
s10, preprocessing the text information to be analyzed, and acquiring the field information of the preprocessed text information;
s20, inputting the text information to be analyzed into an emotion classification model fused with domain information in advance, and acquiring the emotion type of the text information; the emotion classification model fused with domain information comprises: the method comprises the following steps of (1) a global emotion semantic model, a local emotion semantic model of each field and an emotion fusion strategy corresponding to each field;
acquiring a local emotion probability value of text information to be analyzed based on a local emotion semantic model matched with the field information;
acquiring a global emotion probability value of text information to be analyzed based on the global emotion semantic model;
fusing the local emotion probability value and the global emotion probability value based on an emotion fusion strategy matched with the field information to obtain a fusion result;
and acquiring the emotion type of the text information based on the fusion result.
The global emotion semantic model is trained on the full data and extracts global emotion semantic features well, while the local emotion semantic model considers the emotion semantics of a text from the domain perspective. Fusing the emotion probabilities predicted by the two models enhances the robustness and accuracy of the model while taking domain semantics into account.
According to the method, the emotion semantics of the field and the global emotion semantics are fused, so that the accuracy of the emotion analysis algorithm is greatly improved, and the classification effect is better.
For ease of understanding, the following description will be made for each step of the present embodiment.
In step S10 of this embodiment, the text information to be analyzed may be text data from different domains obtained from social platforms, online shopping platforms, web portals, and the like, for example fast-moving consumer goods, public health, finance, or education. The text to be analyzed may be obtained in any practicable way, for example fetched directly from an external source or retrieved through an interface.
Step S10 of this embodiment specifically includes the following steps:
s11, preprocessing the text information to be analyzed to generate a word vector matrix;
and S12, inputting the word vector matrix into the trained field text classification model to obtain field information.
Fig. 2 is a schematic diagram of a word vector matrix generation flow in an embodiment of the present application, and as shown in fig. 2, step S11 includes steps S111 to S114, and steps S111 to S114 are explained below.
And S111, removing irregular information in the text information.
Traditional Chinese characters are converted to simplified Chinese, and special characters in the text, such as @username mentions and URLs, are removed.
And S112, performing word segmentation processing on the text processed in the step S111.
A word segmentation tool is used to segment the text and remove stop words; specifically, the Jieba Chinese word segmentation tool segments the Chinese text. Jieba offers three segmentation modes, namely the accurate mode, the full mode, and the search-engine mode; this embodiment adopts the accurate mode.
In this embodiment, in addition to Jieba, SnowNLP, THULAC, NLPIR, etc. may be used to perform word segmentation on the original text, which may be set according to actual requirements, and this embodiment is not specifically limited.
And S113, searching index information of each word in the data after word segmentation processing based on a pre-established dictionary.
The dictionary is created as follows: first, the training data is segmented and part-of-speech tagged, and only words whose part of speech is verb or adjective are kept, because such words better reflect the emotion of a text; for example, adjectives like "fear" and "happy" distinguish the expressed emotion better than nouns do. The frequency of each word is then counted, and words longer than one character are retained. Finally, each word is assigned a unique id as its index. A fragment of the dictionary's word index is shown in Table 1:
TABLE 1

Word                  id
consumption           46
business              47
disease recurrence    48
complaint             49
death                 50
fraud                 51
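The dictionary-construction steps above (POS filtering, frequency counting, id assignment) can be sketched in Python. The helper name `build_dictionary`, the simplified POS tags, and the toy corpus are illustrative assumptions, not part of the application; real input would come from a POS tagger such as `jieba.posseg`:

```python
from collections import Counter

def build_dictionary(tagged_corpus, keep_pos=("v", "a"), min_len=2):
    """Build a word -> id index from segmented, POS-tagged sentences.

    Keeps only verbs ("v") and adjectives ("a"), per the dictionary rule
    above, and drops words shorter than min_len characters (i.e. keeps
    only words with length greater than 1).
    """
    counts = Counter(
        word
        for sentence in tagged_corpus
        for word, pos in sentence
        if pos in keep_pos and len(word) >= min_len
    )
    # Assign each surviving word a unique id by frequency; 0 is reserved
    # for the padding value used when sentences are length-normalized.
    return {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}

# Toy POS-tagged corpus: (word, tag) pairs per sentence.
corpus = [
    [("消费", "v"), ("很", "d"), ("开心", "a")],
    [("投诉", "v"), ("商家", "n"), ("欺诈", "v")],
]
vocab = build_dictionary(corpus)  # adverbs and nouns are filtered out
```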
And S114, acquiring a word vector matrix with the length consistent with the preset sample length for each sentence.
Since sentence lengths vary while every model input sample must have the same length, the sentence length is fixed to a value denoted max_length. If max_length = 5 and the index vector built in S113 is (1,3,2,5), the vector is front-padded with 0, giving (0,1,3,2,5); if max_length = 3, the part exceeding the fixed length is cut off, giving (1,3,2). In this way every sentence is mapped to a vector of equal length.
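The front-padding and truncation of S114 can be sketched as follows; the helper name `to_fixed_length` is an assumption for illustration:

```python
def to_fixed_length(index_vector, max_length, pad_id=0):
    """Map a sentence's index vector to exactly max_length entries:
    front-pad with pad_id when too short, cut off the excess when too
    long, matching the (1,3,2,5) examples above."""
    if len(index_vector) >= max_length:
        return index_vector[:max_length]
    return [pad_id] * (max_length - len(index_vector)) + index_vector

padded = to_fixed_length([1, 3, 2, 5], max_length=5)     # [0, 1, 3, 2, 5]
truncated = to_fixed_length([1, 3, 2, 5], max_length=3)  # [1, 3, 2]
```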
In step S12 of this embodiment, the domain Text classification model is a Network model constructed and trained based on a Text Convolutional Neural Network (TextCNN).
The embedding layer of the TextCNN model obtains the vector-space representation of each word by loading pre-trained word vectors. Because each input row vector represents one word, and words are the minimum granularity of the text during feature extraction, the width of each convolution kernel is kept consistent with the word-vector dimension, while the kernel height, as in an ordinary CNN, can be set freely.
Because the input is a sentence, the relevance between adjacent words in the sentence is high, and therefore convolution kernels with different heights are used in the convolution layer for convolution, and not only word senses but also word orders and contexts thereof are considered.
And obtaining semantic representation corresponding to the text through a TextCNN model, and finally performing domain classification through softmax.
It should be noted that, in this embodiment, the domain text classification model is trained in advance, but in some other embodiments, before step S10, the method may further include constructing and training the domain text classification model, and the steps include:
and constructing a TextCNN model. The TextCNN is composed of an embedding layer, a convolution-pooling layer, a dropout layer and an output layer.
In this embodiment, the width of each convolution kernel used in the TextCNN is consistent with the dimension of the word vectors in the word vector matrix, and convolution kernels of different sizes extract the n-gram phrase semantic features of the matrix, with n = 2, 3, 4, which are common kernel heights for general text processing. The pooling layer in the TextCNN applies a maximum pooling operation and an average pooling operation to pool each feature vector produced by the convolutional layer into a single value.
Because convolution kernels of different heights are used in the convolutional layer, the vector dimensions after convolution are inconsistent. This embodiment therefore pools each feature vector into single values with a maximum pooling operation and an average pooling operation in the pooling layer; that is, the maximum and the average of each feature vector are extracted to represent the feature, where the maximum captures the most important feature and the average captures the global feature.
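The max-plus-average pooling step can be sketched in plain Python; the function `pool_features` and the sample numbers are illustrative, and the feature maps stand for the variable-length outputs of kernels of height 2, 3 and 4:

```python
def pool_features(feature_maps):
    """Pool each variable-length feature vector into its maximum and its
    mean, then concatenate the results. Kernels of heights 2/3/4 over a
    sentence of length L yield vectors of lengths L-1, L-2, L-3, so this
    pooling is what produces a fixed-size representation."""
    pooled = []
    for fm in feature_maps:
        pooled.append(max(fm))            # most important feature
        pooled.append(sum(fm) / len(fm))  # global (average) feature
    return pooled

# Three feature maps of different lengths, as produced by the three kernels:
features = pool_features([[0.1, 0.9, 0.4], [0.5, 0.2], [0.7]])
```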
Text data sets from all domains are acquired as the training sample set, where each sample's label is the domain to which the sample belongs.
And generating a word vector matrix based on the training samples, and specifically, converting the text into the word vector matrix according to a pre-established dictionary.
The word vector matrix is taken as input and the text convolutional neural network is trained to obtain the trained model. Fig. 3 is a schematic structural diagram of the text convolutional neural network model in an embodiment of the present application. As shown in Fig. 3, during training the TextCNN first takes the preprocessed word indices as input and maps each word, in the embedding layer, to a word vector of preset length Embedding_size, so that each sentence is represented as a matrix of shape max_length × Embedding_size, where max_length is the preset sample length. Convolution kernels of different sizes then extract n-gram phrase semantic features with n = 2, 3, 4, i.e., binary, ternary and quaternary phrase semantic features, followed by maximum pooling and average pooling respectively. Finally, the features extracted by the different kernels are spliced together as the n-gram phrase semantic features and fed into the softmax layer for domain classification, which outputs the domain information. After training, the domain text classification model is obtained.
In this embodiment, step S20 includes:
and matching and acquiring a local emotion semantic model and an emotion fusion strategy of a corresponding field from a pre-constructed emotion classification model integrated with field information according to the field information, wherein the pre-constructed emotion classification model integrated with the field information comprises a global emotion semantic model, a local emotion semantic model of each field and an emotion fusion strategy corresponding to each field.
In the embodiment, the text information to be analyzed is input into an emotion classification model, and the local emotion probability value of the text information to be analyzed is acquired based on the local emotion semantic model matched with the field information; the local sentiment probability value may include a local semantic negative sentiment probability value and/or a local semantic positive sentiment probability value. Acquiring a global emotion probability value of text information to be analyzed based on the global emotion semantic model; the global sentiment probability value may include a global semantic negative sentiment probability value and/or a global semantic positive sentiment probability value. Fusing the local emotion probability value and the global emotion probability value based on an emotion fusion strategy matched with the field information to obtain a fusion result; and acquiring the emotion type of the text information based on the fusion result to obtain the emotion type of the text information to be analyzed.
In this embodiment, the emotion fusion strategy is to perform weighted fusion on the global emotion probability value and the local emotion probability value to obtain a fusion result. Performing weighted fusion on the global emotion probability value and the local emotion probability value by adopting preset weights to obtain a final emotion probability value; the final emotion probability value includes a negative emotion probability value and/or a positive emotion probability value.
Determining the emotion category of the text information to be analyzed based on the weighted fusion result, wherein the emotion category comprises the following steps:
when the negative emotion probability value is greater than 0.5, judging that the semantic emotion of the text information to be analyzed is negative, and otherwise positive; or
when the positive emotion probability value is greater than 0.5, judging that the semantic emotion of the text information to be analyzed is positive, and otherwise negative.
For example, when the local emotion probability value is the local semantic negative emotion probability value and the global emotion probability value is the global semantic negative emotion probability value, the two are weighted and fused with preset weights using formula (1) to obtain the negative emotion probability value:

P1 = w_global * p_global + w_part * p_part    (1)

where P1 is the negative emotion probability value, p_global is the global semantic negative emotion probability value, p_part is the local semantic negative emotion probability value, w_global is the weight of the global semantic negative emotion probability value, and w_part is the weight of the local semantic negative emotion probability value.
It should be noted that the weight can be adjusted according to specific situations.
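As a minimal illustration of formula (1) and the 0.5-threshold decision rule, the following sketch fuses two negative-emotion probabilities; the weight values 0.6/0.4 and the function names are illustrative assumptions, not part of the patent.

```python
# Hypothetical sketch of formula (1): weighted fusion of the global and
# local negative-emotion probabilities, followed by the 0.5-threshold rule.
# The weights 0.6 / 0.4 are placeholder values only.

def fuse_negative_probability(p_global, p_part, w_global=0.6, w_part=0.4):
    """P1 = w_global * p_global + w_part * p_part (formula (1))."""
    return w_global * p_global + w_part * p_part

def classify(p_negative, threshold=0.5):
    """Judge the semantic emotion as negative when P1 exceeds the threshold."""
    return "negative" if p_negative > threshold else "positive"

p1 = fuse_negative_probability(0.8, 0.3)  # 0.6*0.8 + 0.4*0.3 = 0.60
label = classify(p1)                      # 0.60 > 0.5, so "negative"
```

In practice the weights would come from the per-domain model validation described later in this embodiment.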
In this embodiment, before step S10, the method further includes:
S01, constructing an emotion classification model fused with domain information.
Specifically, the global emotion semantic model in the emotion classification model fused with domain information is constructed based on a bidirectional conversion coding model and is used for obtaining the global emotion probability value;
the local emotion semantic model of each field in the emotion classification model is constructed based on a soft-margin support vector machine algorithm and is used for obtaining the local emotion probability value;
and the emotion fusion strategy of each field in the emotion classification model performs weighted fusion of the local emotion probability value and the global emotion probability value.
The bidirectional conversion coding model (BERT, Bidirectional Encoder Representations from Transformers) adopts a pre-training plus fine-tuning architecture and can deeply interpret the meaning of a sentence. Fine-tuning is fast and effective and further improves the generalization of the model, which has made BERT one of the strongest and most widely used models in the field of natural language processing.
The Support Vector Machine (SVM) algorithm is a small-sample learning method with a solid theoretical basis: it separates positive and negative samples by finding a maximum-margin hyperplane, and converts this into a convex optimization problem to solve. Domain emotion classification is a binary classification problem, and because the data sample size of each domain is relatively small, the local emotion semantic model uses a soft-margin support vector machine algorithm for domain emotion classification.
The structural loss of the soft-margin support vector machine improves the generalization performance and the accuracy of the model.
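As a hedged sketch of a soft-margin SVM for binary emotion classification (scikit-learn is assumed; the toy feature vectors are invented for illustration):

```python
# Soft-margin linear SVM sketch for binary emotion classification.
# The parameter C controls the margin softness: a smaller C tolerates more
# margin violations, which helps generalization on small, noisy domain data.
from sklearn.svm import LinearSVC

# Toy 2-dimensional feature vectors (stand-ins for tf-idf features).
X = [[0.0, 1.0], [0.1, 0.9], [0.9, 0.1], [1.0, 0.0]]
y = [0, 0, 1, 1]  # 0 = negative, 1 = positive

clf = LinearSVC(C=1.0)  # soft-margin formulation
clf.fit(X, y)
pred = clf.predict([[0.95, 0.05]])[0]  # a point close to the positive samples
```

In a real system each domain would get its own such classifier, fit on that domain's tf-idf vectors.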
S02, training the emotion classification model fused with domain information, comprising the following steps:
S021, for each target domain, training the corresponding local emotion semantic model with the domain text data set of that target domain as the training set.
In this embodiment, a soft-margin support vector machine model is constructed for each field, and each model builds a local emotion semantic model from tf-idf features. The training procedure of the local emotion semantic model is described below.
First, the field text data set of the target field is obtained, and training samples are obtained after preprocessing the data set. The training samples are divided into a training set and a test set at a ratio of 8:2.
Then, the tf-idf value of each word in the domain text dataset is calculated, generating a tf-idf vector of the training samples.
Specifically, the tf-idf value of a participle in a text is first calculated by formula (2):

tfidf_{i,w} = idf_w · tf_{i,w}    (2)

where tfidf_{i,w} is the tf-idf value of the word w in text i, idf_w is the inverse document frequency of the word w, and tf_{i,w} is the term frequency of the word w in text i.
The tf-idf value of each participle of the text is calculated to obtain the tf-idf vector of text i, denoted tfidf_i; specifically, tfidf_i = {tfidf_{i,1}, tfidf_{i,2}, ...}, where the coordinates of the tf-idf vector of text i are the tf-idf value of the first participle, the tf-idf value of the second participle, the tf-idf value of the third participle, and so on, in text i.
Finally, the tf-idf vectors are modeled with the soft-margin support vector machine algorithm to obtain the local emotion semantic model.
During training, the effectiveness of the model is verified with five-fold cross-validation; because the positive and negative samples in the data are imbalanced, stratified sampling is required during cross-validation. The training flow of the soft-margin support vector machine algorithm is as follows:
inputting: training data set T { (x)1,y1),(x2,y2),...,(xN,yN)},ykThe term "0, 1" denotes the emotion of the text expression, i.e. the label value, where 0 denotes negative, 1 denotes positive, K denotes a total of K texts, and T is the input data of the model, i.e. the text and the label y to be predictedk
Output: the emotion category corresponding to the text.
Evaluation: the trained model is applied to the test data set, and the F1 score on the test set is calculated to verify the generalization ability of the model.
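The stratified five-fold validation mentioned above can be sketched as follows (scikit-learn assumed; the imbalanced toy labels are invented):

```python
# Stratified 5-fold split: each fold preserves the positive/negative ratio,
# which matters because the emotion data is imbalanced.
from sklearn.model_selection import StratifiedKFold

y = [0] * 5 + [1] * 15                 # 5 negative, 15 positive samples
X = list(range(len(y)))                # indices standing in for texts
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
folds = list(skf.split(X, y))

# Every validation fold holds 4 samples: 1 negative and 3 positive.
```

Without stratification, a plain random split could leave some folds with no negative samples at all.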
S022, training the global emotion semantic model with the text data sets of all target fields as the training set.
For the text classification task, the BERT model inserts a [CLS] symbol at the front of the text and uses the output vector corresponding to this symbol as the semantic representation of the whole text for classification. This can be understood as follows: compared with the other characters/words already present in the text, this symbol carries no obvious semantic information of its own, so it fuses the semantic information of each character/word in the text more "fairly" and better represents the semantics of the whole sentence.
Text data sets of all fields are acquired as training samples, with the emotion categories as the sample labels.
The bidirectional conversion coding model is trained on these samples to obtain the global emotion semantic model.
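The [CLS]-based classification described above can be illustrated with a small numeric sketch; NumPy arrays stand in for a real BERT encoder, and the random tensor and head weights are placeholders, not the actual model.

```python
# Take the hidden state at position 0 (the [CLS] token) as the sentence
# representation, then apply a linear + softmax classification head.
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, hidden = 2, 8, 16
encoder_output = rng.random((batch, seq_len, hidden))  # (B, T, H) hidden states

cls_vectors = encoder_output[:, 0, :]     # position 0 holds the [CLS] output

W = rng.random((hidden, 2))               # head for 2 classes: negative/positive
logits = cls_vectors @ W
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
```

In a real fine-tuning setup the encoder output would come from a pretrained BERT and the head weights would be learned jointly with it.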
S023, for each target field, performing weighted fusion of the trained local emotion semantic model and the global emotion semantic model, determining the global and local emotion semantic model weights for each target field through model validation, and obtaining the final weighted-fusion emotion classification model.
FIG. 4 is a schematic diagram of the emotion classification model training process in an embodiment of the present application. As shown in FIG. 4, data preprocessing and Chinese word segmentation are performed first, and the input data is then divided into a training set and a test set at a ratio of 8:2. During training, the effectiveness of the model is verified with five-fold cross-validation: the training set is divided into five equal parts, and for each fold one part is selected as the validation set while the remaining four parts serve as the training set; this is repeated five times, mainly to ensure the robustness of the model. Because the positive and negative samples in the data are imbalanced, stratified sampling is required during cross-validation. The process of the emotion classification model fusing domain information is as follows:
inputting: training data set S { (x)1,y1),(x2,y2),...,(xN,yN)},ykThe text representation includes {0,1} representing emotion of text expression, i.e., a tag value, where 0 represents negative, 1 represents positive, k 1,2, M represents M pieces of text in total, and S is input data of a model, i.e., the text and a tag y to be predictedk. Table 2 is an example table of training samples.
TABLE 2

Text x | Emotion label y
In recent years, deep learning has become very popular! | 1
Afraid of the disease even without being ill | 0
Today is so happy! | 1
Output: the emotion category corresponding to the text.
Evaluation: the trained model is applied to the test data set, and the F1 score on the test set is calculated to verify the generalization ability of the model.
The model has a pipeline structure, and the local and global emotion semantic model weights of each field need to be adjusted according to the accuracy on actual data, so that the optimal model combination is used when the model is deployed.
Finally, the accuracy and generalization ability of the whole model are verified on the test set. Specifically, a test-set text is input; its global emotion probability is computed by the BERT model, while its field category is determined by the field classification model TextCNN and the corresponding field emotion model outputs the local emotion probability. Appropriate weights are then selected according to the field category to weight the two output probabilities and produce the final category. The predicted emotion category is compared with the true emotion category, and the F1 value is calculated. Input: the test data set; output: the accuracy on the test set.
The weights obtained through model validation are used as the preset weights when the emotion classification model performs emotion classification.
According to the method, a local emotion semantic model is trained for each field, a BERT classifier is trained on the full data to serve as the global emotion semantic model, the weights between the two are obtained through model training, their results are fused by weighting, and the emotion classification model is finally validated, thereby accurately identifying the emotion expressed by the text.
This embodiment provides an emotion analysis method based on domain information. To address the problem that the emotion semantics expressed by text are unstable across domains, a global emotion semantic model and a local emotion semantic model are introduced, which improves the robustness of the emotion analysis algorithm in different domains; the weighted fusion of the two models combines their emotion semantics, which improves the accuracy of the emotion analysis algorithm.
Furthermore, applying the model to public opinion monitoring can help the government grasp the real state of social public opinion and carry out prevention-and-control publicity and public opinion guidance scientifically and efficiently.
Example two
A second aspect of the present application provides an emotion analysis apparatus based on domain information. FIG. 5 is a schematic structural diagram of an emotion analysis apparatus based on domain information in another embodiment of the present application; as shown in FIG. 5, the apparatus comprises:
the domain information acquiring module 10 is configured to preprocess the text information to be analyzed and acquire domain information of the preprocessed text information;
the emotion classification module 20 is used for inputting the text information to be analyzed into an emotion classification model pre-fused with domain information to acquire the emotion category of the text information; the emotion classification model fused with domain information comprises: a global emotion semantic model, a local emotion semantic model for each field, and an emotion fusion strategy corresponding to each field;
acquiring a local emotion probability value of text information to be analyzed based on a local emotion semantic model matched with the field information;
acquiring a global emotion probability value of text information to be analyzed based on the global emotion semantic model;
fusing the local emotion probability value and the global emotion probability value based on an emotion fusion strategy matched with the field information to obtain a fusion result;
and acquiring the emotion type of the text information based on the fusion result.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related descriptions of the above-described apparatus may refer to the corresponding process in the foregoing method embodiments, and are not described herein again.
EXAMPLE III
A third aspect of the present application provides, by way of a third embodiment, an electronic apparatus, including: the system comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the computer program is executed by the processor to realize the steps of the emotion analysis method based on the domain information as described in any one of the above embodiments.
Fig. 6 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
The electronic device shown in fig. 6 may include: at least one processor 101, at least one memory 102, at least one network interface 104, and other user interfaces 103. The various components in the electronic device are coupled together by a bus system 105. It is understood that the bus system 105 is used to enable communications among the components. The bus system 105 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 105 in fig. 6.
The user interface 103 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, trackball, or touch pad).
It will be appreciated that the memory 102 in this embodiment may be volatile memory, non-volatile memory, or both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The memory 102 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, memory 102 stores elements, executable units or data structures, or a subset thereof, or an expanded set thereof as follows: an operating system 1021 and application programs 1022.
The operating system 1021 includes various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various basic services and processing hardware-based tasks. The application programs 1022 include various applications for implementing various application services. Programs that implement methods in accordance with embodiments of the present invention can be included in the application programs 1022.
In the embodiment of the present invention, the processor 101 is configured to execute the method steps provided in the first aspect by calling a program or an instruction stored in the memory 102, which may be specifically a program or an instruction stored in the application 1022.
The method disclosed by the above embodiment of the present invention can be applied to the processor 101, or implemented by the processor 101. The processor 101 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 101. The processor 101 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present invention may be implemented or performed accordingly. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be implemented directly by a hardware decoding processor, or by a combination of hardware and software elements in the decoding processor. The software elements may be located in RAM, flash memory, ROM, PROM or EEPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 102, and the processor 101 reads the information in the memory 102 and completes the steps of the method in combination with its hardware.
In addition, in combination with the emotion analysis method based on domain information in the above embodiments, an embodiment of the present invention may provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method for emotion analysis based on domain information as in any one of the above embodiments is implemented.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Furthermore, it should be noted that in the description of the present specification, the description of the term "one embodiment", "some embodiments", "examples", "specific examples" or "some examples", etc., means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, the claims should be construed to include preferred embodiments and all changes and modifications that fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention should also include such modifications and variations.

Claims (12)

1. An emotion analysis method based on domain information, characterized by comprising the following steps:
preprocessing the text information to be analyzed, and acquiring the field information of the preprocessed text information;
inputting the text information to be analyzed into an emotion classification model pre-fused with domain information, and acquiring the emotion category of the text information; wherein the emotion classification model fused with domain information comprises: a global emotion semantic model, a local emotion semantic model for each field, and an emotion fusion strategy corresponding to each field;
acquiring a local emotion probability value of the text information to be analyzed based on a local emotion semantic model matched with the field information;
acquiring a global emotion probability value of the text information to be analyzed based on a global emotion semantic model;
fusing the local emotion probability value and the global emotion probability value based on an emotion fusion strategy matched with the field information to obtain a fusion result;
and acquiring the emotion type of the text information based on the fusion result.
2. The method according to claim 1, wherein preprocessing the text information to be analyzed comprises:
removing irregular information in the text information, and performing word segmentation processing;
searching index information of each word in the data after word segmentation processing based on a pre-established dictionary, and acquiring a word vector matrix formed by the index information of the words aiming at each sentence; the words in the pre-established dictionary consist of verbs and adjectives;
judging whether the dimensionality of the word vector matrix conforms to a preset sample length max-length;
and if not, acquiring a word vector matrix consistent with the sample length max-length according to the sample length processing rule.
3. The method of claim 2, wherein obtaining domain information of the preprocessed textual information comprises:
inputting the preprocessed text information into a trained domain text classification model to obtain the domain information;
the field text classification model is constructed and trained on the basis of a text convolutional neural network (TextCNN).
4. The method of claim 3, wherein before inputting the preprocessed text information into the trained domain text classification model to obtain the domain information, the method further comprises:
the method comprises the steps of constructing a field text classification model by adopting a text convolution neural network (TextCNN), wherein the width of a convolution kernel used in the TextCNN is consistent with the dimension of a word vector in a word vector matrix, the heights of the convolution kernels used in the TextCNN are respectively 2,3 and 4, and a pooling layer in the TextCNN adopts maximum pooling operation and average pooling operation to pool each feature vector after the convolution layer is processed into a value.
5. The method of claim 4, wherein inputting the preprocessed textual information into a trained domain text classification model to obtain the domain information comprises:
mapping each word into a word vector with a preset length of Embedding-size in the Embedding layer to obtain a plurality of matrixes with the shapes of max-length and Embedding-size;
extracting the semantic features of n-gram phrases of the matrix by adopting convolution kernels with different sizes, wherein n is 2,3 and 4;
sequentially performing maximum pooling operation and average pooling on the n-gram phrase semantic features, and splicing various pooled numerical values to serve as a pooled n-gram semantic feature;
and inputting the semantic features of the pooled n-gram into the softmax layer to obtain the domain information.
6. The method according to claim 1, wherein before inputting the text information to be analyzed into the emotion classification model into which domain information is pre-fused, the method further comprises:
constructing an emotion classification model fused with domain information;
constructing a global emotion semantic model in an emotion classification model fused with domain information based on a bidirectional conversion coding model, and obtaining a global emotion probability value;
constructing a local emotion semantic model of each field in the emotion classification model fused with field information based on a soft-margin support vector machine algorithm, and obtaining a local emotion probability value;
and performing weighted fusion on the local emotion probability value and the global emotion probability value to serve as an emotion fusion strategy of each field in the emotion classification model fused with the domain information.
7. The method of claim 6, wherein the fused result comprises a negative emotion probability value and/or a positive emotion probability value;
based on the fusion result, obtaining the emotion classification of the text information, including:
when the negative emotion probability value is larger than 0.5, judging that the semantic emotion of the text information is negative, otherwise, judging that the semantic emotion of the text information is positive; or
And when the positive emotion probability value is more than 0.5, judging that the semantic emotion of the text information is positive, otherwise, judging that the semantic emotion of the text information is negative.
8. The method according to claim 6, wherein before inputting the text information to be analyzed into the emotion classification model pre-fused with domain information, the method further comprises:
training the emotion classification model fused with the domain information, wherein the steps comprise:
respectively training the local emotion semantic model by taking a domain text data set of a target domain as a training set aiming at different target domains;
training the global emotion semantic model by taking a text data set of all target fields as a training set;
and respectively carrying out weighted fusion on the trained local emotion semantic model and the global emotion semantic model aiming at different target fields, and determining the global emotion semantic model weight and the local emotion semantic model weight under different target fields through model verification.
9. The method of claim 8, wherein training the local emotion semantic model with a domain text data set of a target domain as a training set comprises:
acquiring a field text data set of a target field as a training sample;
calculating the tf-idf value of each word in the field text data set to generate a tf-idf vector of a training sample;
and modeling the tf-idf vector by using a soft-margin support vector machine algorithm to obtain the local emotion semantic model.
10. An emotion analyzing apparatus based on domain information, characterized in that the apparatus comprises:
the domain information acquisition module is used for preprocessing the text information to be analyzed and acquiring the domain information of the preprocessed text information;
the emotion classification module is used for inputting the text information to be analyzed into an emotion classification model pre-fused with domain information to acquire the emotion category of the text information; wherein the emotion classification model fused with domain information comprises: a global emotion semantic model, a local emotion semantic model for each field, and an emotion fusion strategy corresponding to each field;
acquiring a local emotion probability value of the text information to be analyzed based on a local emotion semantic model matched with the field information;
acquiring a global emotion probability value of the text information to be analyzed based on a global emotion semantic model;
fusing the local emotion probability value and the global emotion probability value based on an emotion fusion strategy matched with the field information to obtain a fusion result;
and acquiring the emotion type of the text information based on the fusion result.
11. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the emotion analysis method based on domain information according to any one of claims 1 to 9.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the emotion analysis method based on domain information according to any one of claims 1 to 9.
CN202110881327.4A 2021-08-02 2021-08-02 Emotion analysis method, device, equipment and storage medium based on field information Active CN113672731B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110881327.4A CN113672731B (en) 2021-08-02 2021-08-02 Emotion analysis method, device, equipment and storage medium based on field information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110881327.4A CN113672731B (en) 2021-08-02 2021-08-02 Emotion analysis method, device, equipment and storage medium based on field information

Publications (2)

Publication Number Publication Date
CN113672731A true CN113672731A (en) 2021-11-19
CN113672731B CN113672731B (en) 2024-02-23

Family

ID=78541095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110881327.4A Active CN113672731B (en) 2021-08-02 2021-08-02 Emotion analysis method, device, equipment and storage medium based on field information

Country Status (1)

Country Link
CN (1) CN113672731B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738298A (en) * 2023-08-16 2023-09-12 杭州同花顺数据开发有限公司 Text classification method, system and storage medium
CN116777607A (en) * 2023-08-24 2023-09-19 上海银行股份有限公司 Intelligent auditing method based on NLP technology

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273845A (en) * 2017-06-12 2017-10-20 大连海事大学 A kind of facial expression recognizing method based on confidence region and multiple features Weighted Fusion
CN108805087A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Semantic temporal fusion association based on multi-modal Emotion identification system judges subsystem
CN109801256A (en) * 2018-12-15 2019-05-24 华南理工大学 A kind of image aesthetic quality appraisal procedure based on area-of-interest and global characteristics
CN110852368A (en) * 2019-11-05 2020-02-28 南京邮电大学 Global and local feature embedding and image-text fusion emotion analysis method and system
US20200184278A1 (en) * 2014-03-18 2020-06-11 Z Advanced Computing, Inc. System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform
CN112016002A (en) * 2020-08-17 2020-12-01 辽宁工程技术大学 Mixed recommendation method integrating comment text level attention and time factors
CN112380346A (en) * 2020-11-23 2021-02-19 宁波深擎信息科技有限公司 Financial news emotion analysis method and device, computer equipment and storage medium
CN112560503A (en) * 2021-02-19 2021-03-26 中国科学院自动化研究所 Semantic emotion analysis method integrating depth features and time sequence model
CN112699662A (en) * 2020-12-31 2021-04-23 太原理工大学 False information early detection method based on text structure algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIANFENG DENG et al.: "Attention-based BiLSTM fused CNN with gating mechanism model for Chinese long text classification", Computer Speech & Language, 31 July 2021 (2021-07-31), pages 1-12 *
DING, Liang; HE, Yanqing: "Research on domain adaptation of machine translation fusing domain knowledge and deep learning", Information Science (情报科学), vol. 35, no. 10, 5 October 2017 (2017-10-05), pages 125-132 *
HE, Jun; LIU, Yue; HE, Zhongwen: "Research progress on multimodal emotion recognition", Application Research of Computers (计算机应用研究), vol. 35, no. 11, 7 May 2018 (2018-05-07), pages 3201-3205 *
YU, Bengong et al.: "Aspect-level sentiment analysis based on the BG-DATT-CNN network", Computer Engineering and Applications (计算机工程与应用), vol. 58, no. 24, 29 July 2021 (2021-07-29), pages 151-157 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738298A (en) * 2023-08-16 2023-09-12 杭州同花顺数据开发有限公司 Text classification method, system and storage medium
CN116738298B (en) * 2023-08-16 2023-11-24 杭州同花顺数据开发有限公司 Text classification method, system and storage medium
CN116777607A (en) * 2023-08-24 2023-09-19 上海银行股份有限公司 Intelligent auditing method based on NLP technology
CN116777607B (en) * 2023-08-24 2023-11-07 上海银行股份有限公司 Intelligent auditing method based on NLP technology

Also Published As

Publication number Publication date
CN113672731B (en) 2024-02-23

Similar Documents

Publication Publication Date Title
CN110765265B (en) Information classification extraction method and device, computer equipment and storage medium
CN110502749B (en) Text relation extraction method based on double-layer attention mechanism and bidirectional GRU
JP5356197B2 (en) Word semantic relation extraction device
CN111738004A (en) Training method of named entity recognition model and named entity recognition method
US8954316B2 (en) Systems and methods for categorizing and moderating user-generated content in an online environment
CN111221939A (en) Grading method and device and electronic equipment
CN112052684A (en) Named entity identification method, device, equipment and storage medium for power metering
CN110083832B (en) Article reprint relation identification method, device, equipment and readable storage medium
CN110502742B (en) Complex entity extraction method, device, medium and system
CN112613324A (en) Semantic emotion recognition method, device, equipment and storage medium
CN113672731B (en) Emotion analysis method, device, equipment and storage medium based on domain information
US11170169B2 (en) System and method for language-independent contextual embedding
CN113536795B (en) Method, system, electronic device and storage medium for entity relation extraction
CN112287100A (en) Text recognition method, spelling error correction method and voice recognition method
CN111241397A (en) Content recommendation method and device and computing equipment
CN115098634A (en) Semantic dependency relationship fusion feature-based public opinion text sentiment analysis method
CN107797981B (en) Target text recognition method and device
CN113361252B (en) Text depression tendency detection system based on multi-modal features and emotion dictionary
CN114117039A (en) Small sample text classification method and model
CN111815426B (en) Data processing method and terminal related to financial investment and research
CN107783958B (en) Target statement identification method and device
CN112632956A (en) Text matching method, device, terminal and storage medium
CN110377753B (en) Relation extraction method and device based on relation trigger word and GRU model
CN110020024B (en) Method, system and equipment for classifying link resources in scientific and technological literature
CN112989816B (en) Text content quality evaluation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant