CN113672731B - Emotion analysis method, device, equipment and storage medium based on field information - Google Patents

Emotion analysis method, device, equipment and storage medium based on domain information


Publication number
CN113672731B
CN113672731B (application CN202110881327.4A)
Authority
CN
China
Prior art keywords
emotion
information
text
field
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110881327.4A
Other languages
Chinese (zh)
Other versions
CN113672731A (en)
Inventor
张佳旭
王宇琪
郝保
曹家
刘莹
鲁县华
罗引
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Wenge Technology Co ltd
Original Assignee
Beijing Zhongke Wenge Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Wenge Technology Co ltd
Priority to CN202110881327.4A
Publication of CN113672731A
Application granted
Publication of CN113672731B
Legal status: Active
Anticipated expiration: pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/353 - Clustering; Classification into predefined classes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/047 - Probabilistic or stochastic networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the disclosure relate to an emotion analysis method based on domain information, which includes the following steps: preprocessing text information to be analyzed and acquiring its domain information; inputting the text information to be analyzed into an emotion classification model into which domain information has been fused in advance, and obtaining the emotion category of the text information. The emotion classification model fused with domain information includes: a global emotion semantic model, a local emotion semantic model for each domain, and an emotion fusion strategy corresponding to each domain. A local emotion probability value is acquired based on the local emotion semantic model matched with the domain information; a global emotion probability value is acquired based on the global emotion semantic model; the local emotion probability value and the global emotion probability value are fused based on the emotion fusion strategy matched with the domain information to obtain a fusion result; and the emotion category is acquired based on the fusion result. Performing text emotion analysis by this method yields a better classification effect and greatly improves the accuracy of the analysis result.

Description

Emotion analysis method, device, equipment and storage medium based on domain information
Technical Field
The application belongs to the technical field of natural language processing, and particularly relates to an emotion analysis method, device, equipment and storage medium based on domain information.
Background
With the development of the internet and the popularity of social networking and online shopping, users leave a great deal of text data on various network platforms. A significant portion of this text is subjective, expressing the user's emotion toward a particular entity, an event, or the users themselves. Emotion analysis can automatically mine and analyze the emotional states in massive amounts of text, and is widely applied in fields such as public opinion analysis, advertisement placement, and dialogue robot design.
One class of existing emotion analysis methods extracts semantic information from the text and performs emotion classification with machine learning; a typical approach extracts tf-idf features from the text and then recognizes the emotion category with a machine-learning classifier. However, the words in a sentence are not a simple pile of terms: different syntax produces completely different emotional expression, so such simple statistical features give an unsatisfactory emotion classification effect. The other class adopts deep learning, performing emotion analysis with a convolutional neural network or a recurrent neural network.
Both kinds of methods can make emotion judgment errors for texts from different contexts. For example, a text containing the word "大火" (literally "big fire") may be judged as negative emotion; "大火" does correspond to negative emotion in the public safety domain (a blaze), but to positive emotion in the fast-moving consumer goods (FMCG) domain, where it means that a product is hugely popular. Therefore, the results obtained by existing text emotion analysis methods are prone to emotion judgment errors, with a poor classification effect and low accuracy.
Disclosure of Invention
(I) Technical problem to be solved
In view of the foregoing drawbacks and disadvantages of the prior art, the present application provides an emotion analysis method, device, equipment and storage medium based on domain information.
(II) Technical solution
In order to achieve the above purpose, the present application adopts the following technical solution:
In a first aspect, the present application provides an emotion analysis method based on domain information, the method including:
preprocessing text information to be analyzed, and acquiring domain information of the preprocessed text information;
inputting the text information to be analyzed into an emotion classification model fused in advance with domain information, and obtaining the emotion category of the text information; the emotion classification model fused with domain information includes: a global emotion semantic model, a local emotion semantic model for each domain, and an emotion fusion strategy corresponding to each domain;
acquiring a local emotion probability value of the text information to be analyzed based on the local emotion semantic model matched with the domain information;
acquiring a global emotion probability value of the text information to be analyzed based on the global emotion semantic model;
fusing the local emotion probability value and the global emotion probability value based on the emotion fusion strategy matched with the domain information to obtain a fusion result;
and acquiring the emotion category of the text information based on the fusion result.
Optionally, preprocessing the text information to be analyzed includes:
removing irregular information from the text information and performing word segmentation;
looking up the index information of each word of the segmented data in a pre-established dictionary, and acquiring for each sentence a word vector matrix composed of the index information of the words, where the words in the pre-established dictionary consist of verbs and adjectives;
judging whether the dimension of the word vector matrix conforms to a preset sample length max_length;
and, if it does not conform, processing it by the sample-length rule to acquire a word vector matrix consistent with the sample length max_length.
Optionally, acquiring the domain information of the preprocessed text information includes:
inputting the preprocessed text information into a trained domain text classification model to obtain the domain information;
where the domain text classification model is a classification model constructed and trained based on the text convolutional neural network TextCNN.
Optionally, before inputting the preprocessed text information into the trained domain text classification model to obtain the domain information, the method further includes:
constructing a domain text classification model using a text convolutional neural network (TextCNN), where the width of the convolution kernels used in TextCNN is consistent with the dimension of the word vectors in the word vector matrix, the heights of the convolution kernels are 2, 3 and 4 respectively, and the pooling layer in TextCNN pools each feature vector produced by the convolution layer into a single value using a max-pooling operation and an average-pooling operation.
Optionally, inputting the preprocessed text information into the trained domain text classification model to obtain the domain information includes:
mapping each word into a word vector of preset length embedding_size at the embedding layer, obtaining for each sentence a matrix of shape max_length × embedding_size;
extracting n-gram phrase semantic features of the matrix using convolution kernels of different sizes, where n is 2, 3 and 4;
performing max pooling and average pooling on the n-gram phrase semantic features in turn, and concatenating the pooled values as the pooled n-gram semantic features;
and inputting the pooled n-gram semantic features into a softmax layer to obtain the domain information.
Optionally, before inputting the text information to be analyzed into the emotion classification model fused in advance with domain information, the method further includes:
constructing the emotion classification model fused with domain information, namely:
constructing the global emotion semantic model of the emotion classification model based on a bidirectional Transformer encoding model, used to obtain the global emotion probability value;
constructing the local emotion semantic model of each domain based on a soft-margin support vector machine algorithm, used to obtain the local emotion probability value;
and taking the weighted fusion of the local emotion probability value and the global emotion probability value as the emotion fusion strategy of each domain in the emotion classification model.
Optionally, the fusion result includes a negative emotion probability value and/or a positive emotion probability value;
and acquiring the emotion category of the text information based on the fusion result includes:
when the negative emotion probability value is greater than 0.5, judging that the semantic emotion of the text information is negative, and otherwise judging that it is positive; or
when the positive emotion probability value is greater than 0.5, judging that the semantic emotion of the text information is positive, and otherwise judging that it is negative.
Optionally, before inputting the text information to be analyzed into the emotion classification model fused in advance with domain information, the method further includes:
training the emotion classification model fused with domain information, which includes the following steps:
for each target domain, training the local emotion semantic model with the domain text dataset of that target domain as the training set;
training the global emotion semantic model with the text dataset of all target domains as the training set;
and, for each target domain, performing weighted fusion of the trained local emotion semantic model and the trained global emotion semantic model, and determining the global and local emotion semantic model weights for the different target domains through model validation.
Optionally, training the local emotion semantic model with the domain text dataset of the target domain as the training set includes:
acquiring the domain text dataset of the target domain as training samples;
calculating the tf-idf value of each word in the domain text dataset, and generating the tf-idf vector of each training sample;
and modeling the tf-idf vectors with a soft-margin support vector machine algorithm to obtain the local emotion semantic model.
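As a concrete illustration of the tf-idf step in this optional claim, the following is a minimal pure-Python sketch; the function name and the log-based idf variant are our own assumptions, since the patent does not fix an exact formula. The resulting vectors would then be modeled by a soft-margin SVM, for example scikit-learn's `svm.SVC(probability=True)`, which is omitted here:

```python
import math
from collections import Counter

def tfidf_vectors(docs, vocab=None):
    """Turn word-segmented documents (lists of words) into tf-idf vectors.

    tf  = term frequency within the document
    idf = log(N / df), a common variant (the patent does not specify one)
    """
    if vocab is None:
        vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    # document frequency: in how many documents each word appears
    df = Counter(w for d in docs for w in set(d))
    idf = {w: (math.log(n / df[w]) if df[w] else 0.0) for w in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[w] / len(d) * idf[w] for w in vocab])
    return vocab, vectors
```

Each training sample thus becomes one row of a fixed-width matrix indexed by the shared vocabulary, which is the input representation the SVM would be trained on.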
In a second aspect, the present application provides an emotion analysis device based on domain information, the device including:
a domain information acquisition module, used to preprocess the text information to be analyzed and acquire the domain information of the preprocessed text information;
and an emotion classification module, used to input the text information to be analyzed into an emotion classification model fused in advance with domain information and acquire the emotion category of the text information, where the emotion classification model fused with domain information includes: a global emotion semantic model, a local emotion semantic model for each domain, and an emotion fusion strategy corresponding to each domain;
the local emotion probability value of the text information to be analyzed is acquired based on the local emotion semantic model matched with the domain information;
the global emotion probability value of the text information to be analyzed is acquired based on the global emotion semantic model;
the local emotion probability value and the global emotion probability value are fused based on the emotion fusion strategy matched with the domain information to obtain a fusion result;
and the emotion category of the text information is acquired based on the fusion result.
In a third aspect, the present application provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, performs the steps of the emotion analysis method based on domain information according to any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the emotion analysis method based on domain information according to any implementation of the first aspect.
(III) Beneficial effects
The technical solution provided by this application can have the following beneficial effects: first, the text information to be analyzed is preprocessed and its domain information is acquired; then the text information to be analyzed is input into an emotion classification model fused in advance with domain information, and the emotion category of the text information is obtained. The emotion classification model fused with domain information includes: a global emotion semantic model, a local emotion semantic model for each domain, and an emotion fusion strategy corresponding to each domain. Because the method fuses the domain emotion semantics with the global emotion semantics, the accuracy of the emotion analysis algorithm is greatly improved and the classification effect is better.
Drawings
The application is described with the aid of the following figures:
FIG. 1 is a schematic flow chart of an emotion analysis method based on domain information in an embodiment of the present application;
FIG. 2 is a schematic diagram of a word vector matrix generation flow in one embodiment of the present application;
FIG. 3 is a schematic diagram of a text convolutional neural network model in one embodiment of the present application;
FIG. 4 is a schematic diagram of an emotion classification model training process according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an emotion analysis device based on domain information according to another embodiment of the present application;
fig. 6 is a schematic architecture diagram of an electronic device according to another embodiment of the present application.
Detailed Description
The invention is explained below through a detailed description of the embodiments with reference to the drawings. It should be understood that the specific embodiments described below merely illustrate the related invention and do not restrict it. In addition, in the absence of conflict, the embodiments and the features in the embodiments may be combined with each other; for convenience of description, only the parts related to the invention are shown in the drawings.
Since the emotion semantics of a text are affected by domain information, texts in different domains express different emotions. This application therefore fuses domain information into the emotion analysis of text, and provides an emotion analysis method based on domain information. The invention is described in detail below through examples.
Example 1
Fig. 1 is a schematic flow chart of an emotion analysis method based on domain information in an embodiment of the present application. The embodiment is applicable to classifying text data; the method may be performed by an emotion analysis device, and the device may be implemented in the form of software and/or hardware. As shown in Fig. 1, the method includes:
S10, preprocessing the text information to be analyzed, and acquiring the domain information of the preprocessed text information;
S20, inputting the text information to be analyzed into an emotion classification model fused in advance with domain information, and obtaining the emotion category of the text information; the emotion classification model fused with domain information includes: a global emotion semantic model, a local emotion semantic model for each domain, and an emotion fusion strategy corresponding to each domain;
acquiring the local emotion probability value of the text information to be analyzed based on the local emotion semantic model matched with the domain information;
acquiring the global emotion probability value of the text information to be analyzed based on the global emotion semantic model;
fusing the local emotion probability value and the global emotion probability value based on the emotion fusion strategy matched with the domain information to obtain a fusion result;
and acquiring the emotion category of the text information based on the fusion result.
The global emotion semantic model is trained on the full data and can extract global emotion semantic features well, while the local emotion semantic model considers the emotion semantics of the text from the domain angle. Fusing the emotion probabilities predicted by the two models enhances the robustness and accuracy of the model while taking domain semantics into account.
Because the method fuses the domain emotion semantics with the global emotion semantics, the accuracy of the emotion analysis algorithm is greatly improved and the classification effect is better.
For ease of understanding, the steps of the present embodiment are described below.
In step S10 of this embodiment, the text information to be analyzed may be text data of different domains obtained from social platforms, online shopping platforms, portals and the like, for example: fast-moving consumer goods (FMCG), public health, finance, education, etc. Any implementation may be adopted to obtain the text to be analyzed; for example, it may be obtained directly from an external source, or retrieved through an interface.
Step S10 of the present embodiment specifically includes the following steps:
S11, preprocessing the text information to be analyzed to generate a word vector matrix;
S12, inputting the word vector matrix into the trained domain text classification model to obtain the domain information.
Fig. 2 is a schematic diagram of the word vector matrix generation flow in an embodiment of the present application. As shown in Fig. 2, step S11 includes steps S111 to S114, which are described below.
S111, removing irregular information from the text information.
Traditional Chinese characters are converted to Simplified Chinese, and special characters in the text, such as @username and URLs, are removed.
S112, performing word segmentation on the text processed in step S111.
A word segmentation tool is used to segment the text and remove stop words; specifically, the Jieba Chinese word segmentation tool is used to segment the Chinese text. Jieba provides three segmentation modes, namely the accurate mode, the full mode and the search-engine mode; this embodiment uses the accurate mode.
In this embodiment, besides Jieba, SnowNLP, THULAC or NLPIR may also be used to segment the original text, set according to actual requirements; this embodiment places no specific limitation on the choice.
S113, looking up the index information of each word of the segmented data in a pre-established dictionary.
The dictionary is created as follows: first, the training data are segmented and part-of-speech tagged, and only words whose part of speech is "verb" or "adjective" is retained, because such words reflect the emotion of a text better; for example, adjectives such as "fear" and "happiness" distinguish the expressed emotion better than nouns do. Then the frequency of each word is counted and only words of length greater than 1 are retained; finally, a unique id is assigned to each word as its index. Part of the dictionary's word index is shown in Table 1:
TABLE 1
Word          id
Consumption   46
Service       47
Reproduction  48
Complaint     49
Death         50
Fraud         51
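The dictionary-construction procedure described above can be sketched as follows. This is a simplified illustration: the function name, the toy tag set `('v', 'a')`, and the frequency-ordered id assignment are our assumptions; in practice the part-of-speech tags would come from a tagger such as jieba.posseg, and the starting id would follow the real dictionary layout.

```python
from collections import Counter

def build_dictionary(tagged_corpus, keep_tags=("v", "a"), start_id=1):
    """Build the word -> id index of S113: keep only verbs ('v') and
    adjectives ('a') of length > 1, count their frequency, and assign each
    surviving word a unique id (more frequent words get smaller ids)."""
    counts = Counter(
        word
        for sentence in tagged_corpus
        for word, tag in sentence
        if tag in keep_tags and len(word) > 1
    )
    # deterministic ordering: by descending frequency, ties alphabetically
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ordered, start=start_id)}
```

The returned mapping is what S113 consults when converting a segmented sentence into its index vector.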
S114, acquiring for each sentence a word vector consistent with the preset sample length.
Since sentence lengths vary while every sample input to the model must have a uniform length, the length of each sentence is set to a fixed value, denoted max_length. If max_length = 5 and the vector created in S113 from the index information of each word is (1, 3, 2, 5), the vector is padded with 0 at the front, giving (0, 1, 3, 2, 5); if max_length = 3, the part exceeding the fixed value is truncated, giving (1, 3, 2). In this way each sentence is mapped to a vector of equal length.
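The pad-or-truncate rule of S114 can be written as a small helper; this sketch (the name and the pad id 0 are taken from the example above) follows the front-padding and head-truncation behavior described:

```python
def to_fixed_length(ids, max_length, pad_id=0):
    """Map a sentence's index vector to exactly max_length entries:
    pad with pad_id at the front, or keep only the first max_length ids."""
    if len(ids) >= max_length:
        return ids[:max_length]
    return [pad_id] * (max_length - len(ids)) + ids
```

Applying it to the example in the text reproduces both cases: length 5 yields the zero-padded vector, length 3 the truncated one.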
In step S12 of this embodiment, the domain text classification model is a network model constructed and trained based on a text convolutional neural network (Text Convolutional Neural Network, TextCNN).
The embedding layer of the TextCNN model obtains the vector-space representation of each word by loading pre-trained word vectors. Since each row of the input represents one word, and the word is the minimum granularity of the text during feature extraction, the width of the convolution kernel is kept consistent with the dimension of the word vector; the height of the kernel, as in an ordinary CNN, can be set freely.
Because the input is a sentence and adjacent words in a sentence are highly correlated, convolution kernels of different heights are used in the convolution layer, so that word meaning, word order and context are all taken into account.
The semantic representation of the text is obtained through the TextCNN model, and finally domain classification is performed through softmax.
It should be noted that in this embodiment the domain text classification model is pre-trained; in other embodiments, the method may further include, before step S10, constructing and training a domain text classification model, with the following steps:
constructing the TextCNN model. TextCNN consists of an embedding layer, convolution-pooling layers, a dropout layer and an output layer.
In this embodiment, the width of the convolution kernels used in TextCNN is consistent with the dimension of the word vectors in the word vector matrix, and convolution kernels of different sizes are used to extract the n-gram phrase semantic features of the matrix, with n being 2, 3 and 4 (the usual convolution sizes for general text processing). The pooling layer in TextCNN pools each feature vector produced by the convolution layer into a single value using a max-pooling operation and an average-pooling operation.
Because convolution kernels of different heights are used in the convolution layer, the vector dimensions obtained after the convolution layer are inconsistent. This embodiment therefore applies a max-pooling operation and an average-pooling operation in the pooling layer to pool each feature vector into single values, that is, the maximum and the mean of each feature vector are extracted to represent the feature: the maximum represents the most important feature, and the mean represents the global feature.
A text dataset covering all domains is acquired as the training sample set, with each sample labeled by the domain it belongs to.
A word vector matrix is generated from the training samples; specifically, the text is converted into the word vector matrix according to the pre-established dictionary.
The text convolutional neural network model is trained with the word vector matrix as input to obtain the trained network model. Fig. 3 is a schematic diagram of the text convolutional neural network model in an embodiment of the present application. As shown in Fig. 3, during training TextCNN first takes the word vectors obtained after preprocessing as input and maps each word into a word vector of preset length embedding_size at the embedding layer, so that each sentence is expressed as a matrix of shape max_length × embedding_size, where max_length is the preset sample length. Then convolution kernels of different sizes extract the n-gram phrase semantic features, with n being 2, 3 and 4, i.e., the semantic features of binary, ternary and quaternary phrases, and max pooling and average pooling are performed respectively. Finally, the features extracted by the different kernel sizes are concatenated as the n-gram phrase semantic features and sent to the softmax layer for domain classification, which outputs the domain information. After training is completed, the domain text classification model is obtained.
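To make the forward pass concrete, here is a dependency-free sketch of the TextCNN computation just described: embedding lookup, one convolution per kernel height 2/3/4, max and average pooling of each feature map, concatenation, and a softmax output. All weights, shapes and names are toy assumptions for illustration only; a real implementation would build and train such a model in a deep-learning framework.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def conv_feature_map(mat, kernel):
    """Slide an (h x emb) kernel down a (max_length x emb) matrix:
    one scalar per vertical position, i.e. one n-gram feature per position."""
    h, emb = len(kernel), len(kernel[0])
    return [
        sum(mat[i + r][c] * kernel[r][c] for r in range(h) for c in range(emb))
        for i in range(len(mat) - h + 1)
    ]

def textcnn_forward(token_ids, embedding, kernels, w_out):
    """embed -> conv (kernel heights 2/3/4) -> max+avg pool -> concat -> softmax."""
    mat = [embedding[t] for t in token_ids]          # max_length x embedding_size
    features = []
    for kernel in kernels:
        fmap = conv_feature_map(mat, kernel)
        features.append(max(fmap))                   # max pooling: most salient n-gram
        features.append(sum(fmap) / len(fmap))       # average pooling: global tendency
    logits = [sum(f * w for f, w in zip(features, row)) for row in w_out]
    return softmax(logits)                           # one probability per domain
```

With three kernel heights and two pooling operations, the concatenated feature vector has six entries per kernel set, matching the max-plus-average pooling scheme described above.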
In this embodiment, step S20 includes:
matching and retrieving, according to the domain information, the local emotion semantic model and the emotion fusion strategy of the corresponding domain from the pre-constructed emotion classification model fused with domain information, where the pre-constructed model library contains the global emotion semantic model, the local emotion semantic model of each domain, and the emotion fusion strategy corresponding to each domain.
In this embodiment, the text information to be analyzed is input into the emotion classification model, and its local emotion probability value is acquired based on the local emotion semantic model matched with the domain information; the local emotion probability value may include a local semantic negative emotion probability value and/or a local semantic positive emotion probability value. The global emotion probability value of the text information to be analyzed is acquired based on the global emotion semantic model; the global emotion probability value may include a global semantic negative emotion probability value and/or a global semantic positive emotion probability value. The local emotion probability value and the global emotion probability value are fused based on the emotion fusion strategy matched with the domain information to obtain a fusion result, and the emotion category of the text information to be analyzed is acquired based on the fusion result.
In this embodiment, the emotion fusion strategy performs weighted fusion of the global emotion probability value and the local emotion probability value, using preset weights, to obtain the final emotion probability value; the final emotion probability value includes a negative emotion probability value and/or a positive emotion probability value.
Determining the emotion category of the text information to be analyzed based on the weighted fusion result includes:
when the negative emotion probability value is greater than 0.5, judging that the semantic emotion of the text information to be analyzed is negative, and otherwise judging that it is positive; or
when the positive emotion probability value is greater than 0.5, judging that the semantic emotion of the text information to be analyzed is positive, and otherwise judging that it is negative.
For example, when the local emotion probability value is a local semantic negative emotion probability value and the global emotion probability value is a global semantic negative emotion probability value, the global semantic negative emotion probability value and the local semantic negative emotion probability value are weighted and fused by adopting a formula (1) to obtain a negative emotion probability value:
P_1 = w_global * p_global + w_part * p_part (1)
where P_1 is the negative emotion probability value, p_global is the global semantic negative emotion probability value, p_part is the local semantic negative emotion probability value, w_global is the weight of the global semantic negative emotion probability value, and w_part is the weight of the local semantic negative emotion probability value.
It should be noted that the weight may be adjusted according to the specific situation.
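The weighted fusion of formula (1) and the 0.5 threshold rule can be sketched as follows; the function name and the concrete weight values are illustrative, not taken from the patent:

```python
def fuse_emotion(p_global: float, p_part: float,
                 w_global: float = 0.6, w_part: float = 0.4) -> str:
    """Weighted fusion of the global and local negative-emotion probabilities
    per formula (1): P_1 = w_global * p_global + w_part * p_part."""
    # illustrative assumption: the two weights sum to 1
    p_1 = w_global * p_global + w_part * p_part
    # threshold rule: negative when the fused negative probability exceeds 0.5
    return "negative" if p_1 > 0.5 else "positive"

print(fuse_emotion(0.9, 0.7))   # both models see strong negative emotion
print(fuse_emotion(0.2, 0.4))   # both models lean positive
```

In practice the weights would be the per-domain values selected during model verification.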
In this embodiment, before step S10, the method further includes:
s01, constructing an emotion classification model fused with the domain information.
Specifically, a global emotion semantic model in an emotion classification model fused with field information is constructed, and the global emotion semantic model is constructed based on a bidirectional conversion coding model and is used for obtaining a global emotion probability value;
constructing a local emotion semantic model of each field in the emotion classification model fused with the field information, wherein the local emotion semantic model is constructed based on a soft interval support vector machine algorithm and is used for obtaining a local emotion probability value;
and constructing an emotion fusion strategy of each field in the emotion classification model fused with the field information, wherein the emotion fusion strategy carries out weighted fusion on the local emotion probability value and the global emotion probability value.
The bidirectional conversion coding model (Bidirectional Encoder Representations from Transformers, BERT) adopts a pre-training plus fine-tuning framework and can deeply interpret sentence connotations; its fine-tuning stage is fast and effective, which further enhances the generalization of the model, and it has gradually become one of the strongest and most widely used models in the field of natural language processing.
The support vector machine (Support Vector Machine, SVM) algorithm is a small-sample learning method with a solid theoretical basis: it separates positive and negative samples by finding the maximum-margin hyperplane, which converts the learning problem into a convex optimization problem. Domain emotion classification is a classification problem, and because the data sample size of each domain is relatively small, the local emotion semantic model adopts a soft-margin support vector machine algorithm to classify domain emotion.
The structural loss of the soft-margin support vector machine algorithm improves both the generalization performance and the accuracy of the model.
S02, training an emotion classification model fused with domain information, comprising the following steps of:
s021, training a local emotion semantic model by taking a field text data set of the target field as a training set aiming at different target fields.
In this embodiment, a soft-margin support vector machine model is built for each domain, and each model constructs a local emotion semantic model from tf-idf features extracted from the text. The training procedure of the local emotion semantic model is described below.
First, the domain text data set of the target domain is acquired and preprocessed to obtain training samples. The training samples are divided into a training set and a test set at a ratio of 8:2.
Then, the tf-idf value of each word in the domain text dataset is calculated, generating tf-idf vectors for the training samples.
Specifically, firstly, calculating tf-idf values of the words in the text through a formula (2):
tfidf_{i,w} = idf_w * tf_{i,w} (2)
where tfidf_{i,w} denotes the tf-idf value of word w in text i, idf_w denotes the inverse document frequency of word w, and tf_{i,w} denotes the term frequency of word w in text i.
The tf-idf value of each word in the article is calculated to obtain the tf-idf vector of text i, denoted tfidf_i = {tfidf_{i,1}, tfidf_{i,2}, ...}, where the coordinates of the tf-idf vector of text i are the tf-idf value of the first word in text i, the tf-idf value of the second word, the tf-idf value of the third word, and so on.
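Formula (2) can be checked with a short computation. The toy corpus, tokenization, and the raw (unsmoothed) idf below are illustrative assumptions; library implementations such as scikit-learn's TfidfVectorizer use a smoothed idf and vector normalization:

```python
import math
from collections import Counter

# toy corpus of pre-segmented texts (illustrative)
corpus = [["good", "product", "good"],
          ["bad", "service"],
          ["good", "service"]]

def tfidf(corpus, i, w):
    """tf-idf of word w in text i per formula (2): tfidf_{i,w} = idf_w * tf_{i,w}."""
    tf = Counter(corpus[i])[w] / len(corpus[i])   # term frequency in text i
    df = sum(1 for doc in corpus if w in doc)     # number of documents containing w
    idf = math.log(len(corpus) / df)              # inverse document frequency
    return idf * tf

# tf-idf vector of text 0 over its own vocabulary
print({w: round(tfidf(corpus, 0, w), 4) for w in set(corpus[0])})
```

Rarer words receive higher weights, which is what makes tf-idf useful as a domain-sensitive feature.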
And finally, modeling the tf-idf vector by using a soft interval support vector machine algorithm to obtain a local emotion semantic model.
Five-fold cross-validation is used to validate the model during training, and stratified sampling is required during cross-validation because of the imbalance of positive and negative samples in the data. The training flow of the soft-margin support vector machine algorithm is as follows:
Input: training data set T = {(x_1, y_1), (x_2, y_2), ..., (x_K, y_K)}, where y_k ∈ {0, 1} represents the emotion expressed by the text, i.e., the label value (0 for negative, 1 for positive), k = 1, 2, ..., K, and K is the total number of texts; T is the input data of the model, i.e., the texts to be predicted and their labels y_k.
And (3) outputting: emotion categories corresponding to the text.
Evaluation: the trained model is used on a test data set, and F1 scores on the test set are calculated to verify the generalization capability of the model.
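Under the assumptions that tf-idf features feed a soft-margin linear SVM and that stratified five-fold cross-validation with an F1 score is used, the training flow above might be sketched with scikit-learn as follows. The toy data and the parameter C are illustrative, and note that LinearSVC itself does not emit probabilities; in practice SVC(probability=True) or probability calibration would supply the local emotion probability values used later for fusion:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# toy domain text data set (illustrative); 0 = negative, 1 = positive
texts = ["great product", "terrible service", "love it", "awful quality",
         "very happy", "so disappointed", "works well", "broke instantly",
         "would buy again", "waste of money"]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# tf-idf features + soft-margin linear SVM (C controls how soft the margin is)
model = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0))

# stratified sampling keeps the positive/negative ratio equal in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, texts, labels, cv=cv, scoring="f1")
print("per-fold F1:", scores, "mean F1:", scores.mean())
```

The stratified splitter mirrors the patent's requirement that cross-validation preserve the imbalanced class ratio in each fold.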
S022, training a global emotion semantic model by taking the text data sets of all target fields as a training set.
For the text classification task, the BERT model inserts a [CLS] symbol in front of the text and uses the output vector corresponding to this symbol as the semantic representation of the whole text. This can be understood as follows: because this symbol carries no obvious semantic information of its own, it fuses the semantic information of each word in the text more "fairly" than the other words do, and can therefore better represent the sentence semantics.
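The [CLS] pooling described here can be illustrated without loading any pretrained weights: given the encoder's output matrix of token vectors, the sentence representation is simply the vector at the [CLS] position. The numbers below are stand-ins for real BERT outputs:

```python
import numpy as np

# pretend encoder output with shape (sequence_length, hidden_size);
# by convention, row 0 corresponds to the prepended [CLS] symbol
hidden_states = np.array([
    [0.1, 0.3],   # [CLS]
    [0.9, 0.2],   # first token of the text
    [0.4, 0.8],   # second token of the text
])

def cls_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Use the [CLS] output vector as the whole-sentence representation."""
    return hidden_states[0]

sentence_vec = cls_pool(hidden_states)
print(sentence_vec)  # a classification head would map this to emotion logits
```

In the real model, a softmax classification head on top of this vector produces the global emotion probability value.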
And acquiring a text data set in all fields as a training sample, wherein a sample label of the training sample is an emotion type.
The bidirectional conversion coding model is trained based on the training samples to obtain the global emotion classification model.
S023, for different target fields respectively, carrying out weighted fusion on the trained local emotion semantic model and the trained global emotion semantic model, and determining the global emotion semantic model weight and the local emotion semantic model weight under different target fields through model verification, so as to obtain an emotion classification model that performs weighted fusion with the final weights.
FIG. 4 is a schematic diagram of an emotion classification model training process according to an embodiment of the present application. As shown in fig. 4, data preprocessing and Chinese word segmentation are first performed, and the data is then divided into a training set and a test set at a ratio of 8:2. Five-fold cross-validation is used to validate the model: the training set is divided into 5 parts, each fold selects one part as the validation set and the other four parts as the training set, repeated five times, mainly to ensure the robustness of the model. Because of the imbalance of positive and negative samples in the data, stratified sampling is required for cross-validation. The flow of the emotion classification model fusing domain information is as follows:
Input: training data set S = {(x_1, y_1), (x_2, y_2), ..., (x_M, y_M)}, where y_k ∈ {0, 1} represents the emotion expressed by the text, i.e., the label value (0 for negative, 1 for positive), k = 1, 2, ..., M, and M is the total number of texts; S is the input data of the model, i.e., the texts to be predicted and their labels y_k. Table 2 is an example table of training samples.
TABLE 2
Text x | Emotion label y
In recent years, deep learning has been booming! | 1
Faced with this illness, there is nothing to be done | 0
So happy today! | 1
And (3) outputting: emotion categories corresponding to the text.
Evaluation: the trained model is used on a test data set, and F1 scores on the test set are calculated to verify the generalization capability of the model.
The model has a pipeline structure, and the weights of the local emotion semantic model and the global emotion semantic model for each field need to be adjusted according to the accuracy on actual data, so that the optimal model combination is used when the model is deployed.
Finally, the accuracy and generalization capability of the whole model are verified on the test set. The text first passes through the BERT model to compute the global emotion probability; meanwhile, the domain classification model (TextCNN) determines the domain, and the corresponding domain emotion model outputs the local emotion probability. The two output probabilities are then weighted with the weights selected for the domain category to output the final category, and finally the predicted emotion category is compared with the true emotion label to calculate the F1 value. Input: test data set; output: accuracy on the test set.
The weight obtained through model verification is used as a preset weight when emotion classification is carried out by using an emotion classification model.
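Determining the preset weight through model verification can be sketched as a small grid search that maximizes the F1 score on a validation set; the data, the grid, and the assumption that the two weights sum to one are illustrative:

```python
def f1_negative(y_true, y_pred, positive=0):
    """F1 for the negative class (label 0), computed from scratch."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# validation set: (global negative prob, local negative prob, true label)
val = [(0.4, 0.7, 0), (0.7, 0.3, 0), (0.3, 0.6, 1), (0.2, 0.1, 1)]

best_w, best_f1 = None, -1.0
for w_global in [i / 10 for i in range(11)]:       # grid over [0, 1]
    preds = [0 if w_global * pg + (1 - w_global) * pp > 0.5 else 1
             for pg, pp, _ in val]
    f1 = f1_negative([y for _, _, y in val], preds)
    if f1 > best_f1:
        best_w, best_f1 = w_global, f1

print("selected w_global:", best_w, "validation F1:", best_f1)
```

The selected weight pair would then be stored per domain and used as the preset weight at inference time.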
In this embodiment, a local emotion semantic model is trained for each field, a BERT classifier trained on the full data of all fields is used as the global emotion semantic model, the weights between the two models are obtained through model training, the results of the two models are fused by weighting, and finally the emotion classification model is verified, so that the aim of accurately identifying the emotion expressed by text is achieved.
The embodiment provides an emotion analysis method based on field information, which aims at the problem that text expression emotion semantics are unstable in different fields, introduces a global emotion semantic model and a local emotion semantic model, and can improve the robustness of an emotion analysis algorithm in different fields; the global emotion semantic model and the local emotion semantic model are subjected to weighted fusion, and the accuracy of an emotion analysis algorithm can be improved by fusing emotion semantics of the global emotion semantic model and the local emotion semantic model.
Furthermore, the proposed model is applied to public opinion monitoring, so that governments can be helped to master real social public opinion conditions, and prevention and control propaganda and public opinion guiding work can be scientifically and efficiently performed.
Example two
In a second aspect of the present application, an emotion analysis device based on domain information is provided, and fig. 5 is a schematic structural diagram of an emotion analysis device based on domain information in another embodiment of the present application, as shown in fig. 5, where the device includes:
the domain information acquisition module 10 is used for preprocessing the text information to be analyzed and acquiring the domain information of the preprocessed text information;
the emotion classification module 20 is used for inputting the text information to be analyzed into an emotion classification model which is pre-fused with the field information, and acquiring emotion types of the text information; the emotion classification model integrated with the domain information comprises the following steps: the method comprises the steps of a global emotion semantic model, a local emotion semantic model of each field and an emotion fusion strategy corresponding to each field;
The method comprises the steps of obtaining a local emotion probability value of text information to be analyzed based on a local emotion semantic model matched with field information;
based on the global emotion semantic model, acquiring a global emotion probability value of text information to be analyzed;
based on an emotion fusion strategy matched with the field information, fusing the local emotion probability value and the global emotion probability value to obtain a fusion result;
and acquiring the emotion type of the text information based on the fusion result.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above and the related description may refer to the corresponding process in the foregoing method embodiment, which is not repeated here.
Example III
A third aspect of the present application provides, by way of embodiment three, an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program when executed by the processor implementing the steps of the domain information based emotion analysis method as set forth in any of the above embodiments.
Fig. 6 is a schematic architecture diagram of an electronic device according to another embodiment of the present application.
The electronic device shown in fig. 6 may include: at least one processor 101, at least one memory 102, at least one network interface 104, and other user interfaces 103. The various components in the electronic device are coupled together by a bus system 105. It is understood that the bus system 105 is used to enable connected communications between these components. The bus system 105 includes a power bus, a control bus, and a status signal bus in addition to a data bus. But for clarity of illustration the various buses are labeled as bus system 105 in fig. 6.
The user interface 103 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, a trackball (trackball), or a touch pad, etc.).
It will be appreciated that the memory 102 in this embodiment may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory, among others. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The memory 102 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some implementations, the memory 102 stores the following elements, executable units or data structures, or a subset thereof, or an extended set thereof: an operating system 1021, and application programs 1022.
The operating system 1021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application program 1022 contains various applications, such as an industrial control device operation management system, for implementing various application services. A program implementing the method of the embodiment of the present invention may be included in the application program 1022.
In an embodiment of the present invention, the processor 101 is configured to execute the method steps provided in the first aspect by calling a program or an instruction stored in the memory 102, specifically, a program or an instruction stored in the application 1022.
The method disclosed in the above embodiment of the present invention may be applied to the processor 101 or implemented by the processor 101. The processor 101 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 101 or instructions in the form of software. The processor 101 described above may be a general purpose processor, a digital signal processor, an application specific integrated circuit, an off-the-shelf programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component. The disclosed methods, steps, and logic blocks in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software elements in a decoding processor. The software elements may be located in a random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 102, and the processor 101 reads information in the memory 102, and in combination with its hardware, performs the steps of the method described above.
In addition, in combination with the emotion analysis method based on domain information in the above embodiment, the embodiment of the present invention may provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the emotion analysis method based on domain information in any one of the above method embodiments.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Furthermore, it should be noted that in the description of the present specification, the terms "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., refer to a specific feature, structure, material, or characteristic described in connection with the embodiment or example being included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art upon learning the basic inventive concepts. Therefore, the appended claims should be construed to include preferred embodiments and all such variations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, the present invention should also include such modifications and variations provided that they come within the scope of the following claims and their equivalents.

Claims (7)

1. An emotion analysis method based on domain information, which is characterized by comprising the following steps:
preprocessing text information to be analyzed, and acquiring domain information of the preprocessed text information;
inputting the text information to be analyzed into an emotion classification model which is pre-fused with field information, and obtaining emotion types of the text information; the emotion classification model integrated with the domain information comprises the following steps: the method comprises the steps of a global emotion semantic model, a local emotion semantic model of each field and an emotion fusion strategy corresponding to each field;
The method comprises the steps of obtaining a local emotion probability value of text information to be analyzed based on a local emotion semantic model matched with field information;
acquiring a global emotion probability value of the text information to be analyzed based on a global emotion semantic model;
based on an emotion fusion strategy matched with the field information, fusing the local emotion probability value and the global emotion probability value to obtain a fusion result;
acquiring emotion types of the text information based on the fusion result;
the preprocessing of the text information to be analyzed comprises the following steps:
removing irregular information in the text information, and performing word segmentation;
searching index information of each word in the word segmentation processed data based on a pre-established dictionary, and acquiring a word vector matrix composed of the index information of the words for each sentence; the words in the pre-established dictionary consist of verbs and adjectives;
judging whether the dimension of the word vector matrix accords with a preset sample length max-length or not;
if not, acquiring a word vector matrix consistent with the sample length max-length according to the sample length processing rule;
acquiring domain information of the preprocessed text information, including:
Inputting the preprocessed text information into a trained domain text classification model to obtain the domain information;
the field text classification model is a classification model constructed and trained based on a text convolutional neural network (TextCNN);
inputting the preprocessed text information into a trained domain text classification model, and before obtaining the domain information, further comprising:
constructing a field text classification model by adopting a text convolutional neural network (TextCNN), wherein the width of a convolutional kernel used in the TextCNN is consistent with the dimension of a word vector in the word vector matrix, the heights of the convolutional kernel used in the TextCNN are respectively 2, 3 and 4, and a pooling layer in the TextCNN pools each feature vector processed by the convolutional layer into a value by adopting a maximum pooling operation and an average pooling operation;
inputting the preprocessed text information into a trained domain text classification model to obtain the domain information, wherein the method comprises the following steps:
mapping each word into a word vector with a preset length of Embedding-size at an Embedding layer to obtain a plurality of matrixes with the shape of max-length × Embedding-size;
extracting n-gram phrase semantic features of the matrix by adopting convolution kernels with different sizes, wherein n is 2, 3 and 4;
Sequentially carrying out maximum pooling operation and average pooling on the n-gram phrase semantic features, and splicing the pooled numerical values to serve as n-gram semantic features after pooling;
inputting the pooled n-gram semantic features into a softmax layer to obtain field information;
before inputting the text information to be analyzed into the emotion classification model in which the domain information is fused in advance, the method further comprises:
constructing an emotion classification model fused with field information;
constructing a global emotion semantic model in an emotion classification model fused with field information based on the bidirectional conversion coding model, and obtaining a global emotion probability value;
constructing a local emotion semantic model of each field in the emotion classification model fused with the field information based on a soft interval support vector machine algorithm, and obtaining a local emotion probability value;
and carrying out weighted fusion on the local emotion probability value and the global emotion probability value to obtain an emotion fusion strategy of each field in the emotion classification model fused with the field information.
2. The method according to claim 1, wherein the fusion result comprises a negative emotion probability value and/or a positive emotion probability value;
Based on the fusion result, acquiring the emotion type of the text information, which comprises the following steps:
when the negative emotion probability value is greater than 0.5, judging that the semantic emotion of the text information is negative, otherwise judging that the semantic emotion is positive; or
when the positive emotion probability value is greater than 0.5, judging that the semantic emotion of the text information is positive, otherwise judging that the semantic emotion is negative.
3. The method according to claim 1, wherein before inputting the text information to be analyzed into the emotion classification model in which domain information is previously fused, the method further comprises:
training the emotion classification model fused with the domain information, wherein the method comprises the following steps of:
respectively aiming at different target fields, and training the local emotion semantic model by taking a field text data set of the target field as a training set;
training the global emotion semantic model by taking a text data set of all target fields as a training set;
and respectively carrying out weighted fusion on the trained local emotion semantic model and the trained global emotion semantic model aiming at different target fields, and determining the global emotion semantic model weight and the local emotion semantic model weight under different target fields through model verification.
4. The method of claim 2, wherein training the local emotion semantic model with a domain text dataset of a target domain as a training set comprises:
acquiring a field text data set of a target field as a training sample;
calculating tf-idf values of each word in the field text data set, and generating tf-idf vectors of training samples;
modeling the tf-idf vector by using a soft interval support vector machine algorithm to obtain a local emotion semantic model.
5. An emotion analysis device based on domain information, the device comprising:
the field information acquisition module is used for preprocessing the text information to be analyzed and acquiring the field information of the preprocessed text information;
the emotion classification module is used for inputting the text information to be analyzed into an emotion classification model which is fused with field information in advance, and acquiring emotion types of the text information; the emotion classification model integrated with the domain information comprises the following steps: the method comprises the steps of a global emotion semantic model, a local emotion semantic model of each field and an emotion fusion strategy corresponding to each field;
the method comprises the steps of obtaining a local emotion probability value of text information to be analyzed based on a local emotion semantic model matched with field information;
Acquiring a global emotion probability value of the text information to be analyzed based on a global emotion semantic model;
based on an emotion fusion strategy matched with the field information, fusing the local emotion probability value and the global emotion probability value to obtain a fusion result;
acquiring emotion types of the text information based on the fusion result;
the preprocessing of the text information to be analyzed comprises the following steps:
removing irregular information in the text information, and performing word segmentation;
searching index information of each word in the word segmentation processed data based on a pre-established dictionary, and acquiring a word vector matrix composed of the index information of the words for each sentence; the words in the pre-established dictionary consist of verbs and adjectives;
judging whether the dimension of the word vector matrix accords with a preset sample length max-length or not;
if not, acquiring a word vector matrix consistent with the sample length max-length according to the sample length processing rule;
acquiring domain information of the preprocessed text information, including:
inputting the preprocessed text information into a trained domain text classification model to obtain the domain information;
The field text classification model is a classification model constructed and trained based on a text convolutional neural network (TextCNN); inputting the preprocessed text information into a trained domain text classification model, and before obtaining the domain information, further comprising:
constructing a field text classification model by adopting a text convolutional neural network (TextCNN), wherein the width of a convolutional kernel used in the TextCNN is consistent with the dimension of a word vector in the word vector matrix, the heights of the convolutional kernel used in the TextCNN are respectively 2, 3 and 4, and a pooling layer in the TextCNN pools each feature vector processed by the convolutional layer into a value by adopting a maximum pooling operation and an average pooling operation;
inputting the preprocessed text information into a trained domain text classification model to obtain the domain information, wherein the method comprises the following steps:
mapping each word into a word vector with a preset length of Embedding-size at an Embedding layer to obtain a plurality of matrixes with the shape of max-length × Embedding-size;
extracting n-gram phrase semantic features of the matrix by adopting convolution kernels with different sizes, wherein n is 2, 3 and 4;
sequentially carrying out maximum pooling operation and average pooling on the n-gram phrase semantic features, and splicing the pooled numerical values to serve as n-gram semantic features after pooling;
Inputting the pooled n-gram semantic features into a softmax layer to obtain field information;
before inputting the text information to be analyzed into the emotion classification model with the field information fused in advance, the method further comprises the following steps:
constructing an emotion classification model fused with field information;
constructing a global emotion semantic model in an emotion classification model fused with field information based on the bidirectional conversion coding model, and obtaining a global emotion probability value;
constructing a local emotion semantic model of each field in the emotion classification model fused with the field information based on a soft interval support vector machine algorithm, and obtaining a local emotion probability value;
and carrying out weighted fusion on the local emotion probability value and the global emotion probability value to obtain an emotion fusion strategy of each field in the emotion classification model fused with the field information.
6. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, which when executed by the processor carries out the steps of the domain information based emotion analysis method as claimed in any of the preceding claims 1 to 4.
7. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the domain information based emotion analysis method as set forth in any of the preceding claims 1 to 4.
CN202110881327.4A 2021-08-02 2021-08-02 Emotion analysis method, device, equipment and storage medium based on field information Active CN113672731B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110881327.4A CN113672731B (en) 2021-08-02 2021-08-02 Emotion analysis method, device, equipment and storage medium based on field information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110881327.4A CN113672731B (en) 2021-08-02 2021-08-02 Emotion analysis method, device, equipment and storage medium based on field information

Publications (2)

Publication Number Publication Date
CN113672731A CN113672731A (en) 2021-11-19
CN113672731B true CN113672731B (en) 2024-02-23

Family

ID=78541095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110881327.4A Active CN113672731B (en) 2021-08-02 2021-08-02 Emotion analysis method, device, equipment and storage medium based on field information

Country Status (1)

Country Link
CN (1) CN113672731B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738298B (en) * 2023-08-16 2023-11-24 杭州同花顺数据开发有限公司 Text classification method, system and storage medium
CN116777607B (en) * 2023-08-24 2023-11-07 上海银行股份有限公司 Intelligent auditing method based on NLP technology

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273845A (en) * 2017-06-12 2017-10-20 大连海事大学 A kind of facial expression recognizing method based on confidence region and multiple features Weighted Fusion
CN108805087A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Semantic temporal fusion association based on multi-modal Emotion identification system judges subsystem
CN109801256A (en) * 2018-12-15 2019-05-24 华南理工大学 A kind of image aesthetic quality appraisal procedure based on area-of-interest and global characteristics
CN110852368A (en) * 2019-11-05 2020-02-28 南京邮电大学 Global and local feature embedding and image-text fusion emotion analysis method and system
CN112016002A (en) * 2020-08-17 2020-12-01 辽宁工程技术大学 Mixed recommendation method integrating comment text level attention and time factors
CN112380346A (en) * 2020-11-23 2021-02-19 宁波深擎信息科技有限公司 Financial news emotion analysis method and device, computer equipment and storage medium
CN112560503A (en) * 2021-02-19 2021-03-26 中国科学院自动化研究所 Semantic emotion analysis method integrating depth features and time sequence model
CN112699662A (en) * 2020-12-31 2021-04-23 太原理工大学 False information early detection method based on text structure algorithm

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11195057B2 (en) * 2014-03-18 2021-12-07 Z Advanced Computing, Inc. System and method for extremely efficient image and pattern recognition and artificial intelligence platform

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Attention-based BiLSTM fused CNN with gating mechanism model for Chinese long text classification; Jianfeng Deng et al.; Computer Speech & Language; 2021-07-31; 1-12 *
Aspect-level sentiment analysis based on the BG-DATT-CNN network; Yu Bengong et al.; Computer Engineering and Applications; 2021-07-29; vol. 58, no. 24; 151-157 *
Research progress on multimodal emotion recognition; He Jun; Liu Yue; He Zhongwen; Application Research of Computers; 2018-05-07; vol. 35, no. 11; 3201-3205 *
Research on domain adaptation for machine translation combining domain knowledge and deep learning; Ding Liang; He Yanqing; Information Science; 2017-10-05; vol. 35, no. 10; 125-132 *

Also Published As

Publication number Publication date
CN113672731A (en) 2021-11-19

Similar Documents

Publication Publication Date Title
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN109992664B (en) Dispute focus label classification method and device, computer equipment and storage medium
US8954316B2 (en) Systems and methods for categorizing and moderating user-generated content in an online environment
WO2022142041A1 (en) Training method and apparatus for intent recognition model, computer device, and storage medium
CN111611810B (en) Multi-tone word pronunciation disambiguation device and method
CN111914097A (en) Entity extraction method and device based on attention mechanism and multi-level feature fusion
CN111460820A (en) Network space security domain named entity recognition method and device based on pre-training model BERT
CN113672731B (en) Emotion analysis method, device, equipment and storage medium based on field information
CN112613324A (en) Semantic emotion recognition method, device, equipment and storage medium
CN113536795B (en) Method, system, electronic device and storage medium for entity relation extraction
CN116304748B (en) Text similarity calculation method, system, equipment and medium
CN115098634A (en) Semantic dependency relationship fusion feature-based public opinion text sentiment analysis method
CN112818110A (en) Text filtering method, text filtering equipment and computer storage medium
CN112464655A (en) Word vector representation method, device and medium combining Chinese characters and pinyin
CN112148862A (en) Question intention identification method and device, storage medium and electronic equipment
CN110852071A (en) Knowledge point detection method, device, equipment and readable storage medium
JP2022145623A (en) Method and device for presenting hint information and computer program
CN112488111B (en) Indication expression understanding method based on multi-level expression guide attention network
CN113254637B (en) Grammar-fused aspect-level text emotion classification method and system
CN113204956A (en) Multi-model training method, abstract segmentation method, text segmentation method and text segmentation device
CN112559725A (en) Text matching method, device, terminal and storage medium
CN112632956A (en) Text matching method, device, terminal and storage medium
CN116483314A (en) Automatic intelligent activity diagram generation method
CN113515627B (en) Document detection method, device, equipment and storage medium
CN115718889A (en) Industry classification method and device for company profile

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant