CN113590814A - Text classification method fusing text interpretation features - Google Patents
- Publication number: CN113590814A (application CN202110521823.9A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- interpretation
- features
- feature
- text classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention relates to a text classification method fusing text interpretation features. The method comprises the following steps: (1) training a neural-network-based text classification model to predict the category of each sentence; (2) acquiring interpretation features of the sentence prediction results of step (1) using a linear fitting method based on local random perturbation sampling; (3) selecting key interpretation features beneficial to the classification effect according to the frequency and weight of the acquired interpretation features; (4) fusing the key interpretation features acquired in step (3) with the original data and retraining the text classification model. The method uses linear fitting based on local random perturbation sampling to explain which key features contribute most to the prediction of the text classification model, fuses these features with the original labeled samples, and highlights the key features of each sample, thereby improving the classification effect.
Description
Technical Field
The invention relates to the technical field of natural language processing, in particular to a text classification method fusing text interpretation features. A trained neural-network-based text classification model is interpreted with a linear fitting method based on random perturbation sampling to obtain interpretation features for the prediction of each sentence, and the model is then retrained with the key interpretation features fused in. The method can be applied to specific fields such as spam identification, text topic classification, and sentiment analysis.
Background
Text classification is an important research direction in the field of natural language processing; its task is to map a text to a predefined category. Text classification methods include rule-based methods and machine-learning-based methods.
Rule-based text classification requires different rules for different texts, which is time-consuming and labor-intensive, and neither coverage nor accuracy can be guaranteed. With the rise of machine learning, machine-learning methods have been applied to the text classification task with better results. However, many machine-learning models are black boxes: we can obtain the prediction given by the model but not the reason the model gives that result, and can judge the model's reliability only from metrics such as accuracy. In fields such as medicine, knowing not only the prediction and its accuracy but also the basis of the prediction provides a more reliable decision basis for model users; moreover, intervening in the training process according to that basis can improve the classification effect.
In summary, because of the uninterpretability of deep-learning models, it is difficult for model users to determine the basis of a model's prediction and to make correct decisions based on it.
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art and provide a text classification method fusing text interpretation features. A linear fitting method based on random perturbation sampling interprets the predictions of a neural-network-based text classification model; the interpretation features of each sentence are obtained from the classification features used in the linear fit; key interpretation features are selected according to the frequency and weight of these features, fused with the original data, and the text classification model is retrained, making the text classification results more accurate.
In order to achieve the purpose, the invention adopts the following technical scheme:
a text classification method fusing text interpretation features comprises the following operation steps:
step 1, training a text classification model based on a neural network to predict the category of a sentence;
step 2, obtaining the interpretation characteristics of the sentence prediction result in the step 1 by using a linear fitting method based on local random disturbance sampling;
step 3, selecting key interpretation characteristics which are beneficial to the classification effect according to the frequency and the weight of the interpretation characteristics acquired in the step 2;
step 4, fusing the key interpretation features acquired in step 3 with the original data and retraining the text classification model.
Preferably, the training of the neural-network-based text classification model in step 1, used for predicting the category to which a sentence belongs, comprises the following specific steps:
(1-1) Input layer: the input to the text classification model is a set of sentences with category labels, S = (S_1, S_2, S_3, ..., S_N), where S_i denotes the i-th sentence in the data set, N denotes the number of sentences, w_j^i denotes the j-th word in the i-th sentence, and k denotes the number of words in the i-th sentence;
(1-2) Sentence vectorization: word vectors are trained with GloVe, converting each word in the vocabulary V = (w_1, w_2, w_3, ..., w_M) into a 64-dimensional vector and generating a vectorized vocabulary V' = (v_1, v_2, v_3, ..., v_M) of dimension M × 64, where w_i denotes a word in the vocabulary, v_i denotes the vector of word w_i, and M denotes the number of distinct words appearing in the data set; looking words up in V' converts a sentence into its vector representation, so sentence S_i is represented as X_i = (v_1^i, v_2^i, ..., v_k^i);
(1-3) Linear layer: the vectorized sentence X_i = (v_1^i, v_2^i, ..., v_k^i) is input to a linear layer that predicts the category label of the sentence; the linear layer formula is:
y_l = l(X_i) = W^T X_i + b
where y_l, the prediction result, is an array of num_class numbers, num_class is the predefined number of classes, each number represents the likelihood of the class at that position, l denotes the linear transformation, and W^T and b are the parameters of the linear layer;
(1-4) Softmax layer: the softmax function maps each value of the prediction y_l into [0, 1]:
softmax(y_l^j) = exp(y_l^j) / Σ_{t=1}^{num_class} exp(y_l^t)
where y_l^j denotes the j-th value of the prediction y_l; after each value of y_l is transformed by the softmax function, the num_class values sum to 1;
(1-5) Loss function: the final output of the model is the class label y_pre corresponding to the maximum value in the prediction result; the loss function is loss(y_i, y_pre) = -y_pre · log(softmax(y_i)), where loss(y_i, y_pre) denotes the loss and y_i is the label of input sentence S_i;
(1-6) Parameter optimization: the parameters of the text classification model are optimized to minimize the loss function, yielding the trained text classification model.
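As a rough sketch, steps (1-1) to (1-6) can be written as a plain NumPy forward pass and loss. The mean pooling of word vectors, the toy sizes, and all identifiers here are illustrative assumptions, not the patent's exact implementation, and the gradient-descent optimization of (1-6) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: M-word vocabulary, 64-dim vectors, num_class categories.
M, DIM, NUM_CLASS = 1000, 64, 4

V_prime = rng.normal(size=(M, DIM))                # vectorized vocabulary V' (stands in for GloVe)
W = rng.uniform(-0.3, 0.3, size=(DIM, NUM_CLASS))  # linear-layer weight W
b = np.zeros(NUM_CLASS)                            # linear-layer bias b

def softmax(y):
    """Step (1-4): map each value of y_l into [0, 1]; the values sum to 1."""
    e = np.exp(y - y.max())
    return e / e.sum()

def predict(word_ids):
    """Steps (1-2)-(1-4): look up V', pool the sentence (mean pooling is an
    assumption), apply the linear layer l, then softmax."""
    X_i = V_prime[word_ids].mean(axis=0)           # vectorized sentence
    y_l = W.T @ X_i + b                            # linear layer
    return softmax(y_l)

def loss(word_ids, label):
    """Step (1-5): negative log-likelihood of the gold label."""
    return -np.log(predict(word_ids)[label])

probs = predict([3, 17, 42])                       # a 3-word toy sentence
```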
Preferably, step 2 obtains the interpretation features of the sentence predictions of step 1 using a linear fitting method based on local random perturbation sampling; the specific steps are:
(2-1) Select the sentence S_i to be interpreted and sample near S_i by random perturbation: S_i = (w_1^i, w_2^i, ..., w_k^i) is a sentence of k words in the original data set; S_i is randomly perturbed to obtain sampled copies, generating a data set of perturbed samples, each represented as a 0/1 vector. The random perturbation process is as follows:
Randomly delete words from sentence S_i, with the number of deleted words greater than 0 and less than k, obtaining a new sentence S_i^t = (w_1^t, w_2^t, ..., w_c^t), i.e. the t-th randomly perturbed sample of S_i, where w_j^t is the j-th word of the t-th perturbed sample and c is the number of words remaining after perturbation; initialize a 1 × k vector, set the positions of the deleted words to 0 and the other positions to 1, obtaining the vectorized representation z_t of S_i^t, with every element z_t^j ∈ {0, 1}; perform 4999 random perturbations to obtain a new data set X = (S_i^0, S_i^1, ..., S_i^4999) of 5000 sentences, where S_i^0 is the original sentence S_i, whose vector representation contains k ones; the vector matrix of the new data set X has dimension 5000 × k;
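A minimal sketch of the perturbation sampling in step (2-1), assuming word-level deletion as described; the function name and variable names are illustrative:

```python
import random

def perturb_samples(sentence, n_samples=5000, seed=0):
    """Step (2-1) sketch: sample 0 is the original sentence (mask of k ones);
    each further sample deletes between 1 and k-1 randomly chosen words and
    records a 0/1 mask z_t over the k word positions."""
    rng = random.Random(seed)
    k = len(sentence)
    texts, masks = [list(sentence)], [[1] * k]
    for _ in range(n_samples - 1):
        n_del = rng.randint(1, k - 1)            # 0 < deleted words < k
        drop = set(rng.sample(range(k), n_del))
        masks.append([0 if j in drop else 1 for j in range(k)])
        texts.append([w for j, w in enumerate(sentence) if j not in drop])
    return texts, masks

texts, masks = perturb_samples("the cat sat on the mat".split(), n_samples=10)
```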
(2-2) Label the newly generated data: each sample in data set X is input into the trained text classification model for prediction to obtain the corresponding result; denoting the trained model by f, after steps (1-1) to (1-4) the prediction f(S_i^t) of each sample is an array of num_class numbers, each representing the probability of the corresponding class;
(2-3) Compute the distance between each perturbed sample and the original sample in the new data set Z as the perturbed sample's weight: the closer a perturbed sample is to the original, the better it explains the prediction, so it is given a higher weight; the weight of each newly generated sample is defined with an exponential kernel:
π_z(t) = exp(-D(S_i, S_i^t)² / σ²)
where π_z is an exponential kernel defined on the cosine distance D, representing the distance weight between samples (the smaller the distance, the larger π_z), and σ is the kernel width;
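The kernel weighting of step (2-3) can be sketched as follows; the function name and the default kernel width are assumptions for illustration:

```python
import numpy as np

def kernel_weight(z_orig, z_pert, sigma=0.25):
    """Step (2-3) sketch: pi_z = exp(-D^2 / sigma^2), with D the cosine
    distance between the original mask and a perturbed mask."""
    z_orig = np.asarray(z_orig, float)
    z_pert = np.asarray(z_pert, float)
    cos = z_orig @ z_pert / (np.linalg.norm(z_orig) * np.linalg.norm(z_pert))
    d = 1.0 - cos                                 # cosine distance
    return np.exp(-(d ** 2) / sigma ** 2)
```

An unperturbed sample gets weight 1; samples with more words deleted drift farther from the original and get exponentially smaller weight.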
(2-4) Fit the new data set Z with a linear model g:
g(z_t) = w_g · z_t
where z_t is a vector in data set Z and w_g is the weight coefficient of the linear model;
(2-5) Determine the coefficients of the linear model: train the linear model to determine the weight coefficients with the loss function
L(f, g, π_z) = Σ_t π_z(t) (f(S_i^t) - g(z_t))²
Minimizing L(f, g, π_z) yields the optimal linear-model weights w_g, whose dimension is num_class × k, where S_i^t is the t-th perturbed sample and z_t is its vector form;
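Steps (2-4) and (2-5) amount to a weighted least-squares fit per class; a closed-form sketch (the small L2 term for numerical stability and the bias column are added assumptions, not stated in the patent):

```python
import numpy as np

def fit_local_linear(Z, f_vals, pi, l2=1e-6):
    """Minimize sum_t pi_t * (f(z_t) - g(z_t))^2 for one class, where
    g(z) = w_g . z + intercept. Z: (n, k) 0/1 masks; f_vals: (n,) model
    probabilities for that class; pi: (n,) kernel weights."""
    Z = np.asarray(Z, float)
    pi = np.asarray(pi, float)
    f_vals = np.asarray(f_vals, float)
    Zb = np.hstack([Z, np.ones((Z.shape[0], 1))])   # add a bias column
    A = Zb.T @ (Zb * pi[:, None]) + l2 * np.eye(Zb.shape[1])
    w = np.linalg.solve(A, Zb.T @ (pi * f_vals))
    return w[:-1], w[-1]                            # per-word weights w_g, intercept
```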
(2-6) Acquire interpretation features and denoise: after the linear model is trained, Fea_i = w_g × S_i gives the interpretation features and their weights for the different classes; sort the features of the m-th class by the absolute value of their weights from large to small, remove auxiliary words, conjunctions, punctuation, and similar noise, and select the top T as the interpretation features of sentence S_i predicted as the m-th class:
Fea_i^m = ((fea_1^i, θ_1^i), (fea_2^i, θ_2^i), ..., (fea_T^i, θ_T^i))
where Fea_i^m denotes the set of features, with their weights, obtained by the model interpretation method for predicting the i-th sentence as the m-th class, m is a class label with 1 ≤ m ≤ num_class, fea_j^i is the j-th feature of sentence S_i, and θ_j^i is the weight of feature fea_j^i. A positive weight means the model considers the feature to support classifying the i-th sample into the m-th class, and we call such a feature a positive feature; a negative weight means the model considers the feature not to support that classification, called a negative feature.
Preferably, step 3 selects the key feature set according to the frequency and weight of the acquired interpretation features; the specific steps are:
(3-1) Acquire all interpretation features of sentence S_i: Fea_i = (Fea_i^1, Fea_i^2, ..., Fea_i^num_class) denotes the set of features, obtained in step (2-6), for predicting sentence S_i as each class;
(3-2) Compute the frequency and weight of each feature: since the same feature may appear in different classes, it may occur multiple times in Fea_i. Sum the weights of all identical positive features in Fea_i, sort them by weight from large to small, and take the top c1 features; in the same way, sum the weights of all identical negative features in Fea_i, sort them by the absolute value of the weight from large to small, and take the top c2 features; at the same time, compute the frequency of each negative feature in Fea_i, sort from high to low, and take the top c3 features;
(3-3) Obtain the key interpretation features of sentence S_i: the key interpretation feature set of S_i is the intersection of the three sets obtained in step (3-2) and contains p key interpretation features, Fea_i^key = (fea_1^i, fea_2^i, ..., fea_p^i);
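A sketch of steps (3-1) to (3-3), pooling the (feature, weight) pairs of all classes into one list and taking the three-way intersection described above; the function name and cutoff defaults are illustrative:

```python
from collections import Counter

def key_features(pairs, c1=10, c2=10, c3=10):
    """pairs: (feature, weight) tuples pooled over all classes for one sentence.
    Top c1 positive features by summed weight, top c2 negative features by
    summed |weight|, top c3 negative features by frequency; the key set is
    the intersection of the three (a feature can be positive for one class
    and negative for another, so the intersection need not be empty)."""
    pos, neg, neg_freq = Counter(), Counter(), Counter()
    for fea, w in pairs:
        if w > 0:
            pos[fea] += w
        elif w < 0:
            neg[fea] += abs(w)
            neg_freq[fea] += 1
    top = lambda cnt, c: {f for f, _ in cnt.most_common(c)}
    return top(pos, c1) & top(neg, c2) & top(neg_freq, c3)
```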
preferably, in the step 4, the key interpretation features and the raw data obtained in the step 3 are fused to retrain the text classification model, and the specific steps include:
(4-1) Acquire data fused with the key interpretation features: concatenate the acquired key interpretation features of sentence S_i with S_i itself as input to the text classification model; the fused sentence is expressed as
S_i' = (w_1^i, w_2^i, ..., w_k^i, fea_1^i, fea_2^i, ..., fea_p^i)
where w_1^i, ..., w_k^i are the k words of sentence S_i and fea_1^i, ..., fea_p^i are its p acquired key interpretation features;
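Step (4-1) is then a simple concatenation; a sketch with illustrative names:

```python
def fuse(sentence_words, key_feats):
    """Step (4-1): S_i' = the k original words followed by the p key
    interpretation features, highlighting them for retraining."""
    return list(sentence_words) + list(key_feats)

def fuse_dataset(sentences, feats_per_sentence):
    """Step (4-2) input: fuse every training/test sample with its features."""
    return [fuse(s, f) for s, f in zip(sentences, feats_per_sentence)]
```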
(4-2) Retrain the text classification model: apply steps (2-1) to (4-1) to all training and test samples to fuse key interpretation features, obtaining a new data set S' = (S_1', S_2', S_3', ..., S_N'), then retrain the text classification model on S' according to the training process of step 1; the resulting text classification results are more accurate.
Compared with the prior art, the invention has the following obvious and prominent substantive characteristics and remarkable technical progress:
1. the method uses a linear fitting method based on local random disturbance sampling to explain which key features have the greatest contribution to the prediction result of the text classification model, fuses the features and the original labeled sample, and highlights the key features of the original sample, thereby improving the classification effect;
2. the method can efficiently retrain the text classification model, so that the text classification result is more accurate.
Drawings
FIG. 1 is a flow chart of a text classification method for fusing text interpretation features according to the present invention.
FIG. 2 is a diagram of a neural network-based text classification model according to the present invention.
FIG. 3 is a flow chart of the present invention for obtaining interpretation characteristics using a model interpretation method.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings and tables.
The invention aims to provide a text classification method fusing text interpretation features, which is used for acquiring key features of a prediction result given by a text classification model through a model interpretation method, and using the key features and an original text together as an input retraining model of the text classification model, thereby improving the effect of the text classification model.
The invention provides a text classification method fusing text interpretation characteristics, which is characterized in that a linear fitting method based on local random disturbance sampling is used for interpreting a prediction result of a text classification model based on a neural network to obtain interpretation characteristics, key interpretation characteristics are obtained according to the frequency and weight of the characteristics and fused with original data, and the text classification model is retrained, so that the text classification result is more accurate. The basic features of the present invention mainly include the following aspects:
firstly, interpreting a prediction result of a trained text classification model by using a linear fitting method based on local random disturbance sampling to obtain an interpretation characteristic;
selecting key interpretation features which are beneficial to text classification according to the weight and the frequency of the interpretation features;
and thirdly, fusing the original data with key interpretation characteristics to retrain the text classification model.
The first embodiment is as follows:
referring to fig. 1, a text classification method fusing text interpretation features includes the following operation steps:
step 1, training a text classification model based on a neural network to predict the category of a sentence;
step 2, obtaining the interpretation characteristics of the sentence prediction result in the step 1 by using a linear fitting method based on local random disturbance sampling;
step 3, selecting key interpretation characteristics which are beneficial to the classification effect according to the frequency and the weight of the interpretation characteristics acquired in the step 2;
step 4, fusing the key interpretation features acquired in step 3 with the original data and retraining the text classification model.
The method can efficiently retrain the text classification model, so that the text classification result is more accurate.
Example two:
This embodiment follows the flowchart of the text classification method fusing text interpretation features shown in fig. 1.
a text classification method fusing text interpretation features comprises the following steps of:
step S1: training a text classification model based on a neural network for predicting the category of the sentence, wherein the text classification model is illustrated in the attached figure 2, and the model parameter setting is illustrated in the table 1; the specific process is as follows:
(1-1) Input layer: acquire the AG-News data set, a standard English text classification data set containing 127600 samples in four categories; considering the time required to train the text classification model and to acquire the interpretation features of each sample, data of each category are uniformly and randomly sampled from AG-News, and 16000 samples are selected for the experiment, of which the training set contains 12800 and the validation and test sets contain 1600 each; the input to the text classification model is a set of sentences with category labels, S = (S_1, S_2, S_3, ..., S_N), where S_i denotes the i-th sentence in the data set, N denotes the number of sentences (here 16000), w_j^i denotes the j-th word of the i-th sentence, and k denotes the number of words in the i-th sentence (k varies with sentence length);
Table 1. Text classification model parameter settings
(1-2) Sentence vectorization: word vectors are trained with GloVe, converting each word in the vocabulary V = (w_1, w_2, w_3, ..., w_M) into a 64-dimensional vector and generating a vectorized vocabulary V' = (v_1, v_2, v_3, ..., v_M), where w_i denotes a word in the vocabulary, v_i denotes the vector of word w_i, and M, the number of distinct words appearing in the data set, is 161067, so V' has dimension 161067 × 64; looking words up in V' converts a sentence into its vector representation, so sentence S_i is represented as X_i = (v_1^i, v_2^i, ..., v_k^i);
(1-3) Linear layer: the vectorized sentence X_i = (v_1^i, v_2^i, ..., v_k^i) is input to a linear layer that predicts the category label of the sentence; the linear layer formula is:
y_l = l(X_i) = W^T X_i + b
where y_l, the prediction result, is an array of 4 values, each representing the likelihood of the class at that position, l denotes the linear transformation, and W^T and b are the parameters of the linear layer, randomly initialized in the range (-0.3, 0.3).
(1-4) Softmax layer: the softmax function maps each value of the prediction y_l into [0, 1]:
softmax(y_l^j) = exp(y_l^j) / Σ_{t=1}^{4} exp(y_l^t)
where y_l^j denotes the j-th value of the prediction y_l; after each value of y_l is transformed by the softmax function, the 4 values sum to 1.
(1-5) Loss function: the final output of the model is the class label y_pre corresponding to the maximum value in the prediction result; the loss function is loss(y_i, y_pre) = -y_pre · log(softmax(y_i)), where loss(y_i, y_pre) denotes the loss and y_i is the label of the input sentence.
(1-6) Parameter optimization: optimize the parameters of the text classification model to minimize the loss function. As shown in table 1, the batch size is set to 16, i.e. 16 sentences are input into the text classification model at a time. The learning rate during training is 2.0, with a decay multiplier of 0.8 applied at an interval of 1 epoch, i.e. after each epoch the learning rate becomes 0.8 times that of the previous epoch; the model completes training after 35 iterations.
Step S2: acquire the interpretation features of the sentence predictions of step S1 using a linear fitting method based on local random perturbation sampling. The specific process is shown in fig. 3:
(2-1) Select the sentence S_i to be interpreted and sample near S_i by random perturbation: S_i = (w_1^i, w_2^i, ..., w_k^i) is a sentence of k words in the original data set; S_i is randomly perturbed to obtain sampled copies, generating a data set of perturbed samples, each represented as a 0/1 vector. The random perturbation process is as follows:
Randomly delete words from sentence S_i, with the number of deleted words greater than 0 and less than k, obtaining a new sentence S_i^t = (w_1^t, w_2^t, ..., w_c^t), i.e. the t-th randomly perturbed sample of S_i, where w_j^t is the j-th word of the t-th perturbed sample and c is the number of words remaining after perturbation. Initialize a 1 × k vector, set the positions of the deleted words to 0 and the other positions to 1, obtaining the vectorized representation z_t of S_i^t, with every element z_t^j ∈ {0, 1}. Perform 4999 random perturbations to obtain a new data set X = (S_i^0, S_i^1, ..., S_i^4999) of 5000 sentences, where S_i^0 is the original sentence S_i, whose vector representation contains k ones. The vector matrix of the new data set X has dimension 5000 × k.
(2-2) Label the newly generated data: each sample in data set X is input into the trained text classification model for prediction to obtain the corresponding result. Denoting the trained model by f, after steps (1-1) to (1-4) the prediction f(S_i^t) of each sample is an array of 4 numbers, 4 being the number of data classes, each representing the probability of the corresponding class.
(2-3) Compute the distance between each perturbed sample and the original sample in the new data set Z as the perturbed sample's weight: the closer a perturbed sample is to the original, the better it explains the prediction, so it is given a higher weight; the weight of each newly generated sample is defined with an exponential kernel:
π_z(t) = exp(-D(S_i, S_i^t)² / σ²)
where π_z is an exponential kernel defined on the cosine distance D, representing the distance weight between samples (the smaller the distance, the larger π_z), and σ is the kernel width.
(2-4) Fit the new data set Z with a linear model g:
g(z_t) = w_g · z_t
where z_t is a vector in data set Z and w_g is the weight coefficient of the linear model.
(2-5) Determine the coefficients of the linear model: the loss function is set as
L(f, g, π_z) = Σ_t π_z(t) (f(S_i^t) - g(z_t))²
Minimizing L(f, g, π_z) yields the optimal linear-model weights w_g, whose dimension is 4 × k, where S_i^t is the t-th perturbed sample and z_t is its vector form.
(2-6) Acquire interpretation features and denoise: after the linear model is trained, Fea_i = w_g × S_i gives the interpretation features and weights for the different classes; sort the features of the m-th class by the absolute value of their weights from large to small, remove auxiliary words, conjunctions, punctuation, and similar noise, and select the top T as the interpretation features of sentence S_i predicted as the m-th class:
Fea_i^m = ((fea_1^i, θ_1^i), (fea_2^i, θ_2^i), ..., (fea_T^i, θ_T^i))
where Fea_i^m denotes the set of features, with their weights, output by the model interpretation method for predicting the i-th sentence as the m-th class, m is a class label with 1 ≤ m ≤ 4, fea_j^i is the j-th feature of sentence S_i, and θ_j^i is the weight of feature fea_j^i. A positive weight means the model considers the feature to support classifying the i-th sample into the m-th class, and we call such a feature a positive feature; a negative weight means the model considers the feature not to support that classification, called a negative feature.
Step S3: selecting key interpretation characteristics beneficial to classification effect according to the frequency and weight of the acquired interpretation characteristics, and specifically performing the following steps:
(3-1) Acquire all interpretation features of sentence S_i: Fea_i = (Fea_i^1, Fea_i^2, Fea_i^3, Fea_i^4) denotes the set of features, obtained in step (2-6), for predicting sentence S_i as each class;
(3-2) Compute the frequency and weight of each feature: since the same feature may appear in different classes, it may occur multiple times in Fea_i. Sum the weights of all identical positive features in Fea_i, sort them by weight from large to small, and take the top c1 features; in the same way, sum the weights of all identical negative features in Fea_i, sort them by the absolute value of the weight from large to small, and take the top c2 features; at the same time, compute the frequency of each negative feature in Fea_i, sort from high to low, and take the top c3 features;
(3-3) Obtain the key interpretation features of sentence S_i: the key interpretation feature set of S_i is the intersection of the three sets obtained in step (3-2) and contains p key interpretation features, Fea_i^key = (fea_1^i, fea_2^i, ..., fea_p^i);
step S4: fusing the key interpretation features and the raw data acquired in the step S3, and retraining the text classification model, which specifically comprises:
(4-1) Acquire data fused with the key interpretation features: concatenate the acquired key interpretation features of sentence S_i with S_i itself as input to the text classification model; the fused sentence is expressed as
S_i' = (w_1^i, w_2^i, ..., w_k^i, fea_1^i, fea_2^i, ..., fea_p^i)
where w_1^i, ..., w_k^i are the k words of sentence S_i and fea_1^i, ..., fea_p^i are its p acquired key interpretation features.
(4-2) retrain the text classification model: apply steps (2-1) to (4-1) to all training and test samples to fuse the key interpretation features, obtaining a new data set S′ = (S_1′, S_2′, S_3′, ..., S_N′); then retrain the text classification model on S′ following the training process of step S1. The resulting text classification is more accurate.
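The fusion of step (4-1) amounts to appending the key features to the original word sequence. A minimal sketch (the example sentence and feature list are invented for illustration):

```python
def fuse(sentence, key_feats):
    """Sketch of step (4-1): build S_i' = (w_1,...,w_k, f_1,...,f_p) by
    appending the sentence's p key interpretation features to its k words,
    so the retrained classifier sees the key features highlighted."""
    return sentence + " " + " ".join(key_feats)

s_prime = fuse("stocks fell sharply after the report", ["stocks", "report"])
```

Because the appended tokens already occur in the vocabulary, no change to the word table V′ is needed; the fused sentences simply replace the originals when retraining.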
Description of the experiment and results: the experimental data set is a subset of the AG-News data set of step (1-1); 16000 samples were obtained by uniform random sampling over the categories, of which the training set contains 12800 samples and the validation and test sets contain 1600 samples each. Table 2 compares training the text classification model on data fused with key interpretation features against training it on the raw data, where Train_acc is the training-set accuracy, Test_acc the test-set accuracy, Test_ma_R the test-set macro recall, Test_ma_f1 the test-set macro F1, and Test_mi_f1 the test-set micro F1. The proposed method improves on every index; in particular, the test-set accuracy rises by 2.39 percentage points, showing that the method improves the effect of the text classification model.
TABLE 2 Experimental results
The method uses linear fitting based on local random-perturbation sampling to explain which key features contribute most to the prediction of the text classification model, fuses those features with the original labeled samples, and thereby highlights the key features of each sample, improving the classification effect; the text classification model can then be retrained efficiently so that its results are more accurate.
The foregoing is a more detailed description of the present invention in connection with specific/preferred embodiments thereof, and it is not intended that the practice of the invention be limited to these descriptions. It will be apparent to those skilled in the art that various substitutions and modifications can be made to the described embodiments without departing from the spirit of the invention, and these substitutions and modifications should be considered to fall within the scope of the invention.
Claims (5)
1. A text classification method fusing text interpretation features is characterized by comprising the following operation steps:
step 1, training a text classification model based on a neural network to predict the category of a sentence;
step 2, obtaining the interpretation characteristics of the sentence prediction result in the step 1 by using a linear fitting method based on local random disturbance sampling;
step 3, selecting key interpretation characteristics which are beneficial to the classification effect according to the frequency and the weight of the interpretation characteristics acquired in the step 2;
and 4, fusing the key interpretation features acquired in step 3 with the raw data, and retraining the text classification model.
2. The text classification method fusing text interpretation features according to claim 1, wherein in step 1 a neural-network-based text classification model is trained to predict the category to which a sentence belongs, the specific steps comprising:
(1-1) input layer: the input of the text classification model is sentences with category labels, S = (S_1, S_2, S_3, ..., S_N), where S_i denotes the ith sentence in the data set and N the number of sentences; S_i = (w_1^i, w_2^i, ..., w_k^i), where w_j^i denotes the jth word in the ith sentence and k the number of words in the ith sentence;
(1-2) sentence vectorization: word vectors are trained with GloVe; the vocabulary is V = (w_1, w_2, w_3, ..., w_M); each word is converted into a 64-dimensional vector, generating the vectorized word table V′ = (v_1, v_2, v_3, ..., v_M) of dimension M × 64, where w_i denotes a word in the vocabulary, v_i the word vector of w_i, and M the number of all distinct words present in the data set; looking up the word table V′ converts the words of a sentence into their corresponding vector representations, and sentence S_i is represented as S_i^v = (v_1^i, v_2^i, ..., v_k^i);
(1-3) linear layer: the vectorized sentence S_i^v is input into a linear layer to predict the category label of the sentence; the linear-layer formula is y_l = l(S_i^v) = W^T S_i^v + b, where the prediction result y_l is an array of num_class numbers, num_class being the predefined number of categories, each number representing the likelihood of predicting the category represented by its position; l denotes the linear transformation, and W^T and b are the weight and bias parameters of the linear layer;
(1-4) softmax layer: each value of the prediction result y_l is mapped into the range [0, 1] with the softmax function, softmax(y_l)_j = exp(y_l[j]) / Σ_{n=1}^{num_class} exp(y_l[n]), where softmax(y_l) denotes the transformed prediction result; after every value in y_l is transformed by the softmax function, the num_class values sum to 1;
(1-5) loss equation: the final output of the model is the category label y_pre corresponding to the maximum value in the prediction result; the loss function is determined by the formula loss(y_i, y_pre) = -y_pre log(softmax(y_i)), where loss(y_i, y_pre) denotes the loss function and y_i is the label of the input sentence S_i;
(1-6) parameter optimization: and optimizing parameters of the text classification model by taking the minimized loss function as a target to obtain the trained text classification model.
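Steps (1-2) to (1-5) can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the patent's implementation: the word table is random instead of GloVe-trained, the sentence vector is mean-pooled (the patent does not specify the pooling), and the toy sizes are invented except for the 64-dimensional vectors and num_class = 4 used in the AG-News experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration; V' stands in for the GloVe-trained word table.
M, dim, num_class, k = 10, 64, 4, 5
V_prime = rng.normal(size=(M, dim))           # vectorized word table V'
W = rng.normal(size=(dim, num_class)) * 0.1   # linear-layer weights
b = np.zeros(num_class)                       # linear-layer bias

def softmax(y):
    e = np.exp(y - y.max())                   # subtract max for stability
    return e / e.sum()

def forward(word_ids):
    """Steps (1-2)-(1-4): look up word vectors, pool, linear layer, softmax."""
    s_vec = V_prime[word_ids].mean(axis=0)    # sentence vector (mean pooling: our assumption)
    y_l = s_vec @ W + b                       # linear layer: y_l = W^T x + b
    return softmax(y_l)

def cross_entropy(probs, label):
    """Step (1-5): negative log-probability of the true class."""
    return -np.log(probs[label])

probs = forward(rng.integers(0, M, size=k))   # probabilities over num_class categories
```

Step (1-6) would then minimize the cross-entropy over the training set by gradient descent on V′, W and b.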
3. The text classification method fusing text interpretation features according to claim 1, wherein in step 2 the interpretation features of the sentence prediction results of step 1 are obtained by linear fitting based on local random-perturbation sampling, the specific steps comprising:
(2-1) select the sentence S_i to be interpreted and sample near S_i by random perturbation: S_i = (w_1^i, ..., w_k^i) is a sentence of k words in the original data set; apply random perturbation to S_i to acquire sampling samples, generate a data set containing a number of such samples, and represent each sample as a vector of 0s and 1s. The random-perturbation process is as follows:
randomly delete words of sentence S_i, the number of deleted words being greater than 0 and less than k, to obtain a new sentence S_i^t = (w_1^{i,t}, ..., w_c^{i,t}), i.e. the tth randomly perturbed sample of S_i, where w_j^{i,t} is the jth word of the tth perturbed sample and c is the number of words remaining after the perturbation; initialize a 1 × k vector, set the positions of the deleted words to 0 and the remaining positions to 1, obtaining the vectorized representation z_t of S_i^t, each element of which is 0 or 1. Perform 4999 random perturbations to obtain a new data set of 5000 sentences X = (S_i^0, S_i^1, ..., S_i^4999), where S_i^0 is the original sentence S_i, whose vector representation contains k 1s; the vector matrix of the new data set X is denoted Z = (z_0, z_1, ..., z_4999);
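The sampling of (2-1) can be sketched as follows (a minimal illustration; the example sentence and the reduced sample count are our own, and the helper name `perturb` is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(sentence, n_samples=5000):
    """Sketch of step (2-1): local random-perturbation sampling.

    Returns (texts, masks): each perturbed sample deletes a random subset
    of words (more than 0, fewer than k), and its 1 x k 0/1 mask marks
    which words survive. Index 0 holds the original sentence (all 1s).
    """
    words = sentence.split()
    k = len(words)
    texts, masks = [sentence], [np.ones(k, dtype=int)]
    for _ in range(n_samples - 1):
        n_del = rng.integers(1, k)                    # 0 < deleted words < k
        drop = rng.choice(k, size=n_del, replace=False)
        mask = np.ones(k, dtype=int)
        mask[drop] = 0
        texts.append(" ".join(w for w, m in zip(words, mask) if m))
        masks.append(mask)
    return texts, np.array(masks)

texts, Z = perturb("stocks fell sharply after the report", n_samples=100)
```

Each row of `Z` is the z_t vector above; row 0 is the unperturbed original with k ones.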
(2-2) label the newly generated data:
input each sample of the data set X into the trained text classification model for prediction to obtain its corresponding prediction result; the trained text classification model is denoted f, and after steps (1-1) to (1-4) the prediction result of each sample is f(S_i^t), an array of num_class numbers, each of which represents the probability of prediction as the corresponding category;
(2-3) calculating the distance between all the disturbance data and the original data in the new data set Z as the disturbance data weight:
the closer a newly generated perturbed sample is to the original sample, the better it explains the prediction, so it is given a higher weight; the weight of each newly generated sample is defined with an exponential kernel, π_z(t) = exp(-D(z_t, z_0)² / σ²), where π_z is an exponential kernel defined on the cosine distance D, representing the distance weight between samples (the closer the distance, the larger the value of π_z), and σ is the kernel width;
(2-4) fit the new data set Z with a linear model: the linear model is denoted g, with formula g(z_t) = w_g · z_t, where z_t is a vector in the data set Z and w_g is the weight coefficient vector of the linear model;
(2-5) determine the coefficients of the linear model: train the linear model to determine the weight coefficients, with the loss equation L(f, g, π_z) = Σ_t π_z(t) · (f(S_i^t) − g(z_t))²;
minimizing L(f, g, π_z) yields the optimal linear-model weights w_g, of dimension k × num_class, where S_i^t is the tth perturbed sample and z_t is its vector form;
(2-6) acquire the interpretation features and denoise: after the linear model is trained, Fea_i = w_g × S_i gives the interpretation features and their weights for the different categories; for the mth category, sort the features by the absolute value of their weights from large to small, remove auxiliary words, conjunctions, punctuation marks and similar tokens, and select the first T features as the interpretation features of sentence S_i predicted as the mth category, Fea_i^m = {(f_1^i, w_1^i), ..., (f_T^i, w_T^i)};
where Fea_i^m denotes the set of features, with the weight of each feature, obtained by the model interpretation method for predicting the ith sentence as the mth category; m is the label of a category, 1 ≤ m ≤ num_class; f_j^i is the jth feature of sentence S_i and w_j^i is the weight corresponding to f_j^i. A feature with a positive weight indicates that the model regards that feature as supporting the classification of the ith sample into the mth category, called a positive (or forward) feature; a feature with a negative weight indicates that the model regards that feature as not supporting that classification, called a negative feature.
4. The text classification method fusing text interpretation features according to claim 1, wherein in step 3 the key feature set is selected according to the frequency and weight of the acquired interpretation features, the specific steps comprising:
(3-1) acquire all interpretation features of sentence S_i: Fea_i denotes the union, over all categories, of the per-category feature sets of sentence S_i obtained in step (2-6);
(3-2) calculate the frequency and weight of each feature: since the same feature may appear under different categories, it may occur multiple times in Fea_i. Sum the weights of all identical positive features in Fea_i and keep the top c1 features in order of summed weight from large to small, giving a first candidate set; likewise, sum the weights of all identical negative features, rank them by the absolute value of the summed weight from large to small, and keep the top c2 features, giving a second candidate set; at the same time, count the frequency of each negative feature in Fea_i, sort from high to low, and keep the top c3 features, giving a third candidate set;
(3-3) obtain the key interpretation features of sentence S_i: the key interpretation feature set of S_i is the intersection of the three sets obtained in step (3-2), and contains p key interpretation features.
5. The text classification method fusing text interpretation features according to claim 1, wherein in step 4 the key interpretation features obtained in step 3 are fused with the raw data and the text classification model is retrained, the specific steps comprising:
(4-1) acquire data fused with the key interpretation features: the acquired key interpretation features of sentence S_i are taken together with sentence S_i as the input of the text classification model; the sentence fused with key interpretation features is denoted S_i′ = (w_1^i, ..., w_k^i, f_1^i, ..., f_p^i), where w_1^i, ..., w_k^i are the k words of sentence S_i and f_1^i, ..., f_p^i are the p key interpretation features obtained for sentence S_i;
(4-2) retrain the text classification model: apply steps (2-1) to (4-1) to all training and test samples to fuse the key interpretation features, obtaining a new data set S′ = (S_1′, S_2′, S_3′, ..., S_N′); then retrain the text classification model on the data set S′ according to the process of claim 2; the resulting text classification is more accurate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110521823.9A CN113590814A (en) | 2021-05-13 | 2021-05-13 | Text classification method fusing text interpretation features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113590814A true CN113590814A (en) | 2021-11-02 |
Family
ID=78243402
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110521823.9A Pending CN113590814A (en) | 2021-05-13 | 2021-05-13 | Text classification method fusing text interpretation features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113590814A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108182186A (en) * | 2016-12-08 | 2018-06-19 | 广东精点数据科技股份有限公司 | A kind of Web page sequencing method based on random forests algorithm |
CN110688491A (en) * | 2019-09-25 | 2020-01-14 | 暨南大学 | Machine reading understanding method, system, device and medium based on deep learning |
CN111967354A (en) * | 2020-07-31 | 2020-11-20 | 华南理工大学 | Depression tendency identification method based on multi-modal characteristics of limbs and microexpressions |
US20210012199A1 (en) * | 2019-07-04 | 2021-01-14 | Zhejiang University | Address information feature extraction method based on deep neural network model |
Non-Patent Citations (3)
Title |
---|
MARCO TULIO RIBEIRO et al.: ""Why Should I Trust You?": Explaining the Predictions of Any Classifier", KDD '16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining * |
ZHOU QIANRONG: "Research on Deep Representation Learning Techniques for Sentence Classification", China Doctoral Dissertations Full-text Database * |
DAI YAPING et al.: "Theory and Application of Intelligent Multi-sensor Data Fusion (Textbook Series for General Higher Education in Emerging Engineering)", China Machine Press, page 143 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220019745A1 (en) | Methods and apparatuses for training service model and determining text classification category | |
CN109189925B (en) | Word vector model based on point mutual information and text classification method based on CNN | |
CN110008338B (en) | E-commerce evaluation emotion analysis method integrating GAN and transfer learning | |
CN110245229B (en) | Deep learning theme emotion classification method based on data enhancement | |
CN106407333B (en) | Spoken language query identification method and device based on artificial intelligence | |
CN107943784B (en) | Relationship extraction method based on generation of countermeasure network | |
CN110033281B (en) | Method and device for converting intelligent customer service into manual customer service | |
CN111738007B (en) | Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network | |
CN111506732B (en) | Text multi-level label classification method | |
CN112364638B (en) | Personality identification method based on social text | |
CN108038492A (en) | A kind of perceptual term vector and sensibility classification method based on deep learning | |
CN112052684A (en) | Named entity identification method, device, equipment and storage medium for power metering | |
CN112966068A (en) | Resume identification method and device based on webpage information | |
CN113255366B (en) | Aspect-level text emotion analysis method based on heterogeneous graph neural network | |
CN111191031A (en) | Entity relation classification method of unstructured text based on WordNet and IDF | |
CN114564563A (en) | End-to-end entity relationship joint extraction method and system based on relationship decomposition | |
CN111581364B (en) | Chinese intelligent question-answer short text similarity calculation method oriented to medical field | |
CN114841151B (en) | Medical text entity relation joint extraction method based on decomposition-recombination strategy | |
Niyozmatova et al. | Classification based on decision trees and neural networks | |
CN114691525A (en) | Test case selection method and device | |
CN111651597A (en) | Multi-source heterogeneous commodity information classification method based on Doc2Vec and convolutional neural network | |
CN115017879A (en) | Text comparison method, computer device and computer storage medium | |
CN114722198A (en) | Method, system and related device for determining product classification code | |
CN114239584A (en) | Named entity identification method based on self-supervision learning | |
CN112989803A (en) | Entity link model based on topic vector learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||