CN109271522A - Comment emotion classification method and system based on deep hybrid model transfer learning - Google Patents

Comment emotion classification method and system based on deep hybrid model transfer learning

Info

Publication number
CN109271522A
Authority
CN
China
Prior art keywords
comment
model
training
transfer learning
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811383793.4A
Other languages
Chinese (zh)
Other versions
CN109271522B (en)
Inventor
代明军
谢立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN201811383793.4A priority Critical patent/CN109271522B/en
Publication of CN109271522A publication Critical patent/CN109271522A/en
Application granted granted Critical
Publication of CN109271522B publication Critical patent/CN109271522B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a comment emotion classification method and system based on deep hybrid model transfer learning. The comment emotion classification method comprises the following steps: step S1, collecting commodity comments and preprocessing a source-domain data sample set of the commodity comments; step S2, mapping the preprocessed data into word vectors; step S3, pre-training a deep hybrid model on the source-domain data sample set of the commodity comments; step S4, fine-tuning the deep hybrid model on a target-domain data sample set of the commodity comments; step S5, performing emotion classification on the commodity comments in the target domain. The present invention trains quickly and with low difficulty: only a few rounds of training are needed to obtain high classification accuracy, a good classification effect is still obtained when training on noisy or small data sets, the dependence on the data set is small, and the robustness is good. The present invention also effectively improves the transferable capability of the model, achieving the purpose of improving the classification accuracy after transfer learning.

Description

Comment emotion classification method and system based on deep hybrid model transfer learning
Technical Field
The invention relates to a comment emotion classification method, in particular to a comment emotion classification method based on deep hybrid model transfer learning, and to a comment emotion classification system adopting this comment emotion classification method based on deep hybrid model transfer learning.
Background
In the prior art, sentiment classification of commodity comments mainly uses the following two methods. The first is a clustering-based cross-domain transfer-learning method for commodity comment emotion classification. Its principle is as follows: domain-independent words shared by the source domain and the target domain are used as intermediaries, and, exploiting the similarity between source-domain and target-domain words, a spectral clustering algorithm groups the domain-specific words of the different domains into unified clusters, thereby reducing the difference between the domain-specific words of the source-domain data set and those of the target-domain data set. When this difference is reduced, the classification effect in the target domain of a classifier trained on the source domain improves. This method of changing the similarity of source-domain and target-domain words requires the data features of source domain A to be highly similar to those of target domain B before a good transfer effect can be achieved (for example clothes and quilts, or mobile phones and tablets). Because it is a transfer strategy at the pure data level, it places high demands on the natural similarity of the source-domain and target-domain data features, is unsuitable for scenarios where the two domains differ greatly (such as hotels and clothes), has limited transferable capability, and transfers poorly.
The second is a comment emotion classification method based on multi-source-domain instance transfer. Its principle is as follows: using multi-source learning, samples are transferred from different source domains, combining the features of multiple source domains and improving the stability and effectiveness of transfer learning. First, larger initial weights are assigned to the target-domain samples and the data are resampled at each step to mitigate the reference imbalance; a dynamic mechanism is then added to improve the TrAdaBoost transfer algorithm, slowing the convergence of the weights of source-domain samples with low target-domain correlation, so that the source-domain samples and the target-domain samples together help the learning of the target-domain task. On this principle the knowledge with positive transfer effect in all source domains is fully used, and the classification of the model in the target domain is improved. This multi-source instance-transfer method trains on the existing multi-source-domain data sets mixed with part of the target domain, and uses iterative optimization to increase the weights of the samples with positive transfer effect in all source-domain data, so as to make effective use, for target-domain classification, of all components of the existing multi-source samples that transfer positively, reducing the dependence on the target-domain data set. It can be seen that the classification effect of the TrAdaBoost algorithm depends to a large extent on the original similarity between the source- and target-domain data and on the quantity and quality of the source-domain data sets. Meanwhile, this method is harder to train: in particular, if the noise ratio of the samples (especially the target-domain data) is high at the start of training and the number of iterations is not controlled well, the precision of the classifier drops greatly. In practical application, the robustness of this transfer method is poor and its accuracy is generally low.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a comment emotion classification method based on deep hybrid model transfer learning, which can reduce the dependence of the model on the data set, improve the transferable capability, and thereby improve the classification accuracy after transfer learning; and further to provide a comment emotion classification system adopting this comment emotion classification method based on deep hybrid model transfer learning.
To this end, the invention provides a comment emotion classification method based on deep hybrid model transfer learning, which comprises the following steps:
step S1, collecting commodity comments and preprocessing a source-domain data sample set of the commodity comments;
step S2, mapping the preprocessed data into word vectors;
step S3, pre-training a deep hybrid model on the source-domain data sample set of the commodity comments;
step S4, fine-tuning the deep hybrid model on a target-domain data sample set of the commodity comments;
step S5, performing emotion classification on the commodity comments in the target domain;
wherein the step S3 includes the following substeps:
step S301, performing a convolution operation on the source-domain word vector data obtained in step S2;
step S302, inputting the node output of the convolution operation to a gated recurrent unit network;
step S303, inputting the node output of the gated recurrent unit network to a weighted transformation matrix of an attention mechanism to obtain a vector of preset dimension, and converting the vector into a two-dimensional vector;
step S304, setting, through an activation function, the values in the activated two-dimensional vector that exceed a preset threshold to 1 and the values below the preset threshold to 0, to obtain the prediction label $\hat{y}$ of the corresponding sample;
step S305, calculating the loss value L between the emotion polarity label y and the prediction label $\hat{y}$ by using a cross-entropy cost function;
and step S306, performing back propagation using a batch gradient descent algorithm and iterating repeatedly to realize the pre-training of the deep hybrid model.
The invention is further improved in that, in step S304, the activation function $\sigma(Z)_j = \frac{e^{Z_j}}{\sum_{k=1}^{K} e^{Z_k}}$ is used as the output-layer activation function to compute the value $\sigma(Z)_j$ of the two-dimensional vector, where $Z_j$ is the $j$-th element of the two-dimensional vector $Z$, $j$ is the element index, and $K = 2$.
The invention is further improved in that, in step S305, the cross-entropy cost function $L = -\frac{1}{n}\sum_{x}\left[y \ln \hat{y} + (1 - y)\ln(1 - \hat{y})\right]$ is used to compute the loss value L between the emotion polarity label $y$ and the prediction label $\hat{y}$.
A further refinement of the invention is that said step S4 comprises the following sub-steps:
step S401, repeating step S1 and step S2 on a preset number of target-domain labeled data sets P, and adding a pooling layer to the deep hybrid model;
step S402, importing the pre-trained weights $W_X$ obtained in step S3 into the deep hybrid model;
step S403, fine-tuning with the preprocessed target-domain data $P_S$ obtained in step S401.
In a further improvement of the present invention, in step S403 the weight parameters $W_G^m$ of the gated recurrent unit network are frozen and reverse iterative updating is performed $j$ times; the iteratively updated weight parameters of the convolutional layer, the iteratively updated weight parameters of the attention model, the frozen weight parameters of the gated recurrent unit network and the iteratively updated weight parameters of the fully connected layer are then added to obtain all weight parameters $W_Y$ of the deep hybrid model after fine-tuning in the target domain, realizing the fine-tuning.
A further refinement of the invention is that the step S403 comprises the following substeps:
step S4031, performing a convolution operation on the input target-domain data $P_S$;
step S4032, passing the output of the convolutional layer through the pooling layer;
step S4033, inputting the node output of the pooling layer to the gated recurrent unit network;
step S4034, sending the output of the gated recurrent unit network to the weighted transformation matrix of the attention mechanism to obtain a vector of preset dimension, and converting the vector into a two-dimensional vector;
step S4035, setting, through the activation function, the values in the activated two-dimensional vector that exceed the preset threshold to 1 and the values below the preset threshold to 0, to obtain the prediction label $\hat{y}$ of the corresponding sample;
step S4036, calculating the loss value L between the emotion polarity label y and the prediction label $\hat{y}$ by using the cross-entropy cost function;
and step S4037, taking reduction of the loss value L as the optimization direction, performing back propagation with the batch gradient descent algorithm and iterating repeatedly to realize the fine-tuning of the deep hybrid model.
In a further improvement of the present invention, in the back propagation process of step S4037, the weight parameters $W_G^m$ of the gated recurrent unit network are frozen.
In a further improvement of the present invention, in step S3, during pre-training, the source-domain data sample set is input to the convolutional layer and then, skipping the pooling layer, directly input to the gated recurrent unit network.
The invention also provides a comment emotion classification system based on deep hybrid model transfer learning, which adopts the comment emotion classification method based on deep hybrid model transfer learning described above and sequentially comprises a convolutional layer, a pooling layer, a gated recurrent unit network, an attention mechanism model, a random discarding (dropout) layer, a fully connected layer and an output layer.
Compared with the prior art, the invention has the following beneficial effects. First, the strong feature-recognition capability of the convolution structure is used to capture bottom-layer features, and the output of the gated recurrent unit network is re-weighted through the weight transformation matrix of the attention mechanism model, strengthening the model's ability to recognize keywords in comments. Secondly, to address the insensitivity of the convolution model to text word order, a gated recurrent unit network is combined to enhance the model's ability to learn the word-order structure of text and to use context, improving the classification effect on longer commodity comment texts after transfer learning. Finally, a small-probability dropout layer is attached to the fully connected layer so that a small fraction of nodes are randomly disabled (their output set to zero) during training, preventing overfitting and enhancing the generalization capability of the model.
The method trains quickly and with low difficulty: only a few rounds of training are needed to obtain high classification accuracy, a good classification effect is obtained even when training on a data set with more noise or a smaller quantity of samples, the dependence on the data set is small, and the robustness is good. On this basis, the invention also effectively improves the transferable capability of the model, achieving the purpose of improving the classification accuracy after transfer learning.
Drawings
FIG. 1 is a schematic workflow diagram of one embodiment of the present invention;
FIG. 2 is a general flow chart of one embodiment of the present invention;
FIG. 3 is a schematic flow chart diagram illustrating a transfer learning method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a system model architecture according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
This example applies to platforms that need to analyze a large number of different types of commodity comments, such as e-commerce platforms. The application context is as follows: the classification effect of an emotion classification model based on supervised learning depends on the quantity and quality of the labeled data sets, so sufficient labeled data sets would have to be provided in each commodity domain for learning and fitting. This example aims to enhance the generalization capability of the model and reduce its dependence on the data set by using a transfer-learning strategy combined with a deep hybrid model, effectively remedying the defects of the prior art such as limited transferable capability, strong dependence on the data set, poor transfer-learning effect and difficult training.
To this end, as shown in FIGS. 1 to 3, this example provides a comment emotion classification method based on deep hybrid model transfer learning, comprising the following steps:
step S1, collecting commodity comments and preprocessing a source-domain data sample set of the commodity comments;
step S2, mapping the preprocessed data into word vectors;
step S3, pre-training a deep hybrid model on the source-domain data sample set of the commodity comments;
step S4, fine-tuning the deep hybrid model on a target-domain data sample set of the commodity comments;
step S5, performing emotion classification on the commodity comments in the target domain.
In step S1, the collected labeled source-domain data sample set T is segmented using an object-oriented word segmentation tool, and the symbols and predefined stop words in T are removed using a loop function, giving the preprocessed text data. ("Source field" and "source domain", and likewise "target field" and "target domain", are used interchangeably.) The stop words can be preset and user-defined according to requirements.
Specifically, in step S1, a Python word segmentation tool (i.e., an object-oriented word segmentation tool) is preferably used to segment the collected labeled source-domain data set sample T, and a Python loop function is used to remove the symbols and the specified stop words, giving the preprocessed data $T_{pre}$.
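The following is a minimal sketch of this preprocessing step. The patent does not name a specific segmenter, so jieba and the stop-word and symbol sets here are illustrative assumptions:
```python
import jieba  # assumed Python word segmentation tool

STOP_WORDS = {"的", "了", "是"}            # illustrative predefined stop words
SYMBOLS = set("，。！？、；：（）()!?.,")    # illustrative symbols to strip

def preprocess(reviews):
    """Segment each labeled review, dropping symbols and stop words in a loop."""
    cleaned = []
    for text in reviews:                    # the 'loop function' of the text
        tokens = [w for w in jieba.cut(text)
                  if w not in SYMBOLS and w not in STOP_WORDS]
        cleaned.append(tokens)
    return cleaned

T_pre = preprocess(["这个手机屏幕很清晰", "电池不太好"])  # placeholder samples
```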
Step S2 in this example includes the following substeps:
step S201, obtaining word vectors from the preprocessed text data obtained in step S1;
step S202, setting the mapping dimension of the word vectors and the maximum length of the input commodity comment to obtain the distributed representation of the training samples in the source domain;
step S203, obtaining the input data from the emotion polarity labels corresponding to the commodity comment samples.
In step S201 of this example, the preprocessed text data $T_{pre}$ obtained in step S1 is converted into word vectors $T_s$ by constructing a Chinese word-vector model. In step S202, the mapping dimension of the word vectors is set to $c$ and the maximum length of an input commodity comment to $h$, giving $T_S' = [T_{s1}, T_{s2}, \ldots, T_{sh}]^T$, $T_S' \in R^{h \times c}$, the distributed representation of a training sample in the source domain. In step S203, $y$ denotes the emotion polarity label corresponding to each commodity comment sample: $y = [1, 0]$ indicates that the comment sample has positive polarity and $y = [0, 1]$ indicates negative polarity, giving the input data $[T_S', y]$ of the deep hybrid model.
More specifically, in this example step S2 passes the text data $T_{pre}$ obtained in step S1 through the Python tool Gensim 4 (a tool for converting words into vectors based on the word2vec model) to obtain the word vectors: $\mathrm{word2vec}(T_{pre}) = T_s$. Then the mapping dimension of the word vectors is set to $c$ and the input length to $h$ (the maximum sentence length of each input commodity comment, which can be user-defined according to actual requirements), giving $T_S = [T_{s1}, T_{s2}, \ldots, T_{sh}]^T$, $T_S \in R^{h \times c}$, the distributed representation of a training sample in the source domain; $T_S \in R^{h \times c}$ indicates the size specification of the $T_s$ data, and $R^{h \times c}$ denotes the set of all matrices of size $h \times c$. With $y$ denoting the emotion polarity label of each sample ($y = [1, 0]$ or $y = [0, 1]$, the former indicating positive polarity and the latter negative polarity), the data input to the model is obtained: $[T_S, y]$.
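As a sketch, the word-vector step could look as follows with Gensim 4; the placeholder reviews and the helper name to_matrix are illustrative, and c = 128 and h = 60 follow the preferred values given later in the description:
```python
from gensim.models import Word2Vec
import numpy as np

c, h = 128, 60   # word-vector mapping dimension and maximum comment length

# `segmented` stands for the output of the preprocessing step: one token
# list per review (placeholder data here).
segmented = [["屏幕", "清晰"], ["电池", "不好"]]

w2v = Word2Vec(sentences=segmented, vector_size=c, min_count=1)

def to_matrix(tokens):
    """Map one segmented review to an h x c matrix (zero-padded / truncated)."""
    mat = np.zeros((h, c), dtype=np.float32)
    vecs = [w2v.wv[t] for t in tokens[:h] if t in w2v.wv]
    if vecs:
        mat[: len(vecs)] = np.stack(vecs)
    return mat

T_S = np.stack([to_matrix(toks) for toks in segmented])
y = np.array([[1, 0], [0, 1]])   # [1,0] = positive polarity, [0,1] = negative
```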
Step S3 in this example includes the following substeps:
step S301, performing a convolution operation on the source-domain word vector data obtained in step S2;
step S302, inputting the node output of the convolution operation to a gated recurrent unit network;
step S303, inputting the node output of the gated recurrent unit network to a weighted transformation matrix of an attention mechanism to obtain a vector of preset dimension, and converting the vector into a two-dimensional vector;
step S304, setting, through an activation function, the values in the activated two-dimensional vector that exceed a preset threshold to 1 and the values below the preset threshold to 0, to obtain the prediction label $\hat{y}$ of the corresponding sample; the preset threshold is a predetermined decision value for the two-dimensional vector and can be adjusted according to actual needs;
step S305, calculating the loss value L between the emotion polarity label y and the prediction label $\hat{y}$ by using a cross-entropy cost function;
step S306, performing back propagation and iterating repeatedly to realize the pre-training of the deep hybrid model; preferably, a batch gradient descent algorithm is used for the back propagation and repeated iteration.
In step S305, the cross-entropy cost function $L = -\frac{1}{n}\sum_{x}\left[y \ln \hat{y} + (1 - y)\ln(1 - \hat{y})\right]$ is used to compute the loss value L between the emotion polarity label $y$ and the prediction label $\hat{y}$.
More specifically, step S3 in this example is the pre-training step, which proceeds as follows. In step S301, after the source-domain sample data $[T_S, y]$ enters the model at the input layer, it is passed to a convolution module with convolution kernel size $a \times b \times c$ for convolution, where a, b and c are the size parameters of the convolution kernel (the height, length and width of the cube). Note that the width of the convolution kernel is the same as the mapping dimension $c$ of the word vectors in $T_S \in R^{h \times c}$; technically, c and b are usually called the length and width of the convolution kernel, and a is the number of convolution kernels (the number of channels).
In this example, step S302 inputs the convolved node output $T_C$ into a GRU (gated recurrent unit) network of dimension $d$. In step S303, the node output $T_G$ of the GRU network passes through the weighted transformation matrix of the attention mechanism to give the attention output $T_A$, a $d$-dimensional vector; $T_A$ then passes through the following $d \times 2$ fully connected network to become the two-dimensional vector $T_F$, the output vector of the fully connected network.
In step S304 of this example, the softmax function $\sigma(Z)_j = \frac{e^{Z_j}}{\sum_{k=1}^{K} e^{Z_k}}$ is used as the output-layer activation function: the element of the activated two-dimensional vector above the preset threshold is set to 1 and the element below it to 0, the preset threshold being a value threshold that the user defines and adjusts according to actual requirements; the prediction label $\hat{y}$ of the corresponding sample is thus obtained.
More specifically, the input of the softmax function is the two-dimensional output vector $T_F$ of the fully connected network of the previous stage; $Z$ in the formula is this output vector, and $j = 1, \ldots, K$ indexes the elements of $Z$, with $K = 2$ in this example. Supposing, for example, that $Z = [1, 2]$, then $Z_j$ is the $j$-th element of $Z$ ($Z_1 = 1$, $Z_2 = 2$), and the result is obtained by passing $Z$ through the softmax function as given in the formula.
Passing the two-dimensional output vector $Z = [1, 2]$ of the fully connected network through the softmax function therefore gives $[0.2691, 0.7309]$. Preferably, the preset threshold of step S304 is set to 0.5 in this embodiment: values above it become 1 and values below it become 0, so the prediction result is $[0, 1]$. The softmax function thus acts as a normalizer, converting the element values of the original vector into decimals between 0 and 1 that can be read as probabilities, which facilitates processing.
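A few lines of numpy reproduce this worked example (the computed values agree with the figures quoted above up to rounding):
```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))          # shift by max for numerical stability
    return e / e.sum()

z = np.array([1.0, 2.0])
probs = softmax(z)                     # -> [0.2689, 0.7311]
label = (probs > 0.5).astype(int)      # threshold 0.5 -> prediction label [0, 1]
```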
In step S305 of this example, the cross-entropy cost function $L = -\frac{1}{n}\sum_{x}\left[y \ln \hat{y} + (1 - y)\ln(1 - \hat{y})\right]$ is used as the measure of the error between the true labels $y$ and the predictions $\hat{y}$ of all samples.
In this example, with the aim of reducing the loss value L, step S306 uses a batch gradient descent (BGD) algorithm with batch parameter n to back-propagate and iterate repeatedly: $W^+ = W^- - \eta \cdot \delta \cdot \beta$, where L is the loss value representing the error function, W is a weight parameter of the deep hybrid model, and "out" can be regarded as the output layer, i.e. the softmax function. In the notation $\frac{\partial b}{\partial c} = a$, b is differentiated with respect to c and a is the result; thus $\frac{\partial L}{\partial W}$ is the partial derivative of the loss value L with respect to the weight parameter W, obtained by the chain rule as $\frac{\partial L}{\partial W} = \frac{\partial L}{\partial \mathrm{out}} \cdot \frac{\partial \mathrm{out}}{\partial W}$. Here $\delta$ denotes the derivative of softmax, $\beta$ denotes the derivative of the output layer with respect to the upper-layer weights (the partial derivatives of the weights are taken layer by layer in reverse according to the chain rule; this is simplified here), and $W^+$ and $W^-$ denote the weight parameter after and before the update. In the pre-training of step S3, the loss value L is the optimization target: the weights are updated by reverse gradient derivation (the BGD algorithm is preferred in this example) in the direction that reduces L, with $\eta$ as the learning rate; the training set is thereby fitted, and repeated iteration reduces the loss value and improves the precision.
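A minimal sketch of these two ingredients follows, assuming the gradient dL/dW has already been obtained by the chain rule as described; the function names are illustrative:
```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    """Loss L between true labels y and predictions y_hat, averaged over the batch."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)       # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def bgd_update(W, dL_dW, eta=1e-3):
    """One batch-gradient-descent step: W+ = W- - eta * dL/dW."""
    return W - eta * dL_dW
```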
All weight parameters of the initial state of the deep hybrid model are assumed to be $W_0$. For simplicity, the weight parameters of the deep hybrid model are modularized: $W_C$, $W_G$, $W_A$ and $W_F$ denote the initial weight parameters of the convolutional layer, the gated recurrent unit (GRU) network, the attention model and the fully connected layer, respectively. After the deep hybrid model is iterated m times, all the fixed parameter weights of the deep hybrid model in the source domain are obtained as $W_X = W_C^m + W_G^m + W_A^m + W_F^m$, where $W_C^m$ are the weight parameters of the convolutional layer after m iterations, $W_G^m$ those of the gated recurrent unit network, $W_A^m$ those of the attention model, and $W_F^m$ those of the fully connected layer; m is a natural number.
Step S3 in this example is thus the pre-training process on the source domain: the model is trained with the larger source-domain data set to obtain the parameter weights of the model in the source domain.
It is worth mentioning that the model does not use a pooling layer after the convolutional layer during pre-training; in other words, in step S3, during pre-training, the source-domain data sample set is input to the convolutional layer and then, skipping the pooling layer, directly input to the gated recurrent unit network. The reason is that during pre-training, because of the redundancy-removing function of the pooling layer, the model would lose the word-order structure information obtained at the convolutional layer, impairing the pre-training of the following GRU network and reducing the feature information with positive transfer effect learned in the source domain; the pooling layer is therefore not used in source-domain pre-training. The GRU network is a Gated Recurrent Unit network.
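For concreteness, a Keras sketch of the pre-training network follows, under stated assumptions: Conv1D stands in for the convolution module, a simple dense scoring layer approximates the attention transformation matrix, the pooling layer is omitted as the description requires, and all layer names and the commented fit call are illustrative:
```python
import tensorflow as tf
from tensorflow.keras import layers, models

h, c, d = 60, 128, 256  # max comment length, word-vector dimension, GRU dimension

inputs = layers.Input(shape=(h, c), name="word_vectors")
x = layers.Conv1D(128, 5, padding="same", activation="relu",
                  name="conv")(inputs)               # pooling deliberately omitted
x = layers.GRU(d, return_sequences=True, name="gru")(x)
# attention: score each timestep, normalize over time, take the weighted sum
scores = layers.Dense(1, activation="tanh", name="att_score")(x)
alphas = layers.Softmax(axis=1, name="att_weights")(scores)
x = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1),
                  name="att_pool")([x, alphas])
x = layers.Dropout(0.1, name="dropout")(x)           # small-probability dropout
outputs = layers.Dense(2, activation="softmax", name="out")(x)

pretrain_model = models.Model(inputs, outputs)
pretrain_model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3),
    loss="categorical_crossentropy", metrics=["accuracy"])
# pretrain_model.fit(T_S, y, batch_size=256, epochs=25)  # m = 25 iterations
```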
Step S4 in this example includes the following substeps:
step S401, repeating step S1 and step S2 on a preset number of target-domain labeled data sets P, and adding a pooling layer to the deep hybrid model;
step S402, importing the pre-trained weights $W_X$ obtained in step S3 into the deep hybrid model;
step S403, fine-tuning with the preprocessed target-domain data $P_S$ obtained in steps S1 and S2. $P_S$ is the data obtained by preprocessing the labeled target-domain data set P (the preprocessing includes word segmentation, word mapping, stop-word removal, symbol removal and the like). The reason is that the labeled target-domain data set P (the original data set P) cannot be used directly: its role is to fine-tune together with the model whose weight parameters were obtained by pre-training, so that through the specific fine-tuning steps the pre-trained weight parameters are modified and the model with the changed weights applies better to the target domain. In this way a model that trains well in a source domain with a larger data set can be transferred to a target domain with a small data set and still obtain a good classification effect.
In step S403, the weight parameters $W_G^m$ of the gated recurrent unit network are frozen and reverse iterative updating is performed $j$ times; the iteratively updated weight parameters of the convolutional layer, the iteratively updated weight parameters of the attention model, the frozen weight parameters of the gated recurrent unit network and the iteratively updated weight parameters of the fully connected layer are then added to obtain all weight parameters $W_Y$ of the deep hybrid model after fine-tuning in the target domain, realizing the fine-tuning.
That is, step S4 in this example is the fine-tuning step, comprising: step S401, repeating step S1 and step S2 for a preset (small) number of target-domain labeled data sets P, adding a pooling layer to the model as shown in FIG. 4, and importing the pre-trained weights $W_X$ obtained in step S3 into the deep hybrid model. The model is then fine-tuned with the $P_S$ data obtained in step S401; the fine-tuning method is similar to the third step, but the weight parameters $W_G^m$ of the GRU network are frozen, i.e. not updated in the reverse pass. If the deep hybrid model is updated j times by reverse iteration during fine-tuning, $W_C^j$, $W_A^j$ and $W_F^j$ are obtained, and all the weight parameters obtained by the deep hybrid model after fine-tuning on the target domain can be expressed as $W_Y = W_C^j + W_G^m + W_A^j + W_F^j$, where $W_C^j$ are the weight parameters of the convolutional layer after j iterations, $W_G^m$ the frozen weight parameters of the gated recurrent unit network, $W_A^j$ those of the attention model after j iterations, and $W_F^j$ those of the fully connected layer after j iterations; j is a natural number.
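Continuing the Keras sketch above, fine-tuning could be expressed as follows; the layer names and the layer-by-layer weight copy are assumptions carried over from the pre-training sketch (the inserted pooling layer has no weights of its own, so the copied shapes still match):
```python
ft_inputs = layers.Input(shape=(h, c), name="word_vectors")
x = layers.Conv1D(128, 5, padding="same", activation="relu",
                  name="conv")(ft_inputs)
x = layers.MaxPooling1D(pool_size=2, name="pool")(x)  # pooling added for fine-tuning
gru = layers.GRU(d, return_sequences=True, name="gru")
gru.trainable = False                                 # freeze W_G: no reverse update
x = gru(x)
scores = layers.Dense(1, activation="tanh", name="att_score")(x)
alphas = layers.Softmax(axis=1, name="att_weights")(scores)
x = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1),
                  name="att_pool")([x, alphas])
x = layers.Dropout(0.1, name="dropout")(x)
ft_outputs = layers.Dense(2, activation="softmax", name="out")(x)

ft_model = models.Model(ft_inputs, ft_outputs)
# import the pre-trained weights W_X into the fine-tuning model
for name in ("conv", "gru", "att_score", "out"):
    ft_model.get_layer(name).set_weights(
        pretrain_model.get_layer(name).get_weights())

ft_model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3),
                 loss="categorical_crossentropy", metrics=["accuracy"])
# ft_model.fit(P_S, y_target, batch_size=256, epochs=15)  # j = 15 iterations
```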
The basic fine-tuning procedure of step S403 is similar to the pre-training of step S3, with a pooling layer added compared with step S3 and the training data exchanged from the source-domain data set to the target-domain data set. More specifically, step S403 includes the following substeps:
step S4031, performing a convolution operation on the input target-domain data $P_S$;
step S4032, passing the output of the convolutional layer through the pooling layer; the pooling layer can be regarded as a simple step that keeps the more important data and discards the less useful data, and it contains no weight parameters;
step S4033, inputting the node output of the pooling layer to the gated recurrent unit network;
step S4034, sending the output of the gated recurrent unit network to the weighted transformation matrix of the attention mechanism to obtain a vector of preset dimension, and converting the vector into a two-dimensional vector;
step S4035, setting, through the activation function, the values in the activated two-dimensional vector that exceed the preset threshold to 1 and the values below the preset threshold to 0, to obtain the prediction label $\hat{y}$ of the corresponding sample;
step S4036, calculating the loss value L between the emotion polarity label y and the prediction label $\hat{y}$ by using the cross-entropy cost function;
and step S4037, taking reduction of the loss value L as the optimization direction, performing back propagation with the batch gradient descent algorithm and iterating repeatedly to realize the fine-tuning of the deep hybrid model. Preferably, in the back propagation of step S4037 the weight parameters $W_G^m$ of the gated recurrent unit network are frozen.
That is, step S4037 is basically the same as step S306: the weight parameters of the deep hybrid model are modified by back propagation with reduction of the loss value L as the optimization direction, but the weight parameters $W_G^m$ of the GRU network are frozen, i.e. not updated in reverse, and the fine-tuning of the deep hybrid model is thereby realized.
In step S5 of this example, the overall parameters of the weighted model obtained in step S4 are adjusted and optimized; tuning the hyperparameters after transfer learning helps improve the classification effect in the target domain, where the emotion classification task is then performed.
The hyperparameters adopted in the above steps are preferably as follows: mapping dimension of the word vector c = 128, maximum input length h = 60, convolution kernel size 128 × 5 × 128, output dimension of the GRU network d = 256, batch parameter n = 256, learning rate η = 1e-3, pre-training iteration count m = 25, and fine-tuning iteration count j = 15. These are preferred values and can be adjusted according to actual needs in practical application.
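Collected as a configuration object (the dictionary and key names are illustrative), these preferred values read:
```python
HYPERPARAMS = {
    "word_vector_dim_c": 128,           # mapping dimension of the word vector
    "max_input_length_h": 60,           # maximum comment length
    "conv_kernel_size": (128, 5, 128),  # a x b x c
    "gru_output_dim_d": 256,
    "batch_size_n": 256,
    "learning_rate_eta": 1e-3,
    "pretrain_iters_m": 25,
    "finetune_iters_j": 15,
}
```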
As shown in FIG. 4, this example further provides a comment emotion classification system based on deep hybrid model transfer learning, which adopts the comment emotion classification method based on deep hybrid model transfer learning described above and sequentially comprises a convolutional layer, a pooling layer, a gated recurrent unit network, an attention mechanism model, a dropout layer, a fully connected layer and an output layer.
That is, this example is in fact a deep hybrid model combining a convolutional neural network, a recurrent neural network (a GRU network is used) and an attention mechanism; the convolutional layer, pooling layer, gated recurrent unit network, attention mechanism model, dropout layer, fully connected layer and output layer are all network layers found in convolutional neural networks, recurrent neural networks (GRU networks) and attention mechanisms.
First, in this example the strong feature-recognition capability of the convolution structure is used to capture the bottom-layer features, and the output of the GRU is re-weighted through the weight transformation matrix of an AM (attention model) to strengthen the model's ability to recognize the keywords in comments.
Secondly, to address the insensitivity of the convolution model to text word order, the GRU network is combined to enhance the model's ability to learn the word-order structure of text and to use context, improving the classification effect on longer commodity comment texts after transfer learning.
Finally, a small-probability dropout layer (a layer that, during training of a deep learning network, temporarily drops neural network units from the network with a certain probability) is attached to the fully connected layer; a small fraction of nodes are randomly disabled (their output set to zero) during training, which prevents overfitting and enhances the generalization capability of the model.
Dropout can thus be regarded as a random discarding layer: neural network units are temporarily dropped from the network with a certain probability during training, which prevents overfitting of the model (overfitting reduces the generalization, i.e. transferable, capability of the model).
This embodiment trains quickly and with low difficulty; only a few rounds of training are needed to obtain high classification accuracy. The model obtains a good classification effect even when trained on a data set with more noise or a smaller quantity of samples, depends little on the data set, and is robust. Meanwhile, based on the transfer-learning strategy, the model achieves a good transfer-learning effect even with only a small target-domain data set. The method thus reduces the model's dependence on the data set, improves the transferable capability of the model, and improves the classification accuracy after transfer learning.
Step S4 in this example is model fine-tuning: the model obtained in step S3 and its weight parameters are transferred to the target domain and trained with the small target-domain data set. It should be noted that in the fine-tuning of step S4 the deep hybrid model preferably connects a pooling layer (a maximum pooling layer) directly after the convolutional layer, while the weight parameters of the GRU network, i.e. the weight parameters $W_G^m$ of the gated recurrent unit network, are frozen (remain unchanged); only the weight parameters of the convolutional layer, the attention mechanism and the fully connected layer ($W_C$, $W_A$ and $W_F$) are reverse-optimized and updated.
The reason this transfer-learning method is adopted in this example is as follows: in the pre-training of step S3, because of the redundancy-removing effect of the pooling layer, the deep hybrid model would lose the word-order structure information obtained at the convolutional layer, impairing the pre-training of the following GRU network and reducing the feature information with positive transfer effect learned by the model in the source domain, so the pooling layer is not used in source-domain pre-training; when fine-tuning in step S4, for the convolution structure and the AM structure, since the model is expected to have better transferable capability and to be usable in domains with smaller feature similarity, their weight parameters are reverse-updated and optimized, increasing the model's weighting of features with positive transfer so as to strengthen its ability to capture target-domain keywords.
As for the GRU network: since the GRU network is a sequence model whose training would have to start over each time the task is updated, its back propagation is slow and would slow down the whole transfer-learning process. Considering that the GRU network has already been trained many times with a large data set on the source domain and has learned enough language features, and in order to speed up the whole transfer-learning process, this example freezes the weights of the GRU network and performs no reverse update optimization on them; that is, the weight parameters $W_G^m$ of the gated recurrent unit network obtained in step S3 remain unchanged. Meanwhile, a pooling layer is used in the fine-tuning of step S4, strengthening the ability of the model's convolutional layer to capture target-domain keywords and removing feature information without positive transfer effect; and since the fully connected layer serves as the final classifier of the model, reverse-updating it greatly improves the classification effect after transfer learning.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (9)

1. A comment emotion classification method based on deep hybrid model transfer learning, characterized by comprising the following steps:
step S1, collecting commodity comments and preprocessing a source-domain data sample set of the commodity comments;
step S2, mapping the preprocessed data into word vectors;
step S3, pre-training a deep hybrid model on the source-domain data sample set of the commodity comments;
step S4, fine-tuning the deep hybrid model on a target-domain data sample set of the commodity comments;
step S5, performing emotion classification on the commodity comments in the target domain;
wherein the step S3 includes the following substeps:
step S301, performing a convolution operation on the source-domain word vector data obtained in step S2;
step S302, inputting the node output of the convolution operation to a gated recurrent unit network;
step S303, inputting the node output of the gated recurrent unit network to a weighted transformation matrix of an attention mechanism to obtain a vector of preset dimension, and converting the vector into a two-dimensional vector;
step S304, setting, through an activation function, the values in the activated two-dimensional vector that exceed a preset threshold to 1 and the values below the preset threshold to 0, to obtain the prediction label $\hat{y}$ of the corresponding sample;
step S305, calculating the loss value L between the emotion polarity label y and the prediction label $\hat{y}$ by using a cross-entropy cost function;
and step S306, performing back propagation using a batch gradient descent algorithm and iterating repeatedly to realize the pre-training of the deep hybrid model.
2. The comment emotion classification method based on deep hybrid model transfer learning according to claim 1, characterized in that in step S304 the activation function $\sigma(Z)_j = \frac{e^{Z_j}}{\sum_{k=1}^{K} e^{Z_k}}$ is used as the output-layer activation function to compute the value $\sigma(Z)_j$ of the two-dimensional vector, wherein $Z_j$ is the $j$-th element of the two-dimensional vector $Z$, $j$ is the element index, and $K = 2$.
3. The comment emotion classification method based on deep hybrid model transfer learning according to claim 1, characterized in that in step S305 the cross-entropy cost function $L = -\frac{1}{n}\sum_{x}\left[y \ln \hat{y} + (1 - y)\ln(1 - \hat{y})\right]$ is used to compute the loss value L between the emotion polarity label $y$ and the prediction label $\hat{y}$.
4. The comment emotion classification method based on deep hybrid model transfer learning according to any one of claims 1 to 3, characterized in that step S4 comprises the following substeps:
step S401, repeating step S1 and step S2 on a preset number of target-domain labeled data sets P, and adding a pooling layer to the deep hybrid model;
step S402, importing the pre-trained weights $W_X$ obtained in step S3 into the deep hybrid model;
step S403, fine-tuning with the preprocessed target-domain data $P_S$ obtained in step S401.
5. The comment emotion classification method based on deep hybrid model transfer learning according to claim 4, characterized in that in step S403 the weight parameters $W_G^m$ of the gated recurrent unit network are frozen and reverse iterative updating is performed $j$ times, and the iteratively updated weight parameters of the convolutional layer, the iteratively updated weight parameters of the attention model, the frozen weight parameters of the gated recurrent unit network and the iteratively updated weight parameters of the fully connected layer are added to obtain all weight parameters $W_Y$ of the deep hybrid model after fine-tuning in the target domain, realizing the fine-tuning.
6. The comment emotion classification method based on deep hybrid model transfer learning according to claim 4, characterized in that step S403 includes the following substeps:
step S4031, performing a convolution operation on the input target-domain data $P_S$;
step S4032, passing the output of the convolutional layer through the pooling layer;
step S4033, inputting the node output of the pooling layer to the gated recurrent unit network;
step S4034, sending the output of the gated recurrent unit network to the weighted transformation matrix of the attention mechanism to obtain a vector of preset dimension, and converting the vector into a two-dimensional vector;
step S4035, setting, through the activation function, the values in the activated two-dimensional vector that exceed the preset threshold to 1 and the values below the preset threshold to 0, to obtain the prediction label $\hat{y}$ of the corresponding sample;
step S4036, calculating the loss value L between the emotion polarity label y and the prediction label $\hat{y}$ by using the cross-entropy cost function;
and step S4037, taking reduction of the loss value L as the optimization direction, performing back propagation with the batch gradient descent algorithm and iterating repeatedly to realize the fine-tuning of the deep hybrid model.
7. The comment emotion classification method based on deep hybrid model transfer learning according to claim 6, characterized in that in the back propagation process of step S4037 the weight parameters $W_G^m$ of the gated recurrent unit network are frozen.
8. The comment emotion classification method based on deep hybrid model transfer learning according to any one of claims 1 to 3, characterized in that in step S3, during pre-training, the source-domain data sample set is input to the convolutional layer and then, skipping the pooling layer, directly input to the gated recurrent unit network.
9. A comment emotion classification system based on deep hybrid model transfer learning, characterized in that it adopts the comment emotion classification method based on deep hybrid model transfer learning according to any one of claims 1 to 8, and sequentially comprises a convolutional layer, a pooling layer, a gated recurrent unit network, an attention mechanism model, a random discarding (dropout) layer, a fully connected layer and an output layer.
CN201811383793.4A 2018-11-20 2018-11-20 Comment emotion classification method and system based on deep hybrid model transfer learning Active CN109271522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811383793.4A CN109271522B (en) 2018-11-20 2018-11-20 Comment emotion classification method and system based on deep hybrid model transfer learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811383793.4A CN109271522B (en) 2018-11-20 2018-11-20 Comment emotion classification method and system based on deep hybrid model transfer learning

Publications (2)

Publication Number Publication Date
CN109271522A true CN109271522A (en) 2019-01-25
CN109271522B CN109271522B (en) 2021-07-30

Family

ID=65190306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811383793.4A Active CN109271522B (en) 2018-11-20 2018-11-20 Comment emotion classification method and system based on deep hybrid model transfer learning

Country Status (1)

Country Link
CN (1) CN109271522B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103778407A (en) * 2012-10-23 2014-05-07 南开大学 Gesture recognition algorithm based on conditional random fields under transfer learning framework
US20180060652A1 (en) * 2016-08-31 2018-03-01 Siemens Healthcare Gmbh Unsupervised Deep Representation Learning for Fine-grained Body Part Recognition
CN106485251A (en) * 2016-10-08 2017-03-08 天津工业大学 Egg embryo classification based on deep learning
CN107025284A (en) * 2017-04-06 2017-08-08 中南大学 The recognition methods of network comment text emotion tendency and convolutional neural networks model
CN107967257A (en) * 2017-11-20 2018-04-27 哈尔滨工业大学 A kind of tandem type composition generation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHUANJUN ZHAO et al., "Deep Transfer Learning for Social Media Cross-Domain Sentiment Classification", Chinese National Conference on Social Media Processing *
ZHAO Chuanjun et al., "Multi-source cross-domain sentiment classification based on ensemble deep transfer learning", Journal of Shanxi University (Natural Science Edition) *

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829500B (en) * 2019-01-31 2023-05-02 华南理工大学 Position composition and automatic clustering method
CN109829500A (en) * 2019-01-31 2019-05-31 华南理工大学 A kind of position composition and automatic clustering method
CN109885670A (en) * 2019-02-13 2019-06-14 北京航空航天大学 A kind of interaction attention coding sentiment analysis method towards topic text
CN110163368B (en) * 2019-04-18 2023-10-20 腾讯科技(深圳)有限公司 Deep learning model training method, device and system based on mixed precision
CN110163368A (en) * 2019-04-18 2019-08-23 腾讯科技(深圳)有限公司 Deep learning model training method, apparatus and system based on mixed-precision
CN110188822A (en) * 2019-05-30 2019-08-30 盐城工学院 A kind of domain is to the one-dimensional convolutional neural networks intelligent failure diagnosis method of anti-adaptive
CN110210381A (en) * 2019-05-30 2019-09-06 盐城工学院 A kind of adaptive one-dimensional convolutional neural networks intelligent failure diagnosis method of domain separation
CN110210381B (en) * 2019-05-30 2023-08-25 盐城工学院 Domain separation self-adaptive one-dimensional convolutional neural network intelligent fault diagnosis method
CN110472115A (en) * 2019-08-08 2019-11-19 东北大学 A kind of social networks text emotion fine grit classification method based on deep learning
CN110472115B (en) * 2019-08-08 2022-08-02 东北大学 Social network text emotion fine-grained classification method based on deep learning
CN110489753A (en) * 2019-08-15 2019-11-22 昆明理工大学 Improve the corresponding cross-cutting sensibility classification method of study of neuromechanism of feature selecting
CN110489753B (en) * 2019-08-15 2022-06-14 昆明理工大学 Neural structure corresponding learning cross-domain emotion classification method for improving feature selection
CN110489567A (en) * 2019-08-26 2019-11-22 重庆邮电大学 A kind of node information acquisition method and its device based on across a network Feature Mapping
CN110489567B (en) * 2019-08-26 2022-03-22 重庆邮电大学 Node information acquisition method and device based on cross-network feature mapping
CN110728294A (en) * 2019-08-30 2020-01-24 北京影谱科技股份有限公司 Cross-domain image classification model construction method and device based on transfer learning
CN110584654A (en) * 2019-10-09 2019-12-20 中山大学 Multi-mode convolutional neural network-based electrocardiosignal classification method
CN110772268A (en) * 2019-11-01 2020-02-11 哈尔滨理工大学 Multimode electroencephalogram signal and 1DCNN migration driving fatigue state identification method
CN111209964A (en) * 2020-01-06 2020-05-29 武汉市盛隽科技有限公司 Model training method, metal fracture analysis method based on deep learning and application
CN111488972A (en) * 2020-04-09 2020-08-04 北京百度网讯科技有限公司 Data migration method and device, electronic equipment and storage medium
CN111488972B (en) * 2020-04-09 2023-08-08 北京百度网讯科技有限公司 Data migration method, device, electronic equipment and storage medium
CN111782802B (en) * 2020-05-15 2023-11-24 北京极兆技术有限公司 Method and system for obtaining commodity corresponding to national economy manufacturing industry based on machine learning
CN111782802A (en) * 2020-05-15 2020-10-16 北京极兆技术有限公司 Method and system for obtaining national economy manufacturing industry corresponding to commodity based on machine learning
CN111708864A (en) * 2020-06-11 2020-09-25 兰州理工大学 User comment text emotion analysis method and device
CN111680160A (en) * 2020-06-16 2020-09-18 西北师范大学 Deep migration learning method for text emotion classification
CN111767987A (en) * 2020-06-28 2020-10-13 北京百度网讯科技有限公司 Data processing method, device and equipment based on recurrent neural network
CN111767987B (en) * 2020-06-28 2024-02-20 北京百度网讯科技有限公司 Data processing method, device and equipment based on cyclic neural network
CN112015896A (en) * 2020-08-27 2020-12-01 腾讯科技(深圳)有限公司 Emotion classification method and device based on artificial intelligence
CN112015896B (en) * 2020-08-27 2024-02-06 腾讯科技(深圳)有限公司 Emotion classification method and device based on artificial intelligence
CN112317957A (en) * 2020-10-09 2021-02-05 五邑大学 Laser welding method, laser welding apparatus, and storage medium therefor
CN112861984B (en) * 2021-02-25 2022-07-01 西华大学 Speech emotion classification method based on feature fusion and ensemble learning
CN112861984A (en) * 2021-02-25 2021-05-28 西华大学 Speech emotion classification method based on feature fusion and ensemble learning
CN116468959A (en) * 2023-06-15 2023-07-21 清软微视(杭州)科技有限公司 Industrial defect classification method, device, electronic equipment and storage medium
CN116468959B (en) * 2023-06-15 2023-09-08 清软微视(杭州)科技有限公司 Industrial defect classification method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109271522B (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN109271522B (en) Comment emotion classification method and system based on deep hybrid model transfer learning
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
US10248664B1 (en) Zero-shot sketch-based image retrieval techniques using neural networks for sketch-image recognition and retrieval
CN109299342B (en) Cross-modal retrieval method based on cycle generation type countermeasure network
CN112818861B (en) Emotion classification method and system based on multi-mode context semantic features
CN111126386B (en) Sequence domain adaptation method based on countermeasure learning in scene text recognition
CN110309306B (en) Document modeling classification method based on WSD (Wireless sensor network) hierarchical memory network
CN109558487A (en) Document Classification Method based on the more attention networks of hierarchy
CN110609897A (en) Multi-category Chinese text classification method fusing global and local features
CN107562784A (en) Short text classification method based on ResLCNN models
CN107480261A (en) One kind is based on deep learning fine granularity facial image method for quickly retrieving
CN109063719B (en) Image classification method combining structure similarity and class information
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN107247703A (en) Microblog emotional analysis method based on convolutional neural networks and integrated study
CN110175221B (en) Junk short message identification method by combining word vector with machine learning
CN111552803A (en) Text classification method based on graph wavelet network model
CN111008224B (en) Time sequence classification and retrieval method based on deep multitasking representation learning
CN112231477A (en) Text classification method based on improved capsule network
Thapa et al. Spamhd: Memory-efficient text spam detection using brain-inspired hyperdimensional computing
CN110297888A (en) A kind of domain classification method based on prefix trees and Recognition with Recurrent Neural Network
Chen et al. An Improved Deep Fusion CNN for Image Recognition.
CN118277538B (en) Legal intelligent question-answering method based on retrieval enhancement language model
CN111460157A (en) Cyclic convolution multitask learning method for multi-field text classification
Sadr et al. Convolutional neural network equipped with attention mechanism and transfer learning for enhancing performance of sentiment analysis
CN111144500A (en) Differential privacy deep learning classification method based on analytic Gaussian mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant