CN114090725A - Emotion prediction model training method and device - Google Patents

Emotion prediction model training method and device

Info

Publication number
CN114090725A
Authority
CN
China
Prior art keywords
emotion
corpus
emotion prediction
task
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010854743.0A
Other languages
Chinese (zh)
Inventor
周杰
肖文明
田俊峰
王睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010854743.0A priority Critical patent/CN114090725A/en
Publication of CN114090725A publication Critical patent/CN114090725A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

A method and an apparatus for training an emotion prediction model are disclosed. The training method comprises: acquiring a plurality of unsupervised corpora; masking at least one emotion expression in the unsupervised corpus, and inputting the corpus before masking and the corpus after masking to an emotion prediction model as a training sample. The emotion prediction model performs a word prediction task and a word emotion prediction task for the masked emotion expressions in the training sample, calculates a task error from the word prediction task result, the word emotion prediction task result and the corpus before masking, and corrects the weight coefficients of the emotion prediction model based on the task error. By constructing training samples through masking emotion expressions and training the emotion prediction model to be trained with these samples, the method improves the accuracy of the emotion prediction task, improves the stability of the model, and greatly reduces the number of training samples required.

Description

Emotion prediction model training method and device
Technical Field
The disclosure relates to the field of natural language processing, in particular to a method and a device for training an emotion prediction model.
Background
In recent years, pre-trained language models have been widely applied to multi-domain emotion analysis and have achieved good results. However, because users express emotions differently in different domains, fine-tuning a model in the source domain easily overfits, leading to poor performance in the target domain when the model is migrated from the source domain to the target domain. Moreover, because pre-trained models have a large number of parameters, fine-tuning is costly in training time and memory, and it requires a large number of training samples.
Disclosure of Invention
In view of this, an object of the present disclosure is to provide a method and an apparatus for training an emotion prediction model, so that when the resulting pre-trained model is migrated from a source domain to a target domain, its weight parameters require no adjustment, or only minimal adjustment with few resources.
In a first aspect, an embodiment of the present disclosure provides a method for training an emotion prediction model, including:
acquiring a plurality of unsupervised corpora;
masking at least one emotion expression in the unsupervised corpus, and inputting the corpus before masking and the corpus after masking to an emotion prediction model as a training sample;
wherein the emotion prediction model is configured to: perform a word prediction task and a word emotion prediction task for the masked emotion expressions in the training sample, calculate a task error from the word prediction task result, the word emotion prediction task result and the corpus before masking, and correct the weight coefficients of the emotion prediction model based on the task error.
Optionally, the emotion expression comprises at least one of domain-independent emotion words and emoticons; in the unsupervised corpus, emotion expressions are masked with a higher probability than non-emotion expressions.
Optionally, the masking operation comprises:
replacing the masked content with special characters; and/or,
replacing the masked content with other words.
Optionally, the emotion prediction model is further configured to:
perform a sentence-level emotion prediction task on the masked corpus; and
calculate a task error from the sentence-level emotion prediction task result and the corpus before masking, and correct the weight coefficients of the emotion prediction model based on the task error.
Optionally, the unsupervised corpus includes a review score, and performing the sentence-level emotion prediction task on the masked corpus comprises: performing the sentence-level emotion prediction task according to the review score and the emotion information of the masked corpus.
Optionally, the emotion prediction model is further configured to: obtain an emotion prediction result of the unsupervised corpus from the sentence-level emotion prediction task result and the word emotion prediction task result.
Optionally, in the masking operation, the emotion words are determined in the unsupervised corpus using a general emotion dictionary, and/or the emoticons are determined in the unsupervised corpus using regular expressions.
In a second aspect, an embodiment of the present disclosure provides an emotion prediction model training apparatus, including:
the data acquisition module is used for acquiring a plurality of unsupervised corpora;
the sample preparation module is used for masking at least one emotion expression in the unsupervised corpus and inputting the corpus before masking and the corpus after masking to the emotion prediction model as a training sample;
a model training module, configured to train the emotion prediction model using the training samples, wherein the emotion prediction model is configured to: perform a word prediction task and a word emotion prediction task for the masked emotion expressions in the training sample, calculate a task error from the word prediction task result, the word emotion prediction task result and the corpus before masking, and correct the weight coefficients of the emotion prediction model based on the task error.
Optionally, the emotion expression comprises at least one of domain-independent emotion words and emoticons; in the unsupervised corpus, emotion expressions are masked with a higher probability than non-emotion expressions.
Optionally, the masking operation comprises:
replacing the masked content with special characters; and/or,
replacing the masked content with other words.
Optionally, the emotion prediction model is further configured to:
perform a sentence-level emotion prediction task on the masked corpus; and
calculate a task error from the sentence-level emotion prediction task result and the corpus before masking, and correct the weight coefficients of the emotion prediction model based on the task error.
Optionally, the unsupervised corpus includes a review score, and performing the sentence-level emotion prediction task on the masked corpus comprises: performing the sentence-level emotion prediction task according to the review score and the emotion information of the masked corpus.
Optionally, the emotion prediction model is further configured to: obtain an emotion prediction result of the unsupervised corpus from the sentence-level emotion prediction task result and the word emotion prediction task result.
Optionally, in the masking operation, the emotion words are determined in the unsupervised corpus using a general emotion dictionary, and/or the emoticons are determined in the unsupervised corpus using a regular expression.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor, where the memory further stores computer instructions executable by the processor, and the computer instructions, when executed, implement the training method described in any one of the above.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium storing computer instructions executable by an electronic device, the computer instructions, when executed, implementing the training method of any one of the above.
According to the method and apparatus for training an emotion prediction model, training samples are constructed by masking emotion expressions, and the emotion prediction model to be trained is trained with these samples, which not only improves the accuracy of the emotion prediction task but also improves the stability of the model and greatly reduces the number of training samples required.
Further, at least one of the masked emotion expressions in the unsupervised corpus is a domain-independent emotion word and/or emoticon, so that when the trained emotion prediction model is migrated from the source domain to the target domain, the weight coefficients require no adjustment or only minimal adjustment; the trained emotion prediction model can thus be applied to emotion prediction tasks in multiple domains.
Drawings
The foregoing and other objects, features, and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which refers to the accompanying drawings in which:
FIG. 1 is a flow diagram of a method of training an emotion prediction model in an embodiment of the present disclosure;
FIG. 2 is a comparison example of two unsupervised corpora for the computer domain and the catering domain;
FIG. 3 is a schematic diagram of the network structure of a Transformer encoder;
FIG. 4 is an exemplary diagram of emotion prediction using an emotion prediction model;
FIG. 5 is a schematic structural diagram of an emotion prediction model training apparatus according to an embodiment of the present disclosure;
FIG. 6 illustrates score values and average scores for different models migrating from a source domain to a target domain;
FIG. 7 shows a comparative example of the time/space cost and convergence of the present solution and other models;
FIGS. 8a-8c respectively show plots of the relationship between the number of training samples and model accuracy when the present solution and other models undergo several different domain migrations;
FIG. 9 shows a block diagram of an electronic device for implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described below based on embodiments, but it is not limited to these embodiments. In the following detailed description, some specific details are set forth. It will be apparent to those skilled in the art that the present disclosure may be practiced without these specific details. Well-known methods, procedures, and processes are not described in detail so as not to obscure the present disclosure. The figures are not necessarily drawn to scale.
The following terms are used herein.
Multi-domain Sentiment Analysis System: given a user's text, the system infers the sentiment polarity expressed by the user. The task is characterized by multiple domains: the text comes from several domains, including a source domain and a target domain; the source domain usually contains a large amount of labeled data, while the target domain has only a small amount of labeled data, or even none. A model therefore needs to be trained in the source domain and its emotion knowledge migrated to the target domain for emotion analysis.
Pre-trained Models: model training typically requires a large amount of resources, including but not limited to a large amount of labeled sample data and the computing resources needed to perform the training, so training a model from scratch is not easy. A pre-trained model provides a good set of weight parameter values. Researchers can apply these weights directly to an actual scenario, or, before doing so, simply modify the input and output layers, perform incremental training with their own data, and fine-tune the weights. The pre-trained model referred to herein is a pre-trained language model used in the field of natural language processing.
Fine-tuning (Fine-tune): using a pre-trained model for a specific task and updating and adjusting its weight coefficients according to the supervised data of that task, so that the pre-trained model better adapts to the specific task.
Overfitting: because the expression of emotion differs greatly between the source domain and the target domain, training in the source domain may cause the model to learn too much domain-specific knowledge and overfit, so that it performs poorly in the target domain.
FIG. 1 is a flowchart of a method for training an emotion prediction model according to an embodiment of the present disclosure.
As shown in fig. 1, the flowchart includes the following steps.
In step S101, a plurality of unsupervised corpora are acquired.
In step S102, at least one emotion expression in the unsupervised corpus is masked, and the corpus before masking and the corpus after masking are input to the emotion prediction model as training samples.
In step S103, an emotion prediction model is trained using the training samples.
Referring to FIG. 1, the unsupervised corpora used in the method can come from many different domains; for example, they can be collected from the databases of application platforms in multiple domains. It should be noted that text data expressing rich emotion should be collected: on an e-commerce platform, for example, there are usually customer reviews of goods, which reflect the customers' likes and dislikes as well as their scores for the goods, and which are therefore very suitable as unsupervised corpora. After the unsupervised corpora are obtained, the emotion expressions in them are located. The emotion expressions include emotion words and emoticons: for example, a general emotion dictionary can be used as an emotion knowledge base to locate the emotion words in the unsupervised corpus, while regular expressions are used to locate the emoticons. Then, at least one emotion expression is masked using a masking operation. The corpus after the masking operation and the corpus before the masking operation are input as a training sample to the emotion prediction model to be trained, so as to obtain the trained emotion prediction model.
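As an illustration of locating and masking emotion expressions, the following sketch uses a small dictionary and a regular expression; the dictionary entries, the emoticon pattern and the '[MASK]' token are assumptions made for illustration, not values prescribed by the disclosure:

```python
import re

# Illustrative stand-in for a general (domain-independent) emotion dictionary.
EMOTION_DICT = {"beautiful", "delicious", "bad", "never", "cheerful", "good"}

# Illustrative regular expression for ASCII emoticons such as :) :-( :D
EMOTICON_RE = re.compile(r"[:;=]-?[)(DPp]")

MASK_TOKEN = "[MASK]"

def locate_emotion_expressions(tokens):
    """Return the indices of tokens that are emotion words or emoticons."""
    indices = []
    for i, tok in enumerate(tokens):
        if tok.lower() in EMOTION_DICT or EMOTICON_RE.fullmatch(tok):
            indices.append(i)
    return indices

def mask_corpus(tokens, indices):
    """Replace the selected emotion expressions with the mask token."""
    masked = list(tokens)
    for i in indices:
        masked[i] = MASK_TOKEN
    return masked

tokens = "fast and cheerful service , food was pretty good :)".split()
hits = locate_emotion_expressions(tokens)
# A training sample is the pair (corpus before masking, corpus after masking).
sample = (tokens, mask_corpus(tokens, hits))
```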
FIG. 2 is a comparative example of two unsupervised corpora from the computer domain and the catering domain. The embodiment of FIG. 1 is illustrated using the example of FIG. 2. In the figure, reference numeral 201 denotes an example of an unsupervised corpus from the computer domain, and reference numeral 202 denotes an example of an unsupervised corpus from the catering domain. For convenience of presentation, the emotion expressions are underlined. As can be seen from the figure, the emotion words and emoticons contained in corpus 201 are ('fast', a sad-face emoticon, 'bad', 'never'), and the emotion words and emoticons contained in corpus 202 are ('beautiful', a smiley-face emoticon, 'fast', 'delicious'). Based on the training method shown in FIG. 1, after the emotion words and emoticons in corpora 201 and 202 are masked, corpus 201 before masking and corpus 201 after masking form one training sample, and corpus 202 before masking and corpus 202 after masking form another training sample; both are provided to the emotion prediction model to be trained.
In this embodiment, training samples are constructed by masking emotion expressions and the emotion prediction model to be trained is trained with these samples, which improves the accuracy of the emotion prediction task, improves the stability of the model, and greatly reduces the number of training samples required.
With continued reference to FIG. 2, among the emotion words and emoticons of corpora 201 and 202, only 'fast' is related to the domain. For example, when 'fast' describes how quickly a person runs, the emotion it expresses is positive, but when it describes how quickly a battery drains, the emotion is negative; 'fast' is therefore a domain-dependent emotion word. For unsupervised corpus 201, the domain-independent emotion words and emoticons are (a sad-face emoticon, 'bad', 'never'); emoticons are generally always domain-independent. For unsupervised corpus 202, the domain-independent emotion words and emoticons are ('beautiful', a smiley-face emoticon, 'delicious'). If the expressed emotion is classified as positive, negative or neutral, the emotion classification of each item in (the sad-face emoticon, 'bad', 'never') is negative, whereas the emotion classification of each item in ('beautiful', the smiley-face emoticon, 'delicious') is positive.
Based on this, as a further embodiment, in order to train a domain-independent emotion prediction model, the masking operation is applied selectively to the emotion expressions of the unsupervised corpus. For example, in corpus 202 only the domain-independent emotion words and emoticons ('beautiful', the smiley-face emoticon, 'delicious') are masked while 'fast' is not; alternatively, across a large number of training samples, most of the domain-independent emotion words and emoticons are masked while only a small portion of the domain-dependent emotion words are masked.
It should be noted that the masking operation may be applied not only to emotion words and/or emoticons but also to neutral words other than emotion words and emoticons, so as to improve the accuracy of the word prediction task. For the same unsupervised corpus, masking operations may be performed on several different words to obtain multiple training samples.
Further, in the random masking operation, domain-independent emotion words and/or emoticons in the plurality of unsupervised corpora are set to be masked with a greater probability than domain-dependent emotion words or neutral words.
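A minimal sketch of such a weighted random masking policy is given below; the probability values and the token categories are illustrative assumptions rather than values prescribed by the disclosure:

```python
import random

# Assumed masking probabilities: domain-independent emotion expressions are masked
# most often, domain-dependent emotion words less often, neutral words rarely.
MASK_PROB = {"domain_independent": 0.8, "domain_dependent": 0.2, "neutral": 0.05}

def weighted_mask(tokens, categories, mask_token="[MASK]", rng=random):
    """categories[i] labels tokens[i] with one of the keys of MASK_PROB."""
    masked = []
    for tok, cat in zip(tokens, categories):
        if rng.random() < MASK_PROB[cat]:
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked
```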
FIG. 3 is a schematic diagram of the network structure of a Transformer encoder.
Referring to the figure, the input data of the encoder 300 are the word vector 311 corresponding to each word of the text data and the position encoding vector 312 derived from the position information of each word. The position encoding vector 312 is generated from the position of each word in the text data; various position encoding algorithms exist in the prior art and are not described here.
As shown in the figure, the encoder 300 is composed of two sub-layers. The first sub-layer comprises a self-attention layer 301 that performs the self-attention operation, and the second sub-layer comprises a fully-connected feed-forward network 303. A residual connection and normalization (corresponding to normalization layer 302 and normalization layer 304) are applied around each of the two sub-layers, so the output of each sub-layer can be expressed as LayerNorm(x + Sublayer(x)), where Sublayer(x) is the function implemented by the self-attention layer 301 or the fully-connected feed-forward network 303, respectively. The function implemented by the self-attention layer 301 is as follows:
X · W_Q = Q (1)
X · W_K = K (2)
X · W_V = V (3)
where X denotes the input matrix, the weight matrices W_Q, W_K and W_V are weight coefficients obtained through training, and Q, K and V denote the query matrix (Query), the key matrix (Key) and the value matrix (Value), respectively.
The output of the self-attention layer is then obtained using equation (4) and provided to the summing and normalization layer 302.
Attention(Q, K, V) = softmax(Q · K^T / sqrt(d_k)) · V (4)
where sqrt(d_k) denotes the square root of the dimension of the key vectors and T denotes the matrix transpose.
The fully-connected feed-forward network 303 consists of two linear transformations with the ReLU function used as the activation function:
FFN(X) = ReLU(X · W_1 + b_1) · W_2 + b_2 (5)
where X is the input matrix, W_1 and W_2 are weight coefficients obtained through training, and b_1 and b_2 are bias parameters. Of course, the above formulas may be adjusted according to the actual network structure, and the disclosure is not limited in this respect.
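To make equations (1) to (5) concrete, the following NumPy sketch computes one self-attention operation followed by the feed-forward network on random data; the dimensions and weights are illustrative only, and the residual connections and LayerNorm(x + Sublayer(x)) described above are omitted for brevity:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model, d_k, d_ff = 11, 64, 64, 256

X = rng.normal(size=(seq_len, d_model))          # word vectors + position encodings
wQ, wK, wV = (rng.normal(size=(d_model, d_k)) for _ in range(3))
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

Q, K, V = X @ wQ, X @ wK, X @ wV                 # equations (1)-(3)
attn_out = softmax(Q @ K.T / np.sqrt(d_k)) @ V   # equation (4)
ffn_out = np.maximum(attn_out @ W1 + b1, 0) @ W2 + b2   # equation (5), ReLU
```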
The network structure formed by stacking one or more encoders 300 can, after training, be used for a variety of natural language processing tasks, such as text summarization, text translation, keyword extraction, and emotion prediction. The disclosed embodiments may use the encoder 300 to construct the emotion prediction model to be trained. When constructed, the emotion prediction model may consist of an input layer, one or more encoders 300, and an output layer. The input layer receives the unsupervised corpus after the masking operation and the unsupervised corpus before the masking operation, constructs a word vector and a position encoding vector for each word, and outputs them to the one or more encoders 300. The one or more encoders 300 perform a series of calculations on the input vectors and produce output vectors in another dimension. The output layer determines, from the output vectors, the word prediction corresponding to each masked word and the word emotion prediction derived from that word prediction.
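As one possible realization of this structure, the following PyTorch sketch assembles an input layer, a stack of encoders, and an output layer with two heads (word prediction and word emotion prediction). It is a sketch only: the framework, class names and dimensions are illustrative assumptions, not prescribed by the disclosure.

```python
import torch
import torch.nn as nn

class EmotionPredictionModel(nn.Module):
    """Illustrative skeleton: embedding input layer, encoder stack, two output heads."""
    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=4,
                 max_len=512, n_emotions=3):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)       # word vectors
        self.pos_emb = nn.Embedding(max_len, d_model)          # position encoding vectors
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # one or more encoders 300
        self.word_head = nn.Linear(d_model, vocab_size)        # word prediction task
        self.word_emotion_head = nn.Linear(d_model, n_emotions)  # word emotion task

    def forward(self, token_ids):
        pos = torch.arange(token_ids.size(1), device=token_ids.device)
        h = self.encoder(self.tok_emb(token_ids) + self.pos_emb(pos))
        return self.word_head(h), self.word_emotion_head(h)
```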
The emotion prediction model to be trained determines its weight coefficients only through repeated training: training samples are continuously fed to the model, which performs the word prediction task and the word emotion prediction task for the masked words, and the weight coefficients are then corrected according to the error back-propagation algorithm of the feed-forward neural network. Because a training sample comprises the unsupervised corpus before the masking operation and the unsupervised corpus after the masking operation, both are passed through the calculations of every layer of the encoder, the task error is obtained, and the weight coefficients are corrected based on this task error according to the error back-propagation algorithm.
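Continuing the sketch above, a single training step on one sample (corpus before masking, corpus after masking) could be written as below; the equal weighting of the two task errors and the label encoding are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, masked_ids, original_ids,
               word_emotion_labels, mask_positions):
    """masked_ids: corpus after masking; original_ids: corpus before masking.
    word_emotion_labels: emotion class of each word (e.g. from the emotion dictionary).
    mask_positions: boolean tensor marking the masked positions."""
    word_logits, emotion_logits = model(masked_ids)

    # Word prediction task: recover the original words at the masked positions.
    word_loss = F.cross_entropy(word_logits[mask_positions],
                                original_ids[mask_positions])
    # Word emotion prediction task: predict the emotion of the masked words.
    emotion_loss = F.cross_entropy(emotion_logits[mask_positions],
                                   word_emotion_labels[mask_positions])

    task_error = word_loss + emotion_loss      # combined task error
    optimizer.zero_grad()
    task_error.backward()                      # error back-propagation
    optimizer.step()                           # correct the weight coefficients
    return task_error.item()
```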
It should be emphasized that the above embodiments concern word-level word prediction and word emotion prediction: the unsupervised corpus input to the encoder is converted into word vectors and position encoding vectors, so the resulting model performs word-level emotion prediction. Alternatively, in the emotion prediction model to be trained, sentence-level emotion prediction results can be derived from the word-level emotion prediction results. For example, if an emoticon of a smiling face appears at the end of a sentence, the sentence is classified as positive; or, if a sentence contains several emotion words and more than a certain proportion of them express a 'positive' emotion, the sentence-level emotion prediction can be determined as 'positive'.
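A minimal rule-based sketch of deriving a sentence-level result from the word-level results along these lines is given below; the 0.6 proportion threshold and the label names are illustrative assumptions:

```python
def sentence_emotion_from_words(word_emotions, ends_with_smiley=False,
                                positive_ratio_threshold=0.6):
    """word_emotions: list of 'positive'/'negative'/'neutral' labels for emotion words."""
    if ends_with_smiley:
        return "positive"
    emotional = [e for e in word_emotions if e != "neutral"]
    if not emotional:
        return "neutral"
    positive_ratio = sum(e == "positive" for e in emotional) / len(emotional)
    if positive_ratio >= positive_ratio_threshold:
        return "positive"
    if positive_ratio <= 1 - positive_ratio_threshold:
        return "negative"
    return "neutral"
```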
Furthermore, in addition to the word-level word prediction and word emotion prediction described above, the emotion prediction model can also be used for a sentence-level emotion prediction task: for example, it performs sentence-level emotion prediction on the unsupervised corpus after the masking operation, calculates a task error from the sentence-level emotion prediction result and the unsupervised corpus before the masking operation, and corrects the weight coefficients of the emotion prediction model based on this task error.
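A sketch of adding this sentence-level task is shown below, with the review score used as a weak sentence-level label (consistent with the review-score embodiment described above); the star-to-class mapping and the mean pooling are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceEmotionHead(nn.Module):
    """Illustrative sentence-level head pooled over the encoder outputs."""
    def __init__(self, d_model=256, n_classes=5):    # e.g. 1-5 star review scores
        super().__init__()
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, encoder_states):                # (batch, seq_len, d_model)
        pooled = encoder_states.mean(dim=1)           # simple mean pooling
        return self.classifier(pooled)

def sentence_task_error(sentence_logits, review_scores):
    # review_scores: 1-5 stars mapped to class indices 0-4 (assumed encoding)
    return F.cross_entropy(sentence_logits, review_scores - 1)
```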
In the emotion prediction model, sentence-level emotion prediction and word-level emotion prediction can be set to be performed independently; for example, one part of the encoder may be dedicated to the sentence-level emotion prediction task and another part to the word-level emotion prediction task. The word-level emotion prediction results and the sentence-level emotion prediction result may also be used to verify each other: for example, a sentence-level result is summarized from the word-level results and the directly obtained sentence-level result is checked against this summarized result, or vice versa.
FIG. 4 is an exemplary diagram of emotion prediction using an emotion prediction model. The unsupervised corpus includes the text 'fast and cheerful service, food was pretty good', a smiley-face emoticon, and a review score. First, according to the above embodiments, 'cheerful' and 'good' are determined to be domain-independent emotion words and the smiley face is determined to be an emoticon, so the masking operation masks these emotion words and the emoticon. The text after the masking operation and the text before the masking operation are input simultaneously to the emotion prediction model 400. Emotion prediction model 400 may be constructed using the encoder shown in FIG. 3; it produces output vectors x1-x11 and outputs an emotion prediction at each corresponding word position, where both 'cheerful' and 'good' express a 'positive' emotion and the smiley-face emoticon also expresses a 'positive' emotion. Meanwhile, the emotion prediction model 400 also outputs a five-star label for the entire text 'fast and cheerful service, food was pretty good' and the rating. The rating represents the score value contained in the original text content; for example, customer reviews of goods on an e-commerce platform can be used as unsupervised corpora, in which case the unsupervised corpus would typically include the customer's score for the goods.
It is noted that although the above examples all use English, Chinese examples are also possible, in which case the emotion words comprise one or more Chinese characters.
Corresponding to the above method, an embodiment of the present disclosure further provides an emotion prediction model training apparatus, which, as shown in FIG. 5, includes a data acquisition module 501, a sample preparation module 502 and a model training module 503. The data acquisition module 501 is configured to collect a plurality of unsupervised corpora. The sample preparation module 502 is configured to mask at least one emotion expression in the unsupervised corpus, and to input the corpus before masking and the corpus after masking to the emotion prediction model as a training sample. The model training module 503 is configured to train the emotion prediction model to be trained using the training samples, where the training process is as follows: the model performs a word prediction task and a word emotion prediction task for the masked emotion expressions in the training sample, calculates a task error from the word prediction task result, the word emotion prediction task result and the corpus before masking, and corrects the weight coefficients of the emotion prediction model based on the task error.
In some embodiments, a random masking operation is performed on the unsupervised corpus, with domain-independent emotion words and/or emoticons set to be masked with a greater probability than other words (including neutral words and domain-dependent emotion words). The masking operation includes replacing the masked content with special characters and/or replacing the masked content with other words.
In some embodiments, the emotion prediction model to be trained also performs a sentence-level emotion prediction task on the masked corpus, calculates a task error from the sentence-level emotion prediction result and the corpus before masking, and corrects the weight coefficients of the emotion prediction model based on this task error. Sentence-level emotion prediction and word-level emotion prediction can be performed in the model independently or alternately.
In some embodiments, the emotion prediction model to be trained is further configured to combine the sentence-level emotion prediction result with the word emotion prediction result to obtain the final emotion prediction result of the unsupervised corpus.
In summary, the method and apparatus for training an emotion prediction model provided by the embodiments of the present disclosure construct training samples based on the masking operation and train the emotion prediction model to be trained with these samples, which not only improves the accuracy of the emotion prediction task but also improves the stability of the model and greatly reduces the number of training samples required.
FIG. 6 shows the score values and average scores of different models when migrating from a source domain to a target domain. As shown, the overall score of the model BERT-DAAT is 90.12. The present scheme (SentiX_Fix), compared with other models in different domains, far exceeds the comparison models in all domains. SentiX_Fix is an improved scheme based on SentiX of the disclosed embodiments. The scheme also provides a unified interface for emotion analysis tasks, so that a separate model does not need to be trained for each domain. Meanwhile, the training approach of the present scheme guarantees model accuracy while greatly reducing the training time and space required for cross-domain tasks and accelerating model convergence.
FIG. 7 shows a comparative example of the time/space cost and convergence of the present scheme and other models. As can be seen from the figure, the present scheme (SentiX_Fix) converges quickly and requires only 2K training parameters, whereas the base model before the improvement, SentiX, requires 133M training parameters; meanwhile, the running speed of the present scheme is three times that of the base model SentiX.
FIGS. 8a-8c respectively show plots of the relationship between the number of training samples and model accuracy when the present scheme and other models undergo several different domain migrations. As shown in the figures, the abscissa represents the number of training samples and the ordinate represents model accuracy. As can be seen, the present scheme (SentiX_Fix) achieves good results in the domain migrations from B to E, from D to B and from E to D even when the number of samples is small, where B denotes books, E denotes electronics, and D denotes DVDs.
The disclosed embodiments also provide an electronic device 90, as shown in FIG. 9, which at the hardware level includes a memory 902 and a processor 901, and, in some cases, an input/output device 903 and other hardware 904. The memory 902 is, for example, a Random-Access Memory (RAM), and may also be a non-volatile memory, such as at least one disk memory. The input/output device 903 is, for example, a display, a keyboard, a mouse, or a network controller. The processor 901 may be built on any of various processors currently on the market. The processor 901, the memory 902, the input/output device 903, and the other hardware 904 are connected to each other via a bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one line is shown in FIG. 9, but this does not mean there is only one bus or one type of bus.
The memory 902 is used for storing programs. In particular, a program may comprise program code that includes computer instructions. The memory may include both volatile memory and non-volatile storage, and provides computer instructions and data to the processor 901. The processor 901 reads the corresponding computer program from the storage 902 into memory and runs it, thereby implementing the model training method of the above embodiments at the logic level.
As will be appreciated by one skilled in the art, the present disclosure may be embodied as systems, methods and computer program products. Accordingly, the present disclosure may be embodied in the form of entirely hardware, entirely software (including firmware, resident software, micro-code), or in the form of a combination of software and hardware. Furthermore, in some embodiments, the present disclosure may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied therein.
Any combination of one or more computer-readable media may be employed. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium is, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical memory, a magnetic memory, or any suitable combination of the foregoing. In this context, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., and any suitable combination of the foregoing.
Computer program code for carrying out the operations of embodiments of the present disclosure may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java and C++, and may also include conventional procedural programming languages such as the C language. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (16)

1. A method for training an emotion prediction model comprises the following steps:
acquiring a plurality of unsupervised corpora;
masking at least one emotion expression in the unsupervised corpus, and inputting the corpus before masking and the corpus after masking to an emotion prediction model as a training sample;
wherein the emotion prediction model is configured to: perform a word prediction task and a word emotion prediction task for the masked emotion expressions in the training sample, calculate a task error from the word prediction task result, the word emotion prediction task result and the corpus before masking, and correct the weight coefficients of the emotion prediction model based on the task error.
2. The training method of claim 1, wherein the emotion expression comprises at least one of domain-independent emotion words and emoticons, and wherein, in the unsupervised corpus, emotion expressions are masked with a higher probability than non-emotion expressions.
3. The training method of claim 1, wherein the masking operation comprises:
replacing the masked content with special characters; and/or,
replacing the masked content with other words.
4. The training method of claim 1, wherein the emotion prediction model is further configured to:
perform a sentence-level emotion prediction task on the masked corpus; and
calculate a task error from the sentence-level emotion prediction task result and the corpus before masking, and correct the weight coefficients of the emotion prediction model based on the task error.
5. The training method of claim 4, wherein the unsupervised corpus includes a review score, and performing the sentence-level emotion prediction task on the masked corpus comprises: performing the sentence-level emotion prediction task according to the review score and the emotion information of the masked corpus.
6. The training method of claim 4, wherein the emotion prediction model is further configured to: obtain an emotion prediction result of the unsupervised corpus from the sentence-level emotion prediction task result and the word emotion prediction task result.
7. The training method according to claim 2, wherein in the masking operation, the emotion words are determined in the unsupervised corpus using a general emotion dictionary, and/or the emoticons are determined in the unsupervised corpus using a regular expression.
8. An emotion prediction model training device comprises:
the data acquisition module is used for acquiring a plurality of unsupervised corpora;
the sample preparation module is used for masking at least one emotion expression in the unsupervised corpus and inputting the corpus before masking and the corpus after masking to the emotion prediction model as a training sample;
a model training module, configured to train the emotion prediction model using the training samples, wherein the emotion prediction model is configured to: perform a word prediction task and a word emotion prediction task for the masked emotion expressions in the training sample, calculate a task error from the word prediction task result, the word emotion prediction task result and the corpus before masking, and correct the weight coefficients of the emotion prediction model based on the task error.
9. The training device of claim 8, wherein the emotion expression comprises at least one of domain-independent emotion words and emoticons, and wherein, in the unsupervised corpus, emotion expressions are masked with a higher probability than non-emotion expressions.
10. The training device of claim 8, wherein the masking operation comprises:
replacing the masked content with special characters; and/or,
replacing the masked content with other words.
11. The training device of claim 8, wherein the emotion prediction model is further configured to:
perform a sentence-level emotion prediction task on the masked corpus; and
calculate a task error from the sentence-level emotion prediction task result and the corpus before masking, and correct the weight coefficients of the emotion prediction model based on the task error.
12. The training device of claim 11, wherein the unsupervised corpus includes a review score, and performing the sentence-level emotion prediction task on the masked corpus comprises: performing the sentence-level emotion prediction task according to the review score and the emotion information of the masked corpus.
13. The training apparatus of claim 11, wherein the emotion prediction model is further configured to: obtain an emotion prediction result of the unsupervised corpus from the sentence-level emotion prediction task result and the word emotion prediction task result.
14. The training apparatus according to claim 9, wherein in the masking operation, the emotion words are determined in the unsupervised corpus using a general emotion dictionary, and/or the emoticons are determined in the unsupervised corpus using a regular expression.
15. An electronic device comprising a memory and a processor, the memory further storing computer instructions executable by the processor, the computer instructions, when executed, implementing the training method of any one of claims 1 to 7.
16. A computer readable medium storing computer instructions executable by an electronic device, the computer instructions, when executed, implementing the training method of any one of claims 1 to 7.
CN202010854743.0A 2020-08-24 2020-08-24 Emotion prediction model training method and device Pending CN114090725A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010854743.0A CN114090725A (en) 2020-08-24 2020-08-24 Emotion prediction model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010854743.0A CN114090725A (en) 2020-08-24 2020-08-24 Emotion prediction model training method and device

Publications (1)

Publication Number Publication Date
CN114090725A true CN114090725A (en) 2022-02-25

Family

ID=80295321

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010854743.0A Pending CN114090725A (en) 2020-08-24 2020-08-24 Emotion prediction model training method and device

Country Status (1)

Country Link
CN (1) CN114090725A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117150024A (en) * 2023-10-27 2023-12-01 北京电子科技学院 Cross-domain fine granularity emotion analysis method, system, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination