CN112257452A - Emotion recognition model training method, device, equipment and storage medium - Google Patents


Info

Publication number
CN112257452A
Authority
CN
China
Prior art keywords
comment information
words
recognition model
emotion recognition
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010997634.4A
Other languages
Chinese (zh)
Other versions
CN112257452B (en)
Inventor
唐新春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010997634.4A
Publication of CN112257452A
Application granted
Publication of CN112257452B
Active legal status (current)
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a method, an apparatus, a device and a storage medium for training an emotion recognition model, and relates to the technical field of data processing. The method for training the emotion recognition model comprises the following steps: obtaining sample comment information carrying an annotation label; segmenting the sample comment information to generate a plurality of words; respectively acquiring index values, mask values and text numbers of the plurality of words, and generating encoding vectors of the plurality of words according to the index values, mask values and text numbers corresponding to the plurality of words; generating word vectors of the plurality of words according to the encoding vectors, and generating a sentence vector of the sample comment information according to the word vectors; generating a prediction label of the sample comment information according to the sentence vector; and training the emotion recognition model according to the annotation label and the prediction label. The trained emotion recognition model can then process comment information without manual review and processing, improving the efficiency and accuracy of comment information processing.

Description

Emotion recognition model training method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for training an emotion recognition model.
Background
At present, in information-flow advertising, an advertisement recommendation system recommends corresponding video advertisements to a user. Among the attributes of a video advertisement, user comments serve as implicit feedback behavior data of the user and therefore play an important role in evaluating user satisfaction with the advertisement recommendation system.
Specifically, a user's comments on a video advertisement express the user's emotion and preferences: if the user likes the advertisement, the user may praise it in the comments; if the user dislikes the advertisement, the dislike is likewise expressed through the comments.
In the related art, comment information posted by users under an advertisement is collected, reviewed and processed manually, so the efficiency and accuracy of comment information processing are low.
Disclosure of Invention
The present disclosure provides a method and an apparatus for training an emotion recognition model, so as to at least solve the problem in the related art that comment information processing efficiency and accuracy are low. The technical scheme of the present disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a method for training an emotion recognition model, including:
obtaining sample comment information, wherein the sample comment information is provided with a labeling label;
performing word segmentation on the sample comment information to generate a plurality of words;
respectively acquiring index values, mask values and text numbers of the plurality of words, and generating encoding vectors of the plurality of words according to the index values, mask values and text numbers corresponding to the plurality of words;
generating word vectors of the multiple words according to the coding vectors of the multiple words, and generating sentence vectors of the sample comment information according to the word vectors of the multiple words;
generating a prediction label of the sample comment information according to the sentence vector; and
and training an emotion recognition model according to the labeling label and the prediction label.
In an embodiment of the present disclosure, after the obtaining of the sample comment information, the method further includes: acquiring the length of the sample comment information; and if the length of the sample comment information is greater than a preset length, deleting the portion of the sample comment information that exceeds the preset length.
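As an illustrative sketch of the length-truncation embodiment above, assuming a character-level length measure and a hypothetical preset length (the disclosure fixes neither):

```python
MAX_LEN = 128  # hypothetical preset length; the disclosure leaves the value open

def truncate_comment(text: str, max_len: int = MAX_LEN) -> str:
    """If the sample comment information exceeds the preset length,
    delete the portion beyond that length; otherwise keep it unchanged."""
    return text[:max_len] if len(text) > max_len else text
```

Truncating overlong samples in this way keeps the input size of the subsequent encoding step bounded.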
In one embodiment of the present disclosure, the loss function of the emotion recognition model is: L = -[y log y' + (1 - y) log(1 - y')]; wherein y is the annotation label and y' is the prediction label.
In one embodiment of the present disclosure, the generating of the encoding vectors of the plurality of words according to the index values, mask values and text numbers corresponding to the plurality of words includes: performing weighted addition on the index values, mask values and text numbers corresponding to the plurality of words to generate the encoding vectors of the plurality of words.
In one embodiment of the present disclosure, the sample comment information includes N words, and the generating of the sentence vector of the sample comment information from the word vectors of the plurality of words includes: respectively encoding the N words to generate initial codes corresponding to the N words; and respectively performing Transformer encoding on the initial codes of the N words to form sentence vectors corresponding to the N words, wherein the sentence vector corresponding to the first word among the N words serves as the sentence vector of the sample comment information, and when the i-th word is Transformer-encoded, the Transformer codes corresponding to the other words among the N words are used as reference codes.
In an embodiment of the present disclosure, the generating a prediction tag of the sample comment information according to the sentence vector includes: performing multi-layer convolution on the sentence vectors to generate convolution values; classifying the convolution values to generate the prediction tag.
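A minimal sketch of the convolve-and-classify step above, assuming untrained, hypothetical kernels and a sigmoid threshold (the disclosure specifies only "multi-layer convolution" followed by classification, not these concrete choices):

```python
import numpy as np

def conv1d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 1-D convolution of a sentence vector with one kernel."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def predict_label(sentence_vec: np.ndarray, kernels: list) -> int:
    """Apply multi-layer convolution to the sentence vector, pool the result
    into a single convolution value, and classify it into label 0 or 1."""
    h = sentence_vec
    for kernel in kernels:                        # multi-layer convolution
        h = np.maximum(conv1d(h, kernel), 0.0)    # ReLU between layers
    conv_value = h.mean()                         # pooled convolution value
    prob = 1.0 / (1.0 + np.exp(-conv_value))      # sigmoid score
    return int(prob > 0.5)
```

In practice the kernels would be learned jointly with the rest of the model rather than fixed by hand.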
According to a second aspect of the embodiments of the present disclosure, there is provided an emotion recognition model training apparatus, including:
a first obtaining unit configured to obtain sample comment information, wherein the sample comment information has an annotation tag;
a word segmentation unit configured to segment the sample comment information to generate a plurality of words;
the first generation unit is configured to respectively acquire an index value, a mask value and a text number of the plurality of words, and generate encoding vectors of the plurality of words according to the index value, the mask value and the text number corresponding to the plurality of word pairs;
a second generating unit configured to generate word vectors of the plurality of words from the encoded vectors of the plurality of words;
a third generating unit configured to generate a sentence vector of the sample comment information from the word vector of the plurality of words;
a fourth generating unit configured to generate a prediction tag of the sample comment information from the sentence vector; and
and the training unit is configured to train an emotion recognition model according to the labeling label and the prediction label.
In an embodiment of the present disclosure, the apparatus for training an emotion recognition model further includes: a second obtaining unit configured to obtain the length of the sample comment information; and a deleting unit configured to, if the length of the sample comment information is greater than a preset length, delete the portion of the sample comment information that exceeds the preset length.
In one embodiment of the present disclosure, the loss function of the emotion recognition model is:
L = -[y log y' + (1 - y) log(1 - y')];
wherein y is the label tag and y' is the prediction tag.
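The loss above is standard binary cross-entropy; a direct transcription follows (the clamping constant is an added numerical-stability assumption, not part of the formula):

```python
import math

def bce_loss(y: float, y_pred: float, eps: float = 1e-12) -> float:
    """L = -[y*log(y') + (1-y)*log(1-y')], with y the annotation label
    and y' the predicted probability; eps guards against log(0)."""
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y * math.log(y_pred) + (1.0 - y) * math.log(1.0 - y_pred))
```

The loss is small when the prediction agrees with the label and grows without bound as the prediction confidently disagrees.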
In an embodiment of the disclosure, the first generating unit is specifically configured to:
and carrying out weighted addition on the index value, the mask value and the text number corresponding to the plurality of words to generate the encoding vector of the plurality of words.
In an embodiment of the disclosure, the sample comment information includes N words, and the third generating unit is specifically configured to: respectively encode the N words to generate initial codes corresponding to the N words; and respectively perform Transformer encoding on the initial codes of the N words to form sentence vectors corresponding to the N words, wherein the sentence vector corresponding to the first word among the N words serves as the sentence vector of the sample comment information, and when the i-th word is Transformer-encoded, the Transformer codes corresponding to the other words among the N words are used as reference codes.
In an embodiment of the disclosure, the fourth generating unit is specifically configured to: performing multi-layer convolution on the sentence vectors to generate convolution values; classifying the convolution values to generate the prediction tag.
According to a third aspect of the embodiments of the present disclosure, there is provided a server, including:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured with the instructions to implement the method for training the emotion recognition model described in the embodiment of the first aspect.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a storage medium, wherein instructions of the storage medium, when executed by a processor of a server, enable the server to execute the method for training an emotion recognition model described in the first aspect.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer program product, wherein instructions of the computer program product, when executed by a processor, enable a server to execute the method for training an emotion recognition model as described in the first aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
obtaining sample comment information, wherein the sample comment information is provided with a labeling label; segmenting the sample comment information to generate a plurality of words; respectively acquiring index values, mask values and text numbers of a plurality of words, and generating coding vectors of the plurality of words according to the index values, mask values and text numbers corresponding to the plurality of word pairs; generating word vectors of a plurality of characters according to the coding vectors of the plurality of characters, and generating sentence vectors of the sample comment information according to the word vectors of the plurality of characters; generating a prediction label of the sample comment information according to the sentence vector; and training the emotion recognition model according to the labeling label and the prediction label. Therefore, the emotion recognition model can be trained, so that the comment information is processed through the emotion recognition model, manual review and processing of the comment information are not needed, and the comment information processing efficiency and accuracy are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a diagram illustrating an application scenario of an emotion recognition model in accordance with an exemplary embodiment;
FIG. 2 is a diagram of a review interface of a terminal device shown in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method of training an emotion recognition model in accordance with an exemplary embodiment;
FIG. 4 is a flow diagram illustrating another method of training an emotion recognition model in accordance with an exemplary embodiment;
FIG. 5 is a flowchart illustrating a method of training an emotion recognition model in accordance with an exemplary embodiment;
FIG. 6 is a diagram of a review interface for a terminal device shown in accordance with an exemplary embodiment;
FIG. 7 is a diagram of a review interface for a terminal device shown in accordance with an exemplary embodiment;
FIG. 8 is a block diagram illustrating an apparatus for training an emotion recognition model in accordance with an exemplary embodiment;
fig. 9 is a block diagram illustrating a server 200 according to an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein.
The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In a practical application scenario, a user can comment on video advertisements, commodities, articles and the like, expressing the user's emotion toward the commented object. If the user likes the object, the user usually praises it in the comments; if the user strongly dislikes the object, this is likewise expressed through the comments. Analyzing and processing the comment information therefore facilitates adjustment based on user preference, so as to better meet the requirements of the relevant users.
In the related art, comment information posted by users under an advertisement is collected, reviewed and processed manually, so the efficiency and accuracy of comment information processing are low.
The method comprises: obtaining sample comment information carrying an annotation label; segmenting the sample comment information to generate a plurality of words; respectively acquiring index values, mask values and text numbers of the plurality of words, and generating encoding vectors of the plurality of words according to the index values, mask values and text numbers corresponding to the plurality of words; generating word vectors of the plurality of words according to the encoding vectors, and generating a sentence vector of the sample comment information according to the word vectors; generating a prediction label of the sample comment information according to the sentence vector; and training the emotion recognition model according to the annotation label and the prediction label, so that comment information is processed through the emotion recognition model. Manual review and processing of the comment information are thus not needed, the efficiency and accuracy of comment information processing are improved, and further operations such as deletion can be performed on the comment information as needed, which facilitates positive publicity for commented objects such as advertisements and meets users' requirements.
The emotion recognition model trained by the method of the present disclosure may be applied to the scenario shown in fig. 1. Fig. 1 is an application scenario diagram of an emotion recognition model according to an exemplary embodiment; the application scenario may include a plurality of terminal devices 10 and a server 20, and the server 20 may be communicatively connected with the plurality of terminal devices 10 through a network. The server 20 can simultaneously receive comment information transmitted by the plurality of terminal devices 10 and transmit comment information to the plurality of terminal devices 10 for display.
Taking a short video advertisement as an example, a published short video advertisement is commented on through the terminal device 10. As shown in fig. 2, which is a comment interface diagram of a terminal device according to an exemplary embodiment, a user may input a comment on the short video advertisement in the comment box shown in fig. 2 while the advertisement is playing; the comment information is not limited to text form and may also be audio or image form. Meanwhile, comment information from relevant users on the video advertisement is displayed in the comment information display area shown in fig. 2. In the embodiment of the present disclosure, the server 20 processes the comment information through the emotion recognition model trained by the present disclosure, so that the comment information in the display area shown in fig. 2 can be continuously updated and more positive comments are presented on the terminal device, which facilitates positive publicity for commented objects such as advertisements and meets users' requirements.
Specifically, fig. 3 is a flowchart illustrating a method for training an emotion recognition model according to an exemplary embodiment, and as shown in fig. 3, the method includes the following steps:
step S101, obtaining sample comment information, wherein the sample comment information is provided with a labeling label.
In the embodiments of the present disclosure, the sample comment information may be any comment information, such as text comment information, audio information, image information, and the like. In other embodiments of the present disclosure, the comment information may be for text, video, or the like. In one embodiment of the present disclosure, the comment information is comment information for a short video.
In the embodiment of the disclosure, sample comment information corresponding to a specific application scenario can be acquired to train the emotion recognition model, thereby improving the recognition accuracy of the model; alternatively, various kinds of sample comment information can be acquired from different scenarios to train the model, thereby improving its universality. The choice can be made according to actual requirements.
Specifically, taking a video advertisement scenario as an example, comment information of a video advertisement is extracted as sample comment information, and the emotion semantics of the sample comment information are annotated according to an emotion classification annotation rule. For example, comment information expressing negative emotion is labeled 1, and other comment information is labeled 0. Comment information labeled 1 includes, for example: "Too many advertisements!", "This platform is just garbage!", "This product is fake"; comment information labeled 0 includes, for example: "Although this hair dryer is somewhat ugly, it works well", "I bring you".
It should be noted that the labels may be assigned according to actual training requirements; for example, positive emotion may instead be labeled 1 and other comment information labeled 0.
In the embodiment of the present disclosure, after a user sends comment information through a terminal device such as a mobile phone, a tablet computer, an intelligent wearable device or a computer, the server may obtain one or more pieces of comment information and store them in a database. One or more pieces of sample comment information may then be obtained from the database, or one or more pieces of comment information sent by terminal devices may be obtained in real time as sample comment information; the choice can be made according to the actual application scenario, for example as follows:
in a first example, taking a video advertisement scene as an example, one or more pieces of comment information sent by a terminal device are directly obtained in real time as sample comment information.
In a second example, taking an article scenario as an example, processing article comments mainly aims to filter out comment information such as false advertising, so one or more pieces of sample comment information may be obtained from the database.
It can be understood that a large amount of sample comment information is needed for training, and that sample comment information can be quickly retrieved through a sample identification, which further improves the training efficiency of the emotion recognition model. As a possible implementation, a comment identification of the sample comment information is obtained, and a sample identification of the sample comment information is generated according to the comment identification, wherein the sample identification is used for retrieving the sample comment information. Thus, after a target identification is obtained, retrieval can be performed according to the target identification to obtain the sample comment information corresponding to the target identification.
It can be understood that comment information sent by different terminal devices may be the same or different, and for the same terminal device and the same commented object, comment information sent in different time periods may also differ. Taking a video advertisement scenario as an example, the object of a comment may include: the e-commerce product in the advertisement, anything else in the video, the video publishing platform itself, and the like.
Step S102, segmenting the sample comment information to generate a plurality of words.
Specifically, the comment information generally appears in a text form, and may also include web page links, numbers, letters, and the like. In other embodiments of the present disclosure, if the sample comment information is audio information or video information, the audio information is converted into text information, or the text information is extracted from the video information; similarly, if the sample comment information is image information, the image information is converted into text information, or the text information is extracted from the image information.
In the embodiment of the present disclosure, if the sample comment information is text information, there are many ways to segment the comment information into a plurality of words, exemplified as follows:
In a first example, a word segmentation method based on character string matching processes the sample comment information: the sample comment information is matched against entries in a machine dictionary, and if a certain character string is found in the dictionary, it is identified as a word, thereby generating a plurality of words.
In a second example, the sample comment information is processed with an understanding-based word segmentation method: the sample comment information is segmented while syntactic and semantic analysis is performed, and part-of-speech tagging using the syntactic and semantic information is carried out, thereby generating a plurality of words.
In a third example, the sample comment information is segmented with a word segmentation model; an existing word segmentation model may be used.
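The first (string-matching) approach above can be sketched as forward maximum matching against a toy dictionary; the dictionary contents and maximum word length are illustrative assumptions, not part of the disclosure:

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Dictionary-based segmentation: at each position take the longest
    dictionary entry starting there, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words
```

For instance, with the toy dictionary {"北京", "天安门"}, the sentence "我爱北京天安门" is segmented into ["我", "爱", "北京", "天安门"].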
Step S103, respectively obtaining index values, mask values and text numbers of the plurality of words, and generating encoding vectors of the plurality of words according to the index values, mask values and text numbers corresponding to the plurality of words.
In the embodiment of the disclosure, after the sample comment information is segmented into a plurality of words, the index values, mask values and text numbers of the plurality of words are respectively obtained. The index value can be understood as the index into a Chinese dictionary after the sample comment information is segmented at word granularity; the mask value can be understood as marking the positions where the sample comment information is masked during training; the text number can be understood as follows: if only one piece of sample comment information is input, the text number is 0, while if a pair of pieces of sample comment information is input, the two pieces are separated by a character such as [SEP] and their text numbers differ, e.g. the text number of the first piece of sample comment information is 0 and that of the second piece is 1.
It should be noted that the purpose of the mask value is to train the emotion recognition model better: after training, the model has learned not only the position of the mask but also what the masked word is. That is, the model has learned the semantics of the whole comment information more thoroughly and knows which word appears at which position in the comment information.
Further, the encoding vector of each word is generated according to the index value, mask value and text number corresponding to that word. As a possible implementation, weighted addition is performed on the index values, mask values and text numbers corresponding to the plurality of words to generate the encoding vectors of the plurality of words, so that the encoding vector of each word is obtained rapidly and subsequent training efficiency is improved.
For example, the sample comment information is "I love Beijing Tiananmen" (in Chinese), and word segmentation generates seven words: "I", "love", "Bei", "jing", "Tian", "an" and "men". Feature extraction is then performed on the seven words to obtain their index values, mask values and text numbers, and weighted addition is performed on the index value, mask value and text number of each word to generate the encoding vectors of the seven words.
Continuing with the sample comment information "I love Beijing Tiananmen": the matrix T1 of the index values of the sample comment information is [1 1 1 1 1 1 1], the matrix T2 of the mask values is [2 2 2 2 2 2 2], and the matrix T3 of the text numbers is [0 0 0 0 0 0 0]. Weighted addition is performed on T1, T2 and T3 to obtain the encoding vector matrix [3 3 3 3 3 3 3].
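The worked example above can be reproduced directly; with all three weights taken as 1 (the disclosure does not specify the weights), the weighted addition reduces to an element-wise sum:

```python
import numpy as np

t1 = np.array([1, 1, 1, 1, 1, 1, 1])  # T1: index values
t2 = np.array([2, 2, 2, 2, 2, 2, 2])  # T2: mask values
t3 = np.array([0, 0, 0, 0, 0, 0, 0])  # T3: text numbers (single sentence)

w1 = w2 = w3 = 1.0  # assumed weights; not fixed by the disclosure
encoding = w1 * t1 + w2 * t2 + w3 * t3  # weighted addition -> [3 3 3 3 3 3 3]
```

Different weights would simply rescale each feature's contribution before the sum.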
And step S104, generating word vectors of the multiple words according to the coding vectors of the multiple words, and generating sentence vectors of the sample comment information according to the word vectors of the multiple words.
Step S105, generating a prediction label of the sample comment information according to the sentence vector.
In the embodiment of the present disclosure, the emotion recognition model may be generated by training on the sample comment information using a recurrent neural network, a convolutional neural network, and the like. Specifically, in the embodiment of the present disclosure, the emotion recognition model includes a BERT (Bidirectional Encoder Representations from Transformers) module and a CNN (Convolutional Neural Network) module.
The BERT module is a pre-trained deep learning model, so it can better extract the syntactic features, character features, and contextual semantic features of the comment information, from which the sentence vector of the comment information is constructed. For example, the word "apple" can denote either a mobile phone brand or a fruit; the sentence vector of the whole text obtained through the BERT module makes it possible to determine which sense of "apple" is meant.
It should be noted that a sentence vector can be understood as a vector representation in which full-text semantic information is fused with each word in a text, that is, the BERT module enhances semantic vector representation of each word in the text, so as to improve accuracy of subsequent emotion classification.
In the embodiment of the present disclosure, after obtaining the encoded vectors of the multiple words, the word vectors of the multiple words are generated according to the encoded vectors of the multiple words, where the word vectors include syntactic features, semantic features, and the like.
For example, suppose the sample comment information is "I love playing games on my Apple". After the above processing it has seven encoding vectors, each with a corresponding word vector; for instance, the first character of "apple" corresponds to encoding vector a1 and the second to encoding vector a2, so the word "apple" corresponds to the word vector a1 + a2, which lies closer in the semantic space to the word vectors of "cell phone", "Huawei", and the like, and farther from the word vectors of fruits such as "banana".
Furthermore, there are many ways to generate the sentence vector of the sample comment information from the word vectors of the plurality of words. As one possible implementation, the sample comment information includes N words, and the N words are respectively encoded to generate initial codes corresponding to the N words; the initial codes of the N words are then transform-encoded to form sentence vectors corresponding to the N words, where the sentence vector corresponding to the first of the N words is the sentence vector of the sample comment information. When the ith word is transform-encoded, the transform codes corresponding to the other words among the N words serve as reference codes, so that each word in the sample comment information is represented as a vector fused with full-text semantic information, improving the accuracy of the subsequently obtained prediction label.
Finally, there are various ways to generate the prediction label of the sample comment information from the sentence vector. As one possible implementation, the sentence vector undergoes multilayer convolution to generate convolution values, and the convolution values are classified to generate the prediction label; processing the sentence vector through a convolutional neural network further improves the accuracy and efficiency of emotion recognition model training.
The CNN module in the emotion recognition model comprises a convolution layer and a pooling layer; the convolution layer may, for example, use 1-dimensional convolution, and the pooling layer may use max pooling, implementing sentence-vector feature mining and dimensionality reduction. The sentence vector is input into the CNN module and processed by the convolution layer and the pooling layer to obtain deep semantic features of the text, that is, to generate the convolution values, and the convolution values are classified by a classifier to obtain the prediction label.
It should be noted that the sentence vectors may also be processed by other multi-layer neural network layers, such as the values obtained by the recurrent network layer are further classified to generate the prediction tags.
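The convolution-and-pooling step can be illustrated with a toy sketch; the kernel values and sentence-vector components are invented for illustration, and a real CNN module would learn many multi-channel kernels rather than a single hand-set one:

```python
def conv1d(seq, kernel):
    """Valid 1-D convolution (cross-correlation) of seq with kernel."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def max_pool(values):
    """Global max pooling over the convolution outputs."""
    return max(values)

sentence = [0.1, 0.9, 0.3, 0.7, 0.2]   # toy sentence-vector components
kernel = [0.5, 0.5]                    # toy 1-D convolution kernel
feature = max_pool(conv1d(sentence, kernel))
print(feature)
```

The pooled feature plays the role of the "convolution value" that the classifier layer then maps to a prediction label.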
And step S106, training the emotion recognition model according to the labeling label and the prediction label.
In the embodiment of the disclosure, the parameters of the emotion recognition model are continuously adjusted according to the error between the annotation label and the prediction label, so as to improve the recognition accuracy of the emotion recognition model. More specifically, offline training can be performed with a loss function, where the loss function is the cross-entropy loss between the prediction label output by the emotion recognition model and the actually annotated label. In an exemplary embodiment of the disclosure, the loss function may be: L = -[y·log(y') + (1-y)·log(1-y')], where y is the annotation label and y' is the prediction label.
In the embodiment of the disclosure, a stochastic gradient descent algorithm may be used to update the emotion recognition model: a batch is randomly sampled from the sample comment information, the parameters are updated according to the gradient after training on that batch, and batches are repeatedly sampled and applied until the loss value L of the loss function falls within an acceptable range, yielding the trained emotion recognition model.
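The cross-entropy loss above can be written out as a small single-sample sketch; the epsilon guard is an implementation detail added here to avoid log(0), not part of the formula in the text:

```python
import math

def cross_entropy(y, y_pred, eps=1e-12):
    """L = -[y*log(y') + (1-y)*log(1-y')] for a single sample."""
    return -(y * math.log(y_pred + eps) + (1 - y) * math.log(1 - y_pred + eps))

# A confident correct prediction has near-zero loss;
# a confident wrong prediction has a large loss.
print(cross_entropy(1, 0.99))
print(cross_entropy(1, 0.01))
```

Averaging this quantity over a randomly sampled batch gives the value whose gradient drives the stochastic-gradient-descent update.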
According to the training method of the emotion recognition model, sample comment information is obtained, where the sample comment information has an annotation label; the sample comment information is segmented to generate a plurality of words; index values, mask values, and text numbers of the plurality of words are respectively acquired, and encoding vectors of the plurality of words are generated from the index values, mask values, and text numbers corresponding to the plurality of words; word vectors of the plurality of words are generated from the encoding vectors, and the sentence vector of the sample comment information is generated from the word vectors; the prediction label of the sample comment information is generated from the sentence vector; and the emotion recognition model is trained according to the annotation label and the prediction label. In this way, the emotion recognition model can be trained so that comment information is processed through the model, manual review and processing of the comment information are no longer needed, and the efficiency and accuracy of comment information processing are improved.
FIG. 4 is a flowchart illustrating another method for training an emotion recognition model, according to an example embodiment, as shown in FIG. 4, including the steps of:
step S201, obtaining a comment identifier of the sample comment information, and generating a sample identifier of the sample comment information according to the comment identifier, wherein the sample identifier is used for retrieving the sample comment information.
Step S202, obtaining a target identification, and detecting according to the target identification to obtain sample comment information corresponding to the target identification.
In the embodiment of the disclosure, sample comment information can be acquired for a specific application scene to train the emotion recognition model, improving its recognition accuracy; alternatively, various sample comment information can be acquired across different scenes to train the model, improving its universality. The choice can be made according to actual requirements.
Specifically, taking a video advertisement scene as an example, comment information of a video advertisement is extracted as sample comment information, and the emotion semantics of the sample comment information are annotated according to an emotion classification annotation rule; for example, the negative-emotion label is 1 and the other label is 0. Comment information annotated with label 1 includes: "Too many advertisements!", "This platform is just garbage!", "This product is fake, don't buy it". Comment information annotated with label 0 includes: "Although this hair dryer is somewhat ugly, it works well", "I bring you".
It should be noted that labels can be assigned according to actual training requirements; the positive-emotion label can also be set to 1 and the other label to 0.
In the embodiment of the disclosure, the comment identifier of the sample comment information is obtained, and the sample identifier of the sample comment information is generated according to the comment identifier. The sample identifier is the unique identifier distinguishing a piece of comment information, and the sample comment information can be retrieved quickly through it, further improving the training efficiency of the emotion recognition model.
The comment identifier can be a distinct number assigned to each piece of sample comment information, such as 01, 02, 03, and the like, so that the sample identifiers generated from the comment identifiers can be A01, A02, A03, and so on, uniquely identifying the sample comment information; the scheme can be chosen according to actual application needs.
Further, a target identifier such as A01 is obtained, and matching is performed against the target identifier A01 until the sample identifier A01 is found, thereby obtaining the sample comment information corresponding to A01.
Step S203, obtaining the length of the sample comment information; and if the length of the sample comment information is greater than the preset length, deleting the part of the sample comment information which is greater than the preset length.
In the embodiment of the disclosure, in order to train the emotion recognition model better, the sample comment information may be preprocessed. Word segmentation is performed at character granularity, so the length of a piece of comment information is its number of characters. In general, the longer the sentence, the longer the model training time, so statistics are gathered on the sentence lengths of the overall comment information to optimize training and prediction performance under limited resources. For example, the 75th percentile of the character-count distribution, 128 characters, is taken as the preset length: sentences exceeding 128 characters are truncated, and sentences below 128 characters can be padded, further improving processing efficiency.
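The truncate-or-pad preprocessing can be sketched as follows, assuming character-level tokens and a hypothetical `[PAD]` filler token:

```python
MAX_LEN = 128  # 75th percentile of comment lengths, per the statistics above

def normalize_length(chars, max_len=MAX_LEN, pad="[PAD]"):
    """Truncate sequences longer than max_len and pad shorter ones."""
    chars = list(chars)[:max_len]
    return chars + [pad] * (max_len - len(chars))

print(len(normalize_length(["a"] * 200)))  # truncated to 128
print(len(normalize_length(["a"] * 10)))   # padded to 128
```

Fixing every input to the same length keeps batch shapes uniform, which is what makes the later matrix-style inputs (index values, mask values, text numbers) possible.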
Step S204, the sample comment information is cut into words to generate a plurality of words, index values, mask values and text numbers of the plurality of words are respectively obtained, weighted addition is carried out on the index values, the mask values and the text numbers corresponding to the plurality of words to generate encoding vectors of the plurality of words, and word vectors of the plurality of words are generated according to the encoding vectors of the plurality of words.
In the embodiment of the disclosure, the emotion recognition model may include a BERT module, an input of the BERT module includes an index value, a mask value, and a text number of a plurality of words, the index value, the mask value, and the text number of the plurality of words may be obtained by segmenting the sample comment information to generate the plurality of words, and the index value, the mask value, and the text number corresponding to the plurality of words are weighted and added to generate an encoding vector of the plurality of words, and a word vector of the plurality of words is generated according to the encoding vector of the plurality of words.
The index value can be understood as the index of each character in a Chinese character dictionary after the comment information is segmented at character granularity. The mask value can be understood as marking the positions to be masked during training. The text number distinguishes inputs: if the BERT module in the emotion recognition model receives only one piece of comment information, its text number is 0; if it receives a pair, the text number of the first piece is 0 and that of the second piece is 1.
It should be noted that the mask value serves to better train the BERT module: the trained model learns not only the location of the mask but also what the masked character is. That is, the BERT module learns the semantics of the entire comment message more thoroughly, knowing which character appears at which location in the comment information.
It should be noted that one of the tasks of the BERT module in the emotion recognition model to perform pre-training is to predict whether the text message B is in the following text of the text message a, such as the text message a: "it is rainy today", text information B: "slippery on the road", the BERT module needs to predict whether the text message a is followed by the text message B, so that the text number vector of the text message a is represented by 0 and the text number vector of the text message B is represented by 1. In the embodiment of the present disclosure, since a single text message is input in the task of emotion recognition, the text numbers are all 0.
For example, taking real video advertisement comment information as an example, the sample comment information is "There are too many advertisements; I told it not to recommend them, yet it keeps recommending". After character-level word segmentation it becomes: [CLS] | advertisement | too | much | , | all | not | let | push | recommend | still | one | straight | push | recommend | [SEP] (each item corresponds to one Chinese character, with "advertisement" spanning two characters), where the "[CLS]" mark denotes the beginning of the comment information and "[SEP]" denotes the end of the text.
The index values are: [101, 2408, 1440, 1922, 1914, 749, 8024, 6963, 679, 6375, 2972, 5773, 6820, 671, 4684, 2972, 5773, 102, 0, 0, …, 0], padded with zeros to the preset length of 128, where 101 and 102 are the dictionary indices of [CLS] and [SEP].
The mask values are: [1, 1, …, 1 (eighteen ones), 0, 0, …, 0], likewise padded with zeros to length 128. A position labeled 1 is a mask candidate: the character at that position is replaced with the mask token with 80% probability, replaced with another random character with 10% probability, and kept as the character itself with 10% probability.
The text numbers are: [0, 0, …, 0] (128 zeros).
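Assembling the three input sequences can be sketched as below; the tiny vocabulary and its English placeholder tokens are hypothetical, with 101 and 102 used as the [CLS] and [SEP] indices as in the example above:

```python
def build_bert_inputs(tokens, vocab, max_len=128):
    """Produce index values, mask values and text numbers for one comment."""
    toks = ["[CLS]"] + list(tokens) + ["[SEP]"]
    ids = [vocab.get(t, vocab["[UNK]"]) for t in toks]
    mask = [1] * len(ids)   # 1 marks real (maskable) positions
    segs = [0] * len(ids)   # single-text input: all text numbers are 0
    pad = max_len - len(ids)
    return ids + [0] * pad, mask + [0] * pad, segs + [0] * pad

vocab = {"[CLS]": 101, "[SEP]": 102, "[UNK]": 100, "ad": 2408, "too": 1440}
ids, mask, segs = build_bert_inputs(["ad", "too"], vocab)
print(ids[:5], sum(mask), set(segs))
```

Out-of-vocabulary characters fall back to the [UNK] index, and all three sequences share the same fixed length so they can be stacked into batch matrices.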
Furthermore, the index values, the mask values and the text numbers corresponding to the multiple words are subjected to weighted addition to generate coding vectors of the multiple words, that is, the matrix vectors are subjected to addition processing to obtain the coding vectors of the multiple words, so that the coding vectors of each word are rapidly obtained, and the subsequent training efficiency is improved.
In the embodiment of the present disclosure, after obtaining the encoded vectors of the multiple words, the word vectors of the multiple words are generated according to the encoded vectors of the multiple words, where the word vectors include syntactic features, semantic features, and the like.
For example, suppose the sample comment information is "I love playing games on my Apple". After the above processing it has seven encoding vectors, each with a corresponding word vector; for instance, the first character of "apple" corresponds to encoding vector a1 and the second to encoding vector a2, so the word "apple" corresponds to the word vector a1 + a2, which lies closer in the semantic space to the word vectors of "cell phone", "Huawei", and the like, and farther from the word vectors of fruits such as "banana".
Step S205, the sample comment information comprises N words, and the N words are respectively encoded to generate initial codes corresponding to the N words; the initial codes of the N words are respectively transform-encoded to form sentence vectors corresponding to the N words, where the sentence vector corresponding to the first of the N words is the sentence vector of the sample comment information, and when the ith word is transform-encoded, the transform codes corresponding to the other words among the N words serve as reference codes.
In the embodiment of the disclosure, the sample comment information includes N words, where i and N are both positive integers. In the process of generating the sentence vector of the sample comment information from the word vectors of the plurality of words, an initial code corresponding to each word — an initial semantic-feature learning result — may be obtained by encoding each word with an encoder or similar means. The initial code of each word is then transform-encoded to form the sentence vector corresponding to that word; that is, starting from each word's initial semantic features, learning proceeds with reference to the transform codes of the other words. Finally, the sentence vector corresponding to the sample comment information is obtained, i.e., a vector representation in which each word in the sample comment information is fused with full-text semantic information, improving the accuracy of the subsequently obtained prediction label.
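The "reference codes" idea — each word's transform code mixing in all other words' codes — can be sketched as a single toy attention step (plain Python, a single unprojected attention head; a real BERT module stacks many learned layers of this kind):

```python
import math

def self_attend(X):
    """X: list of initial codes (one vector per token). Each output row
    mixes all rows (the 'reference codes') via softmax-weighted similarity."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [wi / z for wi in w]
        out.append([sum(wi * k[j] for wi, k in zip(w, X)) for j in range(d)])
    return out

def sentence_vector(X):
    # BERT-style: the transformed code of the first ([CLS]) token
    # serves as the sentence vector, consistent with step S205.
    return self_attend(X)[0]
```

After even one such step, the first token's output already blends information from every other token, which is why it can stand in for the whole sentence.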
Step S206, the sentence vector is subjected to multilayer convolution to generate convolution values, and the convolution values are classified to generate the prediction label.
In the embodiment of the disclosure, the emotion recognition model includes a convolutional neural network module comprising a convolution layer and a pooling layer; for example, the convolution layer can use 1-dimensional convolution and the pooling layer can use max pooling, implementing sentence-vector feature mining and dimensionality reduction. The sentence vector is input into the convolutional neural network module and processed by the convolution layer and the pooling layer to obtain deep semantic features of the text, that is, to generate convolution values; the convolution values are classified by a classifier to obtain the prediction label. Processing the sentence vector through the convolutional neural network further improves the accuracy and efficiency of emotion recognition model training.
And step S207, training the emotion recognition model according to the labeling label and the prediction label.
In an exemplary embodiment of the present disclosure, the parameters of the emotion recognition model are continuously adjusted according to the error between the annotation label and the prediction label, so as to improve the recognition accuracy of the emotion recognition model. More specifically, offline training may be performed with a loss function, where the loss function is the cross-entropy loss between the prediction label output by the emotion recognition model and the true annotated label. In an exemplary embodiment of the disclosure, the loss function may be: L = -[y·log(y') + (1-y)·log(1-y')], where y is the annotation label and y' is the prediction label.
In the embodiment of the disclosure, a stochastic gradient descent algorithm may be used to update the emotion recognition model: a batch is randomly sampled from the sample comment information, the parameters are updated according to the gradient after training on that batch, and batches are repeatedly sampled and applied until the loss value L of the loss function falls within an acceptable range, yielding the trained emotion recognition model.
In order to make the above process more clear to those skilled in the art, the following description will be made in detail by taking a specific example in conjunction with fig. 5.
Specifically, as shown in fig. 5, the sample comment information X is "There are too many advertisements; I told it not to recommend them, yet it keeps recommending". After the comment information is segmented into characters — "advertisement", "too", "many", "not", "let", "push", "recommend", "still", "one", "straight", "push", "recommend" — the plurality of words X01 are obtained, and the index values X02, mask values X03, and text numbers X04 of the plurality of words are extracted. Encoding vectors of the plurality of words are then generated from the index values, mask values, and text numbers and input into the BERT module to generate the sentence vector corresponding to the sample comment information: word vectors of the plurality of words are generated from the encoding vectors, and the sentence vector of the sample comment information is generated from the word vectors. The sentence vector includes the semantic information of the entire sample comment information, including contextual features. The sentence vector is then input into the CNN module, which performs multilayer convolution on it to generate convolution values; the convolution values are classified by a classifier layer, and finally the label corresponding to the comment information is output.
The method for training the emotion recognition model obtains the comment identifier of the sample comment information and generates the sample identifier of the sample comment information according to the comment identifier, where the sample identifier is used to retrieve the sample comment information; obtains a target identifier and matches it to obtain the sample comment information corresponding to the target identifier; obtains the length of the sample comment information and, if the length is greater than the preset length, deletes the part of the sample comment information exceeding the preset length; segments the sample comment information to generate a plurality of words; respectively obtains index values, mask values, and text numbers of the plurality of words, performs weighted addition on them to generate encoding vectors of the plurality of words, and generates word vectors of the plurality of words from the encoding vectors. The sample comment information comprises N words, and the N words are respectively encoded to generate initial codes corresponding to the N words; the initial codes of the N words are respectively transform-encoded to form sentence vectors corresponding to the N words, where the sentence vector corresponding to the first of the N words is the sentence vector of the sample comment information, and when the ith word is transform-encoded, the transform codes of the other words among the N words serve as reference codes. The sentence vector is subjected to multilayer convolution to generate convolution values, the convolution values are classified to generate the prediction label, and the emotion recognition model is trained according to the annotation label and the prediction label. In this way, the emotion recognition model can be trained so that comment information is processed through it, improving the review efficiency of the comment information.
Based on the description of the above embodiment, comment information may be processed by the trained emotion recognition model. Specifically, in the embodiment of the present disclosure, after the comment information is obtained and segmented into a plurality of word segments, the plurality of word segments are input into the emotion recognition model and processed to generate the score value of the comment information; that is, inputting the word segments into the emotion recognition model outputs the score value of the comment information.
The emotion recognition model is generated based on the above embodiment description through pre-training.
In the embodiment of the disclosure, the emotion corresponding to the comment information may be positive or negative, and the score value, which may be set according to application requirements, represents the degree of positive or negative emotion of the comment information. For example, with a preset threshold, a value greater than the threshold represents a positive comment, and the larger the value, the more positive the emotion; a value less than the threshold indicates a negative comment, and the smaller the value, the more negative the emotion.
In the embodiment of the disclosure, corresponding score intervals can also be set directly for positive and negative emotions; for example, the score interval [0, 1] represents negative emotion, and the higher the score within [0, 1], the more negative the emotion.
Further, the comment information is processed according to the score value of the comment information, and how to process the comment information according to the score value of the comment information may be selected according to the application scene needs, which is exemplified as follows:
in a first example, if the score value of the comment information is lower than a preset threshold, the comment information is deleted or the authority of the comment information is set.
For example, the trained emotion recognition model is deployed as an online service. When a user comments under an advertisement with something like "Too many spam advertisements, and what is this fake stuff — it's annoying to watch", the comment is segmented into individual characters, the characters are converted into word-segment vectors, input vectors are generated from the word-segment vectors and fed into the emotion recognition model for processing, obtaining a score value of 0.9871095, where the score interval [0, 1] represents negative emotion and higher scores indicate more negative emotion.
Similarly, the comment information "fighters in spam" receives a score value of 0.9559605, and the comment information "Where is the coupon in the advertisement?" receives a score value of 0.030765. By setting a preset threshold, the negative-emotion scores of comment information can be divided into negative comments and normal comments: scores exceeding the preset threshold indicate negative comments, and scores below it indicate positive comments. The negative comments can then be deleted or otherwise handled, greatly reducing the workload of manual review and improving the efficiency and accuracy of comment information processing.
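The threshold-based review step can be sketched as follows, using the example scores above and an illustrative threshold of 0.5 (the patent leaves the preset threshold open):

```python
THRESHOLD = 0.5  # illustrative value; the preset threshold is application-specific

def review_comments(scored_comments, threshold=THRESHOLD):
    """Split comments into (kept, removed) by negative-emotion score,
    where higher scores in [0, 1] indicate more negative emotion."""
    kept = [c for c, s in scored_comments if s <= threshold]
    removed = [c for c, s in scored_comments if s > threshold]
    return kept, removed

scored = [("Too many spam ads", 0.9871095),
          ("fighters in spam", 0.9559605),
          ("Where is the coupon?", 0.030765)]
kept, removed = review_comments(scored)
print(kept, removed)
```

With the example scores, the two high-scoring (negative) comments land in the removed list and the coupon question is kept.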
Taking a short video advertisement as an example, comments are made on a published short video advertisement. As shown in fig. 6, which is a comment interface diagram of a terminal device according to an exemplary embodiment, the score value of each piece of comment information is obtained through the emotion recognition model, for example the scores corresponding to comment information 1 to comment information 5 displayed in the comment information display area in fig. 6A. Based on comparison with the preset threshold, comment information 1 is judged to be a negative comment, so comment information 1 is deleted or set to be visible only to the commenter; the comment information displayed in other users' comment information display areas is then as shown in fig. 6B, and comment information 1 is no longer displayed. In this way, comment information can be quickly reviewed and deleted, more positive comments can be presented on the terminal device, positive publicity for comment objects such as advertisements is facilitated, and users' needs are met.
In a second example, the ranking position of the comment information is adjusted according to the score value of the comment information.
For example, the emotion recognition model can perform emotion recognition on all comment information under an advertisement to obtain corresponding score values, where a larger score indicates more negative emotion and a smaller score indicates more positive emotion. The advertisement publisher can then act on the scores, for example pinning positive-emotion comments to the top under the advertisement to attract more positive evaluations of the advertisement and reduce the influence of negative comments on its propagation.
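The ranking adjustment can be sketched as a simple sort by negative-emotion score (the scores and comment names are illustrative):

```python
def rank_by_positivity(scored_comments):
    """Sort comments so the most positive (lowest negative-emotion score)
    appear first, as when pinning positive comments to the top."""
    return [c for c, s in sorted(scored_comments, key=lambda p: p[1])]

scored = [("comment A", 0.7), ("comment B", 0.1), ("comment C", 0.4)]
print(rank_by_positivity(scored))  # ['comment B', 'comment C', 'comment A']
```

Truncating the sorted list (e.g. to the top five) then hides the most negative comments from the display area, as in the example below.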
Taking a short video advertisement as an example, comments are made on a published short video advertisement. As shown in fig. 7, which is a comment interface diagram of a terminal device according to an exemplary embodiment, the score value of each piece of comment information is obtained through the emotion recognition model, for example the scores corresponding to comment information 1 to comment information 5 displayed in the comment information display area in fig. 7A, and the comments are sorted by their scores. Comment information 1 is not ranked in the top five, so the comment information displayed in other users' comment information display areas is as shown in fig. 7B, and comment information 1 is no longer displayed; that is, lower-ranked comment information is no longer displayed. In this way, comments with positive emotion are displayed at the top of the comment information display area, more positive comments are presented on the terminal device, positive publicity for comment objects such as advertisements is facilitated, and users' needs are met.
In this way, the score value of the comment information is obtained through the emotion recognition model, and the comment information is processed based on that score, so manual review and processing of the comment information are no longer needed, improving the efficiency and accuracy of comment information processing; the comment information can further be deleted or have its visibility set as needed, which facilitates positive publicity for comment objects such as advertisements and meets users' needs.
FIG. 8 is a block diagram of an apparatus for training emotion recognition models, according to an example embodiment. Referring to fig. 8, the emotion recognition model training apparatus 30 includes: a first acquisition unit 301, a word segmentation unit 302, a first generation unit 303, a second generation unit 304, a third generation unit 305, a fourth generation unit 306 and a training unit 307.
A first obtaining unit 301 configured to obtain sample comment information, where the sample comment information has an annotation tag.
In the embodiments of the present disclosure, the sample comment information may be any comment information, such as text comment information, audio information, image information, and the like. In other embodiments of the present disclosure, the comment information may be for text, video, or the like. In one embodiment of the present disclosure, the comment information is comment information for a short video.
In the embodiment of the disclosure, sample comment information corresponding to a specific application scene may be acquired to train the emotion recognition model, which improves recognition accuracy; alternatively, varied sample comment information may be acquired from different scenes to train the emotion recognition model, which improves its universality. The approach can be selected and set according to actual requirements.
Specifically, taking a video advertisement scene as an example, comment information on a video advertisement is extracted as sample comment information, and the emotion semantics of the sample comment information are labeled according to an emotion classification labeling rule: for example, negative emotion is labeled 1 and everything else is labeled 0. Comment information labeled 1 includes, for example: "Too many advertisements!", "This platform is just garbage!", "This product is fake, I bought one separately"; comment information labeled 0 includes, for example: "Although this hair dryer is somewhat ugly, it works well".
It should be noted that the labels may be assigned according to the actual training requirements; alternatively, positive emotion may be labeled 1 and everything else labeled 0.
In the embodiment of the present disclosure, after users send comment information through terminal devices such as mobile phones, tablet computers, intelligent wearable devices, and computers, a server may obtain one or more pieces of comment information and store them in a database, so that one or more pieces of sample comment information may later be obtained from the database. Alternatively, one or more pieces of comment information sent by terminal devices may be obtained in real time as sample comment information. The choice can be made according to the actual application scenario, for example:
in a first example, taking a video advertisement scene as an example, one or more pieces of comment information sent by a terminal device are directly obtained in real time as sample comment information.
In a second example, taking an article scene as an example, processing comment information on articles is mainly intended to avoid content such as false advertising, so one or more pieces of sample comment information may be obtained from the database.
It can be understood that a large amount of sample comment information is needed for training, and sample comment information can be retrieved quickly through a sample identification, which further improves the training efficiency of the emotion recognition model. As a possible implementation, the comment identification of the sample comment information is obtained, and the sample identification of the sample comment information is generated according to the comment identification, where the sample identification is used to retrieve the sample comment information. After a target identification is obtained, retrieval can be performed according to the target identification to obtain the sample comment information corresponding to that target identification.
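As an illustration of retrieval by sample identification, the sketch below builds an index from generated sample identifications to comment texts; the "A" plus zero-padded number scheme mirrors the A01/A02 examples elsewhere in the disclosure, but the exact format is an assumption of this sketch.

```python
def build_sample_index(comments):
    """Generate a sample ID for each comment (hypothetical scheme:
    'A' + zero-padded sequence number) and index the comments by ID,
    so a target ID retrieves its sample comment directly."""
    return {f"A{i:02d}": text for i, text in enumerate(comments, start=1)}

index = build_sample_index(["great product", "too many ads"])
print(index["A02"])  # too many ads
```

Any scheme works as long as the generated identification is unique per sample, since the dictionary lookup that implements retrieval requires unique keys.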
It can be understood that comment information sent by different terminal devices may be the same or different, and that for the same terminal device and the same comment object, comment information sent in different time periods may also differ. Taking a video advertisement scene as an example, the object of a comment may include: the e-commerce product in the advertisement, anything else in the video, the video publishing platform itself, and the like.
A word segmentation unit 302 configured to segment the sample comment information to generate a plurality of words.
Specifically, comment information generally appears in text form, and may also include web page links, numbers, letters, and the like. In other embodiments of the present disclosure, if the sample comment information is audio or video information, the audio information is converted into text information, or text information is extracted from the video information; similarly, if the sample comment information is image information, the image information is converted into text information, or text information is extracted from the image information.
In the embodiment of the present disclosure, if the sample comment information is text information, there are many ways to segment the comment information into a plurality of words, exemplified as follows:
In a first example, a word segmentation method based on character string matching processes the sample comment information: the sample comment information is matched against entries in a machine dictionary, and if a certain character string is found in the dictionary, it is identified as a word, thereby generating a plurality of words.
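As a concrete instance of this string-matching approach, the classical forward maximum matching algorithm can be sketched as follows; the toy dictionary and the four-character window are assumptions for the example, not part of the disclosure.

```python
def max_match_segment(text, dictionary, max_word_len=4):
    """Forward maximum matching: at each position, take the longest
    dictionary entry that prefixes the remaining text; if none matches,
    fall back to a single character."""
    words = []
    i = 0
    while i < len(text):
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words

vocab = {"北京", "天安门", "我"}  # toy dictionary
print(max_match_segment("我爱北京天安门", vocab))
# ['我', '爱', '北京', '天安门']
```

Production systems typically refine this with backward matching or statistical disambiguation, but the dictionary-lookup core is the same.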
In a second example, the sample comment information is processed with an understanding-based word segmentation method: the sample comment information is segmented while syntactic and semantic analysis is performed, and part-of-speech tagging uses the syntactic and semantic information, thereby generating a plurality of words.
In a third example, the sample comment information is segmented with a word segmentation model; an existing word segmentation model may be used.
The first generating unit 303 is configured to obtain an index value, a mask value, and a text number for each of the plurality of words, and to generate encoding vectors of the plurality of words according to the index values, mask values, and text numbers corresponding to the plurality of words.
In the embodiment of the disclosure, after the sample comment information is segmented into a plurality of words, the index value, mask value, and text number of each word are obtained. The index value can be understood as the word's index in a Chinese dictionary after the sample comment information is segmented at word granularity; the mask value can be understood as marking the positions that are masked during training; and the text number distinguishes input texts: if only one piece of sample comment information is input, its text number is 0, while if a pair of sample comment information is input, the two pieces are separated by a character such as [SEP] and given different text numbers, e.g., the first piece of sample comment information has text number 0 and the second has text number 1.
It should be noted that the purpose of the mask value is to train the emotion recognition model better: after training, the emotion recognition model has learned not only the positions of the masks but also what the masked words are. That is, the emotion recognition model has learned the semantics of the whole comment information more thoroughly and knows which word appears at which position in the comment information.
Further, as a possible implementation, the first generating unit 303 is specifically configured to perform weighted addition of the index value, mask value, and text number corresponding to each word to generate the encoding vectors of the plurality of words.
For example, the sample comment information is "I love Beijing Tiananmen", which word segmentation splits into seven words (one per character): "我", "爱", "北", "京", "天", "安", and "门". Feature extraction is then performed on the seven words to obtain their index values, mask values, and text numbers, and weighted addition of these values generates the encoding vectors of the seven words.
Continuing with the sample comment information "I love Beijing Tiananmen": the matrix T1 of the seven index values is [1 1 1 1 1 1 1], the matrix T2 of the mask values is [2 2 2 2 2 2 2], and the matrix T3 of the text numbers is [0 0 0 0 0 0 0]. Weighted addition of T1, T2, and T3 yields the encoding vector matrix [3 3 3 3 3 3 3].
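The weighted addition in this example can be sketched as an element-wise weighted sum of the three ID sequences. The unit weights below are an assumption consistent with the [3 3 ... 3] result above; the disclosure does not state the actual weight values.

```python
def encoding_vector(index_vals, mask_vals, text_nums, weights=(1.0, 1.0, 1.0)):
    """Element-wise weighted sum of index values, mask values, and text
    numbers; with unit weights this reproduces the example above."""
    return [weights[0] * i + weights[1] * m + weights[2] * t
            for i, m, t in zip(index_vals, mask_vals, text_nums)]

T1 = [1] * 7  # index values
T2 = [2] * 7  # mask values
T3 = [0] * 7  # text numbers
print(encoding_vector(T1, T2, T3))
# [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
```

In a trained model the three components would instead be looked up in learned embedding tables before being summed, but the combination step has the same shape.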
A second generating unit 304 configured to generate word vectors of a plurality of words from the encoded vectors of the plurality of words.
A third generating unit 305 configured to generate a sentence vector of the sample comment information from the word vector of the plurality of words.
A fourth generating unit 306 configured to generate a prediction tag of the sample comment information from the sentence vector.
In the embodiment of the present disclosure, the emotion recognition model may be trained on the sample comment information using a recurrent neural network, a convolutional neural network, or the like. Specifically, in the embodiment of the present disclosure, the emotion recognition model includes a BERT (Bidirectional Encoder Representations from Transformers) module and a CNN (Convolutional Neural Network) module.
The BERT module is a pre-trained deep learning model, which allows the syntactic features, word features, and contextual semantic features of comment information to be extracted well and used to construct sentence vectors of the comment information. For example, the word "apple" can denote either a mobile phone brand or a fruit; the sentence vector of the whole text obtained through the BERT module makes it possible to finally determine whether "apple" means the mobile phone or the fruit.
It should be noted that a sentence vector can be understood as a vector representation in which full-text semantic information is fused with each word in a text, that is, the BERT module enhances semantic vector representation of each word in the text, so as to improve accuracy of subsequent emotion classification.
In the embodiment of the present disclosure, after obtaining the encoded vectors of the multiple words, the word vectors of the multiple words are generated according to the encoded vectors of the multiple words, where the word vectors include syntactic features, semantic features, and the like.
For example, the sample comment information is "I love playing games on my Apple". After the above processing there are seven encoding vectors, and each encoding vector has a corresponding word vector; for instance, the two characters of "Apple" correspond to encoding vectors a1 and a2, so "Apple" corresponds to the word vector a1 + a2, which is closer in semantic space to the word vectors of "mobile phone", "Huawei", and the like, and farther from the word vectors of fruits such as "banana".
Further, there are many ways to generate the sentence vector of the sample comment information from the word vectors of the plurality of words. As a possible implementation, the third generating unit 305 is specifically configured to encode the N words to generate initial codes corresponding to the N words, where the sample comment information includes N words, and to transform-encode the initial codes of the N words to form codes corresponding to the N words respectively, wherein the code corresponding to the first of the N words is the sentence vector of the sample comment information, and when the i-th word is transform-encoded, the codes corresponding to the other words among the N words are used as reference codes.
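The transform coding described here, in which each word's code uses the other words' codes as references and the first position yields the sentence vector, is structurally similar to Transformer self-attention with the first token taken as the sentence representation (as with BERT's [CLS] token). Below is a heavily simplified sketch; the absence of learned projection matrices and the single attention head are simplifications assumed for illustration.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over word encodings X (one row
    per word): each output row is a weighted mix of all rows, so every
    word's transformed code references all the others."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax per row
    return weights @ X

X = np.random.default_rng(0).normal(size=(7, 8))  # 7 words, 8-dim encodings
H = self_attention(X)
sentence_vector = H[0]  # first word's output taken as the sentence vector
```

A real BERT encoder stacks many such layers with learned query/key/value projections and feed-forward sublayers, but the reference-coding idea is the same.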
Finally, there are many ways to generate the prediction tag of the sample comment information according to the sentence vector, and as a possible implementation manner, the fourth generating unit 306 is specifically configured to perform multilayer convolution on the sentence vector to generate a convolution value, and classify the convolution value to generate the prediction tag.
The CNN module in the emotion recognition model includes a convolutional layer and a pooling layer; the convolutional layer may, for example, use 1-dimensional convolution, and the pooling layer may, for example, be a max pooling layer, realizing the mining and dimension reduction of sentence vectors. The sentence vector is input into the CNN module and processed by the convolutional layer and the pooling layer to obtain deep semantic features of the text, that is, a convolution value is generated; the convolution value is then classified by a classifier to obtain the prediction label.
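A minimal pure-Python sketch of the 1-D convolution plus global max pooling described here; the fixed kernel and the threshold classifier are illustrative stand-ins for the learned CNN layers and classifier, not the disclosure's actual parameters.

```python
def conv1d_maxpool(seq, kernel):
    """Valid 1-D convolution of seq with kernel, followed by global
    max pooling over the resulting feature map."""
    k = len(kernel)
    feature_map = [sum(seq[i + j] * kernel[j] for j in range(k))
                   for i in range(len(seq) - k + 1)]
    return max(feature_map)

def classify(pooled, threshold=0.0):
    """Toy stand-in for the classifier: threshold the pooled feature."""
    return 1 if pooled > threshold else 0

seq = [0.2, -0.5, 1.0, 0.3]  # toy 1-D stand-in for a sentence vector
pooled = conv1d_maxpool(seq, [1.0, -1.0])
print(pooled, classify(pooled))  # pooled is approximately 0.7, label 1
```

In practice the convolution would run over multi-dimensional sentence vectors with many learned kernels, and the classifier would be a softmax layer, but the conv-pool-classify pipeline is as sketched.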
It should be noted that the sentence vectors may also be processed by other multi-layer neural networks; for example, the values obtained from a recurrent network layer may be further classified to generate the prediction labels.
And a training unit 307 configured to train the emotion recognition model according to the labeling labels and the prediction labels.
In the embodiment of the disclosure, the parameters of the emotion recognition model are continuously adjusted according to the error between the labeling label and the prediction label, so as to improve the recognition accuracy of the emotion recognition model. More specifically, offline training can be performed with a loss function, where the loss function is the cross-entropy loss between the prediction label output by the emotion recognition model and the actually labeled label. In an exemplary embodiment of the disclosure, the loss function may be: L = -[y·log y' + (1-y)·log(1-y')], where y is the labeling label and y' is the prediction label.
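A direct sketch of this cross-entropy loss; the eps clamp is an implementation detail added here (not stated in the disclosure) to avoid log(0).

```python
import math

def cross_entropy(y, y_pred, eps=1e-12):
    """L = -[y*log(y') + (1 - y)*log(1 - y')] for a 0/1 label y and a
    predicted probability y'; eps guards against log(0)."""
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y * math.log(y_pred) + (1 - y) * math.log(1 - y_pred))

print(round(cross_entropy(1, 0.9), 4))  # 0.1054 -- confident and correct, small loss
print(round(cross_entropy(1, 0.1), 4))  # 2.3026 -- confident and wrong, large loss
```

The loss goes to zero as the predicted probability approaches the true label and grows without bound as it approaches the wrong one, which is what drives the parameter updates described next.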
In the embodiment of the disclosure, a stochastic gradient descent algorithm may be used to update the parameters of the emotion recognition model. Specifically, a batch is randomly drawn from the sample comment information, the parameters are updated according to the gradient after training on that batch, and then another batch is drawn and the parameters are updated again, until the loss value L of the loss function falls within an acceptable range and the trained emotion recognition model is obtained.
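The mini-batch stochastic gradient descent procedure can be sketched on a one-feature logistic model standing in for the full emotion recognition model; the toy data, learning rate, and epoch count are illustrative assumptions.

```python
import math
import random

def sgd_train(samples, lr=0.5, epochs=200, batch_size=2, seed=0):
    """Mini-batch SGD: repeatedly draw a random batch, average the
    cross-entropy gradient over it, and update the parameters, as in
    the procedure described above."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        batch = rng.sample(samples, batch_size)
        grad_w = grad_b = 0.0
        for x, y in batch:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
            grad_w += (p - y) * x                     # d(cross-entropy)/dw
            grad_b += (p - y)                         # d(cross-entropy)/db
        w -= lr * grad_w / batch_size
        b -= lr * grad_b / batch_size
    return w, b

data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]  # toy labeled samples
w, b = sgd_train(data)
```

The real model updates millions of BERT and CNN parameters via backpropagation rather than two scalars, but the draw-batch, compute-gradient, update loop is identical in shape.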
In the embodiment of the disclosure, the comment identification of the sample comment information is obtained, and the sample identification of the sample comment information is generated according to the comment identification. The sample identification is a unique identification that distinguishes the comment information, so the sample comment information can be retrieved quickly through the sample identification, further improving the training efficiency of the emotion recognition model.
The comment identification can be based on different numbers given to different pieces of sample comment information, such as 01, 02, and 03, so the sample identifications generated according to the comment identifications can be A01, A02, A03, and so on, uniquely identifying the sample comment information; the scheme can be selected and set according to actual application needs.
In the embodiment of the disclosure, in order to train the BERT module better, the sample comment information may be preprocessed. The BERT module segments at word (character) granularity, so the length of a piece of comment information is its number of characters, and in general, the longer the sentence, the longer the model training time. Statistics are therefore computed on the sentence lengths of the overall comment information, and training and prediction performance is optimized under limited resources: for example, the third quartile of the distribution of comment lengths is taken as the maximum comment length. Here the third quartile is 128 characters, so sentences longer than 128 characters are truncated and sentences shorter than 128 characters may be padded, further improving processing efficiency.
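The truncate-or-pad step can be sketched as follows; the "[PAD]" filler token is an assumption of this sketch, since the disclosure says short sentences "may be complemented" without naming the padding symbol.

```python
def normalize_length(tokens, max_len=128, pad_token="[PAD]"):
    """Truncate token sequences longer than max_len; pad shorter ones
    up to max_len, per the 128-character rule described above."""
    if len(tokens) > max_len:
        return tokens[:max_len]
    return tokens + [pad_token] * (max_len - len(tokens))

print(normalize_length(list("abc"), max_len=5))
# ['a', 'b', 'c', '[PAD]', '[PAD]']
print(len(normalize_length(["x"] * 200)))  # 128
```

Fixing the length this way lets every batch be packed into one rectangular tensor, which is what makes the statistics-driven choice of 128 a training-speed decision.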
According to the training apparatus for an emotion recognition model: sample comment information with labeling labels is obtained; the sample comment information is segmented to generate a plurality of words; the index value, mask value, and text number of each word are obtained, and encoding vectors of the plurality of words are generated from them; word vectors of the plurality of words are generated from the encoding vectors, and a sentence vector of the sample comment information is generated from the word vectors; a prediction label of the sample comment information is generated from the sentence vector; and the emotion recognition model is trained according to the labeling labels and the prediction labels. The emotion recognition model can thus be trained so that comment information is processed by the model without manual review and processing, improving the efficiency and accuracy of comment information processing.
Fig. 9 is a block diagram illustrating a server 200 according to an example embodiment.
As shown in fig. 9, the server 200 includes:
a memory 210 and a processor 220, a bus 230 connecting different components (including the memory 210 and the processor 220), wherein the memory 210 stores a computer program, and when the processor 220 executes the program, the method for training the emotion recognition model according to the embodiment of the present disclosure is implemented.
Bus 230 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Server 200 typically includes a variety of electronic device readable media. Such media may be any available media that is accessible by server 200 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 210 may also include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 240 and/or cache memory 250. The server 200 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 260 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 9, and commonly referred to as a "hard drive"). Although not shown in FIG. 9, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 230 by one or more data media interfaces. Memory 210 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
A program/utility 280 having a set (at least one) of program modules 270, including but not limited to an operating system, one or more application programs, other program modules, and program data, each of which or some combination thereof may comprise an implementation of a network environment, may be stored in, for example, the memory 210. The program modules 270 generally perform the functions and/or methodologies of the embodiments described in this disclosure.
The server 200 may also communicate with one or more external devices 290 (e.g., keyboard, pointing device, display 291, etc.), with one or more devices that enable a user to interact with the server 200, and/or with any devices (e.g., network card, modem, etc.) that enable the server 200 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 292. Also, server 200 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via network adapter 293. As shown, network adapter 293 communicates with the other modules of server 200 via bus 230. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the server 200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processor 220 executes various functional applications and data processing by executing programs stored in the memory 210.
It should be noted that, for the implementation process and the technical principle of the server in this embodiment, reference is made to the foregoing explanation of the method for training the emotion recognition model in the embodiment of the present disclosure, and details are not described here again.
The server provided by the embodiment of the disclosure can execute the method for training an emotion recognition model as described above: sample comment information with labeling labels is obtained; the sample comment information is segmented to generate a plurality of words; the index value, mask value, and text number of each word are obtained, and encoding vectors of the plurality of words are generated from them; word vectors of the plurality of words are generated from the encoding vectors, and a sentence vector of the sample comment information is generated from the word vectors; a prediction label of the sample comment information is generated from the sentence vector; and the emotion recognition model is trained according to the labeling labels and the prediction labels. The emotion recognition model can thus be trained so that comment information is processed by the model without manual review and processing, improving the efficiency and accuracy of comment information processing.
In order to implement the above embodiments, the present disclosure also provides a storage medium.
Wherein the instructions in the storage medium, when executed by the processor of the server, enable the server to perform the method for training an emotion recognition model as described above.
To implement the above embodiments, the present disclosure also provides a computer program product, which when executed by a processor of a server, enables the server to execute the method for training an emotion recognition model as described above.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for training an emotion recognition model is characterized by comprising the following steps:
obtaining sample comment information, wherein the sample comment information is provided with a labeling label;
performing word segmentation on the sample comment information to generate a plurality of words;
respectively acquiring the index values, mask values and text numbers of the words, and generating encoding vectors of the words according to the index values, mask values and text numbers corresponding to the words;
generating word vectors of the multiple words according to the coding vectors of the multiple words, and generating sentence vectors of the sample comment information according to the word vectors of the multiple words;
generating a prediction label of the sample comment information according to the sentence vector; and
and training an emotion recognition model according to the labeling label and the prediction label.
2. The method for training emotion recognition model of claim 1, further comprising, after said obtaining sample comment information:
acquiring the length of the sample comment information; and
and if the length of the sample comment information is greater than a preset length, deleting the part of the sample comment information which is greater than the preset length.
3. The method for training an emotion recognition model as recited in claim 1, wherein the loss function of the emotion recognition model is:
L=-[ylogy'+(1-y)log(1-y')];
wherein y is the label tag and y' is the prediction tag.
4. The method for training an emotion recognition model as recited in claim 1, wherein the generating of the encoding vectors of the words from the index values, mask values, and text numbers corresponding to the words comprises:
and carrying out weighted addition on the index value, the mask value and the text number corresponding to the plurality of words to generate the encoding vector of the plurality of words.
5. The method for training an emotion recognition model according to claim 1, wherein the sample comment information includes N words, and the generating of the sentence vector of the sample comment information from the word vectors of the plurality of words includes:
respectively encoding the N words to generate initial codes corresponding to the N words;
and respectively performing transform coding on the initial codes of the N words to form sentence vectors corresponding to the N words respectively, wherein the sentence vector corresponding to the first word in the N words is the sentence vector of the sample comment information, and when the ith word is subjected to transform coding, the transform codes corresponding to other words in the N words are used as reference codes.
6. The method for training the emotion recognition model of claim 1, wherein the generating of the prediction labels of the sample comment information from the sentence vector comprises:
performing multi-layer convolution on the sentence vectors to generate convolution values;
classifying the convolution values to generate the prediction tag.
7. An emotion recognition model training apparatus, comprising:
a first obtaining unit configured to obtain sample comment information, wherein the sample comment information has an annotation tag;
a word segmentation unit configured to segment the sample comment information to generate a plurality of words;
the first generation unit is configured to respectively acquire an index value, a mask value and a text number of the plurality of words, and generate encoding vectors of the plurality of words according to the index values, mask values and text numbers corresponding to the plurality of words;
a second generating unit configured to generate word vectors of the plurality of words from the encoded vectors of the plurality of words;
a third generating unit configured to generate a sentence vector of the sample comment information from the word vector of the plurality of words;
a fourth generating unit configured to generate a prediction tag of the sample comment information from the sentence vector; and
and the training unit is configured to train an emotion recognition model according to the labeling label and the prediction label.
8. The apparatus for training emotion recognition models as claimed in claim 7, further comprising:
a second obtaining unit configured to obtain a length of the sample comment information; and
a deleting unit configured to delete a portion larger than a preset length among the sample comment information if the length of the sample comment information is larger than the preset length.
9. A server, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of training an emotion recognition model according to any one of claims 1 to 6.
10. A storage medium in which instructions, when executed by a processor of a server, enable the server to perform a method of training an emotion recognition model as claimed in any of claims 1 to 6.
CN202010997634.4A 2020-09-21 2020-09-21 Training method, training device, training equipment and training storage medium for emotion recognition model Active CN112257452B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010997634.4A CN112257452B (en) 2020-09-21 2020-09-21 Training method, training device, training equipment and training storage medium for emotion recognition model


Publications (2)

Publication Number Publication Date
CN112257452A true CN112257452A (en) 2021-01-22
CN112257452B CN112257452B (en) 2024-05-14

Family

ID=74232686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010997634.4A Active CN112257452B (en) 2020-09-21 2020-09-21 Training method, training device, training equipment and training storage medium for emotion recognition model

Country Status (1)

Country Link
CN (1) CN112257452B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776581A (en) * 2017-02-21 2017-05-31 浙江工商大学 Subjective texts sentiment analysis method based on deep learning
CN110728153A (en) * 2019-10-15 2020-01-24 天津理工大学 Multi-category emotion classification method based on model fusion
CN110750648A (en) * 2019-10-21 2020-02-04 南京大学 Text emotion classification method based on deep learning and feature fusion

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966106A (en) * 2021-03-05 2021-06-15 平安科技(深圳)有限公司 Text emotion recognition method, device and equipment and storage medium
CN113241193A (en) * 2021-06-01 2021-08-10 平安科技(深圳)有限公司 Drug recommendation model training method, recommendation method, device, equipment and medium
CN113705206A (en) * 2021-08-13 2021-11-26 北京百度网讯科技有限公司 Emotion prediction model training method, device, equipment and storage medium
CN113836333A (en) * 2021-09-18 2021-12-24 北京百度网讯科技有限公司 Training method of image-text matching model, method and device for realizing image-text retrieval
CN113836333B (en) * 2021-09-18 2024-01-16 北京百度网讯科技有限公司 Training method of image-text matching model, and method and device for realizing image-text retrieval
CN116402064A (en) * 2023-06-09 2023-07-07 北京搜狐新媒体信息技术有限公司 Comment generation method, comment generation system, storage medium and electronic equipment
CN116402064B (en) * 2023-06-09 2023-09-12 北京搜狐新媒体信息技术有限公司 Comment generation method, comment generation system, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN112257452B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
CN109657054B (en) Abstract generation method, device, server and storage medium
CN107992531B (en) News personalized intelligent recommendation method and system based on deep learning
CN112257452B (en) Emotion recognition model training method, device, equipment and storage medium
CN110837579A (en) Video classification method, device, computer and readable storage medium
CN111339306A (en) Classification model training method, classification device, classification equipment and medium
US10915756B2 (en) Method and apparatus for determining (raw) video materials for news
CN110457585B (en) Negative text pushing method, device and system and computer equipment
CN111191428A (en) Comment information processing method and device, computer equipment and medium
CN112364204A (en) Video searching method and device, computer equipment and storage medium
CN112330455A (en) Method, device, equipment and storage medium for pushing information
CN112395421B (en) Course label generation method and device, computer equipment and medium
CN115017303A (en) Method, computing device and medium for enterprise risk assessment based on news text
CN112749330A (en) Information pushing method and device, computer equipment and storage medium
CN111311364B (en) Commodity recommendation method and system based on multi-mode commodity comment analysis
CN115203338A (en) Label and label example recommendation method
CN117351336A (en) Image auditing method and related equipment
CN115248855A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN116977701A (en) Video classification model training method, video classification method and device
CN112632962B (en) Method and device for realizing natural language understanding in man-machine interaction system
CN115292495A (en) Emotion analysis method and device, electronic equipment and storage medium
CN112364666B (en) Text characterization method and device and computer equipment
CN110110202A (en) A kind of information flow method for pushing and device
CN113836399A (en) Theme recommendation method and device, computing equipment and storage medium
CN115130453A (en) Interactive information generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant