CN116189064B

CN116189064B - Barrage emotion analysis method and system based on joint model

Info

Publication number: CN116189064B
Application number: CN202310458854.3A
Authority: CN
Inventors: 宋彦; 陈伟东; 罗常凡
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2023-04-26
Filing date: 2023-04-26
Publication date: 2023-08-29
Anticipated expiration: 2043-04-26
Also published as: CN116189064A

Abstract

The invention discloses a barrage emotion analysis method and system based on a joint model, wherein barrage comments are input into the trained joint model to output emotion tendencies corresponding to the barrage comments, the joint model comprises a coding module and a decoding module, the coding module comprises a video coding module, a text coding module, a gating fusion module and a multi-mode fusion module, the decoding module comprises a barrage reconstruction module and an emotion analysis module, and the decoding module takes the output of the coding module as input to output emotion analysis tendencies corresponding to the barrage comments; the barrage emotion analysis method and system utilize a gating fusion screening mechanism to take surrounding barrage comments as context information of target barrage comments, and utilize a multi-mode fusion mode to take video information into consideration, and fully utilize useful information to strengthen characteristic representation of video barrages so as to accurately identify emotion tendencies of the target barrage comments.

Description

A joint model-based bullet screen sentiment analysis method and system

技术领域technical field

本发明涉及弹幕情感分析技术领域，尤其涉及一种基于联合模型的弹幕情感分析方法及系统。The invention relates to the technical field of barrage sentiment analysis, in particular to a joint model-based barrage sentiment analysis method and system.

背景技术Background technique

视频弹幕情感分析指在分析出视频实时评论的情感极性。Video barrage sentiment analysis refers to analyzing the emotional polarity of real-time video comments.

现有的视频弹幕情感分析方法倾向于提取句子级的特征进行情感分析和分类，都是基于规则的语法、语义基础之上的，但是弹幕的特点是：较短、口语化严重、用特殊字符代表特定含义、语法极不规范等，所以现有的情感分析方法无法准确的对弹幕进行合适的分词以及语法分析等，进而无法准确的情感分析。Existing video bullet chat sentiment analysis methods tend to extract sentence-level features for sentiment analysis and classification, all of which are based on rule-based grammar and semantics. Special characters represent specific meanings, grammar is extremely irregular, etc., so existing sentiment analysis methods cannot accurately perform proper word segmentation and grammatical analysis on bullet chats, and thus cannot accurately analyze sentiment.

另外，现有弹幕评论短、没有足够的上下文信息、语法极不规范、与当时的视频主题相关、交互性强、实时性较强等特点，使得现有的方法无法在短时间内对其进行有效准确的情感分析。In addition, the existing barrage comments are short, do not have enough context information, have extremely irregular grammar, are related to the topic of the video at that time, are highly interactive, and are highly real-time, making it impossible for existing methods to analyze them in a short period of time. Perform effective and accurate sentiment analysis.

发明内容Contents of the invention

基于背景技术存在的技术问题，本发明提出了一种基于联合模型的弹幕情感分析方法及系统，可以准确识别目标弹幕评论的感情倾向。Based on the technical problems existing in the background technology, the present invention proposes a joint model-based bullet chat sentiment analysis method and system, which can accurately identify the emotional tendency of target bullet chat comments.

本发明提出的一种基于联合模型的弹幕情感分析方法，将弹幕评论输入到已训练完成的联合模型中以输出所述弹幕评论对应的情感倾向；A joint model-based bullet chat sentiment analysis method proposed by the present invention inputs the bullet chat comments into the trained joint model to output the emotional tendency corresponding to the bullet chat comments;

所述联合模型的训练过程如下：The training process of the joint model is as follows:

S1：构建训练样本集，所述训练样本集包括时刻的弹幕评论/>、时刻/>到内弹幕评论/>周围的视频/>和与弹幕评论/>在同一帧视频内的视频周围弹幕评论/>；S1: Construct a training sample set, the training sample set includes time Barrage comments/> , time /> arrive Inner barrage comments/> surrounding video /> And and barrage comments /> Bullet comments around the video within the same frame of video /> ;

S2：对所述视频进行编码并串联，得到编码后的视频特征/>，对所述弹幕评论/>和所述视频周围弹幕评论/>编码，得到编码后的目标弹幕特征/>和周围弹幕特征/>；S2: To the video Encode and concatenate to get encoded video features/> , comments on the barrage /> and bullet chatting comments around the video /> Encoding to get the encoded target barrage feature/> and surrounding barrage features/> ;

S3：基于所述目标弹幕特征对所述周围弹幕特征/>进行筛选过滤后串联，得到筛选后所有的周围弹幕评论/>；S3: Based on the characteristics of the target barrage Features on the surrounding barrage /> Filter and connect in series to get all surrounding barrage comments after filtering /> ;

S4：通过自注意力层和交叉注意力层对视频特征、目标弹幕特征/>、周围弹幕评论/>进行增强处理得到增强视频特征/>、增强目标弹幕特征/>和增强周围弹幕/>；S4: Video features through self-attention layer and cross-attention layer , target barrage features/> , Surrounding barrage comments /> Perform enhancement processing to obtain enhanced video features/> , Enhance the characteristics of the target barrage /> and enhanced surrounding barrage /> ;

S5；基于多层多头注意力层对增强视频特征、增强目标弹幕特征/>、增强周围弹幕/>进行重构得到重构后的弹幕评论，利用交叉熵将重构后的弹幕评论和真实的弹幕评论构建弹幕重构的损失函数/>；S5; Enhance video features based on multi-layer multi-head attention layer pairs , Enhance the characteristics of the target barrage /> , enhance the surrounding barrage /> Perform reconstruction to obtain the reconstructed bullet chat comments, and use cross-entropy to construct a loss function for bullet chat reconstruction by using cross-entropy to construct the reconstructed bullet chat comments and real bullet chat comments. ;

S6：对增强视频特征、增强目标弹幕特征/>、增强周围弹幕/>依次进行正则化和归一化操作，输出所述弹幕评论/>对应预测出的弹幕情感/>；S6: For enhanced video features , Enhance the characteristics of the target barrage /> , enhance the surrounding barrage /> Carry out regularization and normalization operations in sequence, and output the barrage comments/> Corresponding to the predicted barrage emotion/> ;

S7：利用交叉熵将预测出的弹幕情感和真实的弹幕情感/>构建情感预测的损失函数/>，基于弹幕重构的损失函数/>和情感预测的损失函数/>计算得总体损失函数/>，基于总体损失函数和反向传播算法更新联合模型的参数，直至联合模型性能达到设定期望；S7: Using cross-entropy to predict the emotion of the barrage and real barrage emotion/> Constructing a loss function for sentiment prediction /> , the loss function based on barrage reconstruction /> and the loss function for sentiment prediction /> Calculate the overall loss function /> , update the parameters of the joint model based on the overall loss function and the backpropagation algorithm until the performance of the joint model meets the set expectations;

所述周围弹幕评论计算公式如下：The surrounding barrage comments Calculated as follows:

其中，为筛选后第/>个周围弹幕评论，/>为第/>个视频周围弹幕评论/>的周围特征，/>，/>为可学习的门矩阵，/>为可学习的门偏移向量，/>为ReLU函数，表示串联，/>表示乘积。in, After filtering for the first /> surrounding barrage comments, /> for No. /> Barrage comments around the video /> surrounding features, /> , /> is a learnable gate matrix, /> is a learnable gate offset vector, /> is the ReLU function, Indicates concatenation, /> Indicates the product.

进一步地，所述视频特征的计算公式如下：Further, the video feature The calculation formula is as follows:

所述目标弹幕特征的计算公式如下：The characteristics of the target barrage The calculation formula is as follows:

所述周围弹幕特征的计算公式如下：The surrounding barrage features The calculation formula is as follows:

其中，，/>，/>表示串联，/>表示视频编码器，/>表示长短期记忆网络。in, , /> , /> Indicates concatenation, /> Indicates a video encoder, /> Represents a long short-term memory network.

进一步地，在步骤S4：通过自注意力层和交叉注意力层对视频特征、目标弹幕特征/>、周围弹幕评论/>进行增强处理得到增强视频特征/>、增强目标弹幕特征/>和增强周围弹幕/>中，具体包括：Further, in step S4: through the self-attention layer and the cross-attention layer to video features , target barrage features/> , Surrounding barrage comments /> Perform enhancement processing to obtain enhanced video features/> , Enhance the characteristics of the target barrage /> and enhanced surrounding barrage /> , including:

将视频特征、目标弹幕特征/>、周围弹幕评论/>作为自注意力层和交叉注意力层的第一层输入并进行L层迭代，所述L层为自注意力层和交叉注意力层的总层数；feature video , target barrage features/> , Surrounding barrage comments /> As the first layer input of the self-attention layer and the cross-attention layer and perform L layer iterations, the L layer is the total number of layers of the self-attention layer and the cross-attention layer;

在第层输入视频特征/>，得到下一层的输入视频特征/>如下：on the layer input video features /> , to get the input video features of the next layer /> as follows:

在第层输入目标弹幕特征/>，得到下一层的输入目标弹幕特征/>：on the layer input target barrage feature/> , to get the input target barrage features of the next layer/> :

在第层输入周围弹幕评论/>，得到下一层的输入周围弹幕评论/>：on the Barrage comments around layer input /> , to get the next layer's input surrounding barrage comments /> :

其中，SA表示自注意力层，CA表示交叉注意力层。Among them, SA represents the self-attention layer, and CA represents the cross-attention layer.

进一步地，在步骤S5中，所述弹幕重构的损失函数构建公式如下：Further, in step S5, the loss function of the barrage reconstruction The construction formula is as follows:

其中，表示批处理，/>表示交叉熵损失，/>表示重构模块，/>表示重构模块产生的弹幕评论，/>表示时刻/>的真实弹幕评论；in, Indicates batch processing, /> represents the cross-entropy loss, /> Represents a refactoring module, /> Indicates the barrage comments generated by the refactoring module, /> Indicate time/> real barrage comments;

具体地，重构模块产生的弹幕评论具体表示成如下形式：Specifically, the barrage comments generated by the reconstruction module are expressed in the following form:

其中表示多层感知机，LN表示正则化操作，MHA表示交叉多头注意力。in denotes a multi-layer perceptron, LN denotes a regularization operation, and MHA denotes cross-multiple attention.

进一步地，在步骤S6中，预测出的弹幕情感计算公式如下：Further, in step S6, the predicted barrage emotion Calculated as follows:

其中，是Softmax函数，LN表示层正则化操作，/>表示多层感知器，/>为可学习的情感预测矩阵，/>为可学习的视频情感矩阵，/>为可学习的周围弹幕情感矩阵，表示可学习的目标弹幕情感矩阵，/>表示串联操作，·表示乘积。in, Is the Softmax function, LN represents the layer regularization operation, /> represents a multi-layer perceptron, /> is a learnable sentiment prediction matrix, /> is a learnable video sentiment matrix, /> is a learnable surrounding barrage emotion matrix, Represents a learnable target barrage emotion matrix, /> denotes a concatenation operation, and · denotes a product.

进一步地，在步骤S7中，所述情感预测的损失函数构建公式如下：Further, in step S7, the loss function of the emotion prediction The construction formula is as follows:

所述总体损失函数的计算过程如下：The overall loss function The calculation process is as follows:

其中，为预测出的弹幕情感，/>表示交叉熵损失，/>为真实的弹幕情感，/>表示损失平衡参数，/>表示批处理。in, For the predicted barrage emotion, /> represents the cross-entropy loss, /> For the real barrage emotion, /> Indicates the loss balance parameter, /> Indicates batch processing.

一种基于联合模型的弹幕情感分析系统，将弹幕评论输入到已训练完成的联合模型中以输出所述弹幕评论对应的情感倾向；A bullet chatting sentiment analysis system based on a joint model, inputting bullet chatting comments into a trained joint model to output the emotional tendency corresponding to the bullet chatting comments;

分析系统包括构建模块、视频编码模块、文本编码模块、门控融合模块、多模态融合模块、弹幕重构模块、弹幕情感预测模块和损失计算模块；The analysis system includes a building block, a video encoding module, a text encoding module, a gating fusion module, a multimodal fusion module, a bullet chat reconstruction module, a bullet chat emotion prediction module, and a loss calculation module;

所述构建模块用于构建训练样本集，所述训练样本集包括时刻的弹幕评论/>、时刻/>到/>内弹幕评论/>周围的视频/>和与弹幕评论/>在同一帧视频内的视频周围弹幕评论/>；The building block is used to construct a training sample set, and the training sample set includes time Barrage comments/> , time /> to /> Inner barrage comments/> surrounding video /> And and barrage comments /> Bullet comments around the video within the same frame of video /> ;

所述视频编码模块用于对所述视频进行编码并串联，得到编码后的视频特征；The video encoding module is used to encode the video Encode and concatenate to get the encoded video features ;

所述文本编码模块用于对所述弹幕评论和所述视频周围弹幕评论/>编码，得到编码后的目标弹幕特征/>和周围弹幕特征/>；The text encoding module is used to comment on the barrage and bullet chatting comments around the video /> Encoding to get the encoded target barrage feature/> and surrounding barrage features/> ;

所述门控融合模块基于所述目标弹幕特征对所述周围弹幕特征/>进行筛选过滤后串联，得到筛选后所有的周围弹幕评论/>；The gated fusion module is based on the target barrage feature Features on the surrounding barrage /> Filter and connect in series to get all surrounding barrage comments after filtering /> ;

所述多模态融合模块用于通过自注意力层和交叉注意力层对视频特征、目标弹幕特征/>、周围弹幕评论/>进行处理得到增强视频特征/>、增强目标弹幕特征/>和增强周围弹幕/>；The multi-modal fusion module is used for video features through self-attention layer and cross-attention layer , target barrage features/> , Surrounding barrage comments /> Perform processing to obtain enhanced video features/> , Enhance the characteristics of the target barrage /> and enhanced surrounding barrage /> ;

所述弹幕重构模块用于基于多层多头注意力层对增强视频特征、增强目标弹幕特征/>、增强周围弹幕/>进行重构得到重构后的弹幕评论，利用交叉熵将重构后的弹幕评论和真实的弹幕评论构建弹幕重构的损失函数/>；The barrage reconstruction module is used to enhance video features based on multi-layer multi-head attention layers , Enhance the characteristics of the target barrage /> , enhance the surrounding barrage /> Perform reconstruction to obtain the reconstructed bullet chat comments, and use cross-entropy to construct a loss function for bullet chat reconstruction by using cross-entropy to construct the reconstructed bullet chat comments and real bullet chat comments. ;

所述弹幕情感预测模块用于对增强视频特征、增强目标弹幕特征/>、增强周围弹幕/>依次进行正则化和归一化操作，输出所述弹幕评论/>对应预测出的弹幕情感/>；The barrage emotion prediction module is used to enhance video features , Enhance the characteristics of the target barrage /> , enhance the surrounding barrage /> Carry out regularization and normalization operations in sequence, and output the barrage comments/> Corresponding to the predicted barrage emotion/> ;

所述损失计算模块用于利用交叉熵将预测出的弹幕情感和真实的弹幕情感/>构建情感预测的损失函数/>，基于弹幕重构的损失函数/>和情感预测的损失函数计算得总体损失函数/>，基于总体损失函数和反向传播算法更新联合模型的参数，直至联合模型的性能达到设定期望；The loss calculation module is used to use cross entropy to convert the predicted barrage emotion and real barrage emotion/> Constructing a loss function for sentiment prediction /> , the loss function based on barrage reconstruction /> and the loss function for sentiment prediction Calculate the overall loss function /> , update the parameters of the joint model based on the overall loss function and the backpropagation algorithm until the performance of the joint model meets the set expectations;

本发明提供的一种基于联合模型的弹幕情感分析方法及系统的优点在于：本发明结构中提供的一种基于联合模型的弹幕情感分析方法及系统，通过多模态融合模块将视频信息囊括进来，充分考虑视频主题与弹幕的关系，获得增强的特征表示，提升了联合模型对于目标弹幕评论进行情感分析的性能；通过多模态融合模块将视频信息囊括进来，充分考虑视频主题与弹幕的关系，获得增强的特征表示，提升了联合模型对于目标弹幕评论进行情感分析的性能；利用弹幕重构模块，促进各个模块的整体学习效果，提升情感分析模块的性能。The advantages of the joint model-based bullet chat sentiment analysis method and system provided by the present invention are: the joint model-based bullet chat sentiment analysis method and system provided in the structure of the present invention, through the multi-modal fusion module, the video information Include it, fully consider the relationship between the video theme and the bullet chat, obtain enhanced feature representation, and improve the performance of the joint model for sentiment analysis of the target bullet chat comment; include video information through the multi-modal fusion module, fully consider the video theme The relationship with the barrage, obtain enhanced feature representation, improve the performance of the joint model for sentiment analysis of target barrage comments; use the barrage reconstruction module to promote the overall learning effect of each module, and improve the performance of the sentiment analysis module.

附图说明Description of drawings

图1为本发明的结构示意图；Fig. 1 is a structural representation of the present invention;

图2为本发明的模块框架示意图。Fig. 2 is a schematic diagram of the module frame of the present invention.

具体实施方式Detailed ways

下面，通过具体实施例对本发明的技术方案进行详细说明，在下面的描述中阐述了很多具体细节以便于充分理解本发明。但是本发明能够以很多不同于在此描述的其他方式来实施，本领域技术人员可以在不违背本发明内涵的情况下做类似改进，因此本发明不受下面公开的具体实施的限制。In the following, the technical solution of the present invention will be described in detail through specific embodiments, and many specific details are set forth in the following description so as to fully understand the present invention. However, the present invention can be implemented in many other ways different from those described here, and those skilled in the art can make similar improvements without departing from the connotation of the present invention, so the present invention is not limited by the specific implementation disclosed below.

如图1和2所示，本发明提出的一种基于联合模型的弹幕情感分析方法，将弹幕评论输入到已训练完成的联合模型中以输出所述弹幕评论对应的情感倾向；联合模型使用了编码-解码架构，联合模型包括编码模块和解码模块，所述编码模块包含视频编码模块、文本编码模块、门控融合模块以及多模态融合模块，解码模块包括弹幕重构模块和情感分析模块，情感分析模块包括弹幕情感预测模块和损失计算模块，解码模块以编码模块的输出作为输入从而输出弹幕评论对应的情感分析倾向。As shown in Figures 1 and 2, a joint model-based bullet-screen sentiment analysis method proposed by the present invention inputs the bullet-screen comments into the trained joint model to output the corresponding emotional tendency of the bullet-screen comments; the joint The model uses an encoding-decoding architecture. The joint model includes an encoding module and a decoding module. The encoding module includes a video encoding module, a text encoding module, a gated fusion module, and a multimodal fusion module. The decoding module includes a barrage reconstruction module and Sentiment analysis module. The sentiment analysis module includes a bullet chat sentiment prediction module and a loss calculation module. The decoding module takes the output of the encoding module as input to output the sentiment analysis tendency corresponding to the bullet chat comments.

本实施例主要在于联合模型中利用门控筛选机制将周围评论作为目标弹幕的上下文信息，并利用多模态融合方式将视频信息考虑进来，充分利用有用的信息强化视频弹幕的特征表示，联合模型基于残差卷积神经网络、长短期记忆网络、门控融合自注意力层、交叉注意力层等构建，通过如下对联合模型中可学习参数进行训练学习，优化可学习参数以实现通过联合模型准确识别目标弹幕评论的感情倾向的效果，具体如下。This embodiment mainly uses the gating screening mechanism in the joint model to use the surrounding comments as the context information of the target barrage, and uses the multi-modal fusion method to take the video information into account, and fully utilizes useful information to strengthen the feature representation of the video barrage. The joint model is constructed based on residual convolutional neural network, long-term short-term memory network, gated fusion self-attention layer, cross-attention layer, etc. The learnable parameters in the joint model are trained and learned as follows, and the learnable parameters are optimized to achieve pass The joint model accurately identifies the effect of the emotional tendency of the target barrage comments, as follows.

视频有/>帧视频/>，视频周围弹幕评论/>有/>个视频弹幕评论/>，视频周围弹幕评论/>是处于弹幕评论/>周围的评论。video Yes /> frame video /> , Barrage comments around the video /> Yes /> video barrage comments /> , Barrage comments around the video /> Is in the barrage comment /> Comments around.

例如，如图2所示的例子中弹幕评论y “为了自己，坚持！”作为输入，视频周围弹幕评论“画面美”、“身材好”作为y的上下文内容，与发出弹幕评论y时相对应的视频/>一起作为输入。For example, in the example shown in Figure 2, the barrage comment y "For yourself, stick to it!" is used as input, and the barrage comments around the video "Beautiful picture" and "good figure" are used as the context content of y, and the video corresponding to the barrage comment y /> together as input.

在视频编码模块内，使用残差卷积神经网络，编码帧视频/>，并将得到的编码向量串联，得到编码后的帧级别的视频特征/>：Within the video encoding module, using a residual convolutional neural network, encoding frame video /> , and concatenate the obtained encoding vectors to obtain encoded frame-level video features/> :

其中，表示视频编码器，/>表示串联操作；in, Indicates a video encoder, /> Indicates a series operation;

在文本编码模块，使用长短期记忆网络（），分别对弹幕评论/>和其周围的个视频弹幕评论/>编码，得到编码后的目标弹幕特征/>和周围弹幕特征/>：In the text encoding module, the long short-term memory network ( ), comment on the barrage respectively /> and its surrounding video barrage comments /> Encoding to get the encoded target barrage feature/> and surrounding barrage features/> :

即Right now

其中，；in, ;

=/>； =/> ;

应理解的是，第个视频周围弹幕评论/>的周围特征为/>。It should be understood that the Barrage comments around the video /> The surrounding features are /> .

基于视频弹幕的特点，一些周围的具有相同情感的有用的周围弹幕评论可以作为目标弹幕评论的上下文信息提供帮助，因而可以通过门控融合模块，利用来对进行筛选过滤操作，得到门控融合模块处理后的第/>个周围弹幕评论/>：Based on the characteristics of video bullet chatting, some surrounding useful bullet chatting comments with the same emotion can be used as the context information of the target bullet chatting comments, so the gated fusion module can be used to utilize come on Perform screening and filtering operations to obtain the first /> after processing by the gated fusion module surrounding barrage comments /> :

其中，为筛选后第/>个周围弹幕评论，/>为第/>个视频周围弹幕评论/>的周围特征，/>，/>为可学习的门矩阵，/>为可学习的门偏移向量，激活函数/>为ReLU函数，/>表示串联，/>表示乘积，/>和/>均为可学习参数，在联合模型训练过程中进行参数优化，以输的模型达到预期效果；in, After filtering for the first /> surrounding barrage comments, /> for No. /> Barrage comments around the video /> surrounding features, /> , /> is a learnable gate matrix, /> is a learnable gate offset vector, the activation function /> is the ReLU function, /> Indicates concatenation, /> represents the product, /> and /> Both are learnable parameters, and parameters are optimized during the joint model training process to achieve the expected effect with the lost model;

把第1至个周围弹幕评论/>串联起来得到所有的周围弹幕评论/>：Put the first to surrounding barrage comments /> concatenated to get all surrounding barrage comments /> :

其中，表示串联操作。in, Indicates a concatenation operation.

多模态融合模块是由L层自注意力层和交叉注意力层组成，将将视频特征、目标弹幕特征/>、周围弹幕评论/>作为多模态融合模块的第一层的输入，经过多层迭代（即经过L层自注意力层和交叉注意力层处理之后），在最后一层得到相应的融合了别的模态的增强视频特征/>、增强目标弹幕特征/>和增强周围弹幕/>。The multi-modal fusion module is composed of L-layer self-attention layer and cross-attention layer, which will combine video features , target barrage features/> , Surrounding barrage comments /> As the input of the first layer of the multi-modal fusion module, after multi-layer iteration (that is, after the L-layer self-attention layer and cross-attention layer processing), the corresponding enhancement of other modalities is obtained in the last layer. Video Features/> , Enhance the characteristics of the target barrage /> and enhanced surrounding barrage /> .

解码模块由弹幕重构模块和情感分析模块两部分组成，解码模块以编码模块中得到的增强视频特征、增强目标弹幕特征/>、增强周围弹幕/>作为输入；The decoding module consists of two parts: the barrage reconstruction module and the sentiment analysis module. The decoding module uses the enhanced video features obtained in the encoding module , Enhance the characteristics of the target barrage /> , enhance the surrounding barrage /> as input;

在弹幕重构模块，重构损失被该弹幕重构模块分析计算，并被加入到闭环训练中促进多模态融合模块的学习效果，提升情感分析模块的效果。In the barrage reconstruction module, the reconstruction loss is analyzed and calculated by the barrage reconstruction module, and is added to the closed-loop training to promote the learning effect of the multi-modal fusion module and improve the effect of the sentiment analysis module.

弹幕重构模块由多层多头注意力层组成，弹幕重构的的损失函数为：The barrage reconstruction module consists of multi-layer multi-head attention layers, and the loss function of the barrage reconstruction for:

其中，表示批处理，CE表示交叉熵损失，/>表示重构模块，/>表示重构模块产生的弹幕评论，/>表示时刻/>的真实弹幕评论；in, Represents batch processing, CE represents cross-entropy loss, /> Represents a refactoring module, /> Indicates the barrage comments generated by the refactoring module, /> Indicate time/> real barrage comments;

在情感分析模块，包括弹幕情感预测模块和损失计算模块两个部分；其中在弹幕情感预测模块中预测出的弹幕情感计算公式如下：In the sentiment analysis module, there are two parts: the bullet chat emotion prediction module and the loss calculation module; the bullet chat emotion predicted in the bullet chat emotion prediction module Calculated as follows:

其中，是Softmax函数，LN表示层正则化操作，/>表示多层感知器，/>为可学习的情感预测矩阵，/>为可学习的视频情感矩阵，/>为可学习的周围弹幕情感矩阵，表示可学习的目标弹幕情感矩阵，/>表示串联操作，·表示乘积，/>、/>、/>均为可学习参数，在联合模型训练过程中进行参数优化，以输的模型达到预期效果。in, Is the Softmax function, LN represents the layer regularization operation, /> represents a multi-layer perceptron, /> is a learnable sentiment prediction matrix, /> is a learnable video sentiment matrix, /> is a learnable surrounding barrage emotion matrix, Represents a learnable target barrage emotion matrix, /> Indicates concatenation operation, means product, /> , /> , /> Both are learnable parameters, and parameter optimization is performed during the joint model training process to achieve the expected effect with the lost model.

在损失计算模块中情感预测的损失函数构建公式如下：The loss function for sentiment prediction in the loss calculation module The construction formula is as follows:

其中，表示批处理，/>为预测出的弹幕情感，该预测出的弹幕情感是弹幕评论/>通过该联合模型输出的预测弹幕情感，/>为真实的弹幕情感，该真实的弹幕情感是弹幕评论/>对应的实际情感；in, Indicates batch processing, /> is the predicted barrage emotion, the predicted barrage emotion is the barrage comment/> The predicted barrage sentiment output by the joint model, /> For the real barrage emotion, the real barrage emotion is the barrage comment/> the corresponding actual emotion;

总体损失函数的计算过程如下：overall loss function The calculation process is as follows:

其中，表示损失平衡参数，基于损失和反向传播算法更新联合模型的可学习参数，直至模型性能达到预期效果。in, Indicates the loss balance parameter, and updates the learnable parameters of the joint model based on the loss and the backpropagation algorithm until the model performance reaches the expected effect.

第一：步骤S3提供了一种门控融合机制，利用目标弹幕评论来对周围的弹幕评论进行筛选过滤操作，让一些周围的具有相同情感的有用的弹幕评论可以作为目标弹幕评论的上下文信息提供帮助，解决弹幕评论短、没有足够的上下文信息等问题，提升了目标弹幕的质量。First: step S3 provides a gating fusion mechanism, using the target bullet chat comments to filter the surrounding bullet chat comments, so that some surrounding useful bullet chat comments with the same emotion can be used as the target bullet chat comments Provide help with context information, solve problems such as short bullet chat comments and insufficient context information, and improve the quality of target bullet chat.

第二：步骤S4提供了一种多模态融合增强机制，通过多模态融合模块将视频信息囊括进来，充分考虑视频主题与弹幕的关系，获得增强的特征表示，提升了联合模型对于目标弹幕评论进行情感分析的性能。Second: step S4 provides a multi-modal fusion enhancement mechanism, which includes video information through the multi-modal fusion module, fully considers the relationship between video topics and barrage, obtains enhanced feature representation, and improves the joint model for the target Performance of Sentiment Analysis on Bullet Comments.

第三，步骤S5至S7提供了一种弹幕重构和情感分析机制，利用弹幕重构模块，促进各个模块的整体学习效果，提升情感分析模块的性能。Third, steps S5 to S7 provide a barrage reconstruction and sentiment analysis mechanism, using the barrage reconstruction module to promote the overall learning effect of each module and improve the performance of the sentiment analysis module.

本实施例主要应用于视频实时评论的情感分析，例如在某一时刻，用户发出了一条评论，判断这条评论的感情倾向。This embodiment is mainly applied to sentiment analysis of real-time video comments. For example, at a certain moment, a user sends a comment, and the emotional tendency of the comment is judged.

以上所述，仅为本发明较佳的具体实施方式，但本发明的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本发明揭露的技术范围内，根据本发明的技术方案及其发明构思加以等同替换或改变，都应涵盖在本发明的保护范围之内。The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto, any person familiar with the technical field within the technical scope disclosed in the present invention, according to the technical solution of the present invention Any equivalent replacement or change of the inventive concepts thereof shall fall within the protection scope of the present invention.

Claims

1. The barrage emotion analysis method based on the joint model is characterized in that barrage comments are input into the trained joint model to output emotion tendencies corresponding to the barrage comments;

the training process of the joint model is as follows:

s1: constructing a training sample set, the training sample set comprising momentsBullet comment>Time->To->Inward bullet comment->Surrounding video->And comment on bullet screen->Video surrounding barrage comment +.>；

S2: for the videoCoding and concatenating to obtain coded video feature ∈ ->Comment on the barrage->And the video surrounding barrage comment +.>EncodingObtaining the coded target barrage characteristic +.>And surrounding barrage features；

S3: based on the target barrage featureFor the surrounding barrage feature->After screening and filtering, connecting in series to obtain all surrounding barrage comments +.>；

S4: video characterization through self-attention layers and cross-attention layersTarget barrage feature->Comment on surrounding barrage->Enhancement processing to obtain enhanced video features>Enhanced target barrage feature->And reinforcing the surrounding barrage->；

S5, performing S5; enhancement of video features based on multi-layered multi-headed pairs of attention layersEnhanced target barrage feature->Reinforcing surrounding barrage->Reconstructing to obtain reconstructed barrage comments, and constructing a barrage reconstructed loss function by using the reconstructed barrage comments and the real barrage comments by using cross entropy>；

S6: for enhanced video featuresEnhanced target barrage feature->Reinforcing surrounding barrage->Sequentially carrying out regularization and normalization operations, and outputting the barrage comment +.>Corresponding to predicted barrage emotion +.>；

S7: predicted barrage emotion using cross entropyAnd true barrage emotion->Constructing a loss function for emotion predictionLoss function based on barrage reconstruction>And loss function of emotion prediction->Calculated overall loss functionBased on the overall loss function->And updating parameters of the joint model by a back propagation algorithm until the performance of the joint model reaches a set expected value;

the surrounding barrage commentsThe calculation formula is as follows:

wherein ,for post-selection->Comment on the surrounding bullet screen->Is->Video surrounding barrage comment->Is (are) peripheral features of->，/>Is a learnable gate matrix +.>Is a learnable gate offset vector, +.>For ReLU function>Representing series connection,/->Representing the product.

2. The method of collaborative model-based barrage emotion analysis of claim 1, wherein the video featuresThe calculation formula of (2) is as follows:

the target barrage featureThe calculation formula of (2) is as follows:

the surrounding barrage featureThe calculation formula of (2) is as follows:

wherein ,，/>，/>representing series connection,/->Representing a video encoder>Representing a long and short term memory network.

3. The barrage emotion analysis method based on joint model as set forth in claim 1, characterized in that in step S4: video characterization through self-attention layers and cross-attention layersTarget barrage feature->Comment on surrounding bullet screenEnhancement processing to obtain enhanced video features>Enhanced target barrage feature->And reinforcing the surrounding barrageSpecifically, the method comprises the following steps:

characterizing videoTarget barrage feature->Comment on surrounding barrage->Inputting as a first layer of the self-attention layer and the cross-attention layer and performing an L-layer iteration, wherein the L-layer is the total layer number of the self-attention layer and the cross-attention layer;

in the first placeLayer input video feature->Obtaining the input video feature of the next layer +.>The following are provided:

in the first placeLayer input target barrage feature->Obtaining the input target barrage feature of the next layer>：

In the first placeLayer input surrounding barrage comment->Obtaining the comment +.>：

Where SA represents the self-attention layer and CA represents the cross-attention layer.

4. A combined model-based barrage emotion analysis method as claimed in claim 3, characterized in that in step S5, the barrage reconstructed loss functionThe construction formula is as follows:

wherein ,indicating batch processing, +.>Representing cross entropy loss, < >>Representing a reconstruction module->Comment of bullet generated by the reconstruction module is represented, < ->Indicating time->Is a true bullet comment;

specifically, the bullet comments generated by the reconstruction module are specifically expressed in the following form:

wherein ,representing a multi-layer perceptron, LN representing regularization operation, MHA representing cross-multi-headed attention.

5. The method of claim 4, wherein in step S6, predicted barrage emotion is predictedThe calculation formula is as follows:

wherein ,is a Softmax function, LN represents a layer regularization operation, +.>Representing a multi-layer sensor->For a learnable emotion prediction matrix, +.>Is a learnable video emotion matrix, +.>Is a surrounding barrage emotion matrix which can be learned, < + >>Representing a learnable target barrage emotion matrix, < ->Representing a tandem operation, representing a product.

6. The method of collaborative model-based barrage emotion analysis according to claim 5, wherein in step S7, the emotion predicted penalty functionThe construction formula is as follows:

the overall loss functionThe calculation process of (2) is as follows:

wherein ,for predicted barrage emotion, +.>Is true barrage emotion +.>Representing cross entropy loss, < >>Representing loss balance parameters, +.>Indicating batch processing.

7. The barrage emotion analysis system based on the joint model is characterized in that barrage comments are input into the trained joint model to output emotion tendencies corresponding to the barrage comments;

the analysis system comprises a construction module, a video coding module, a text coding module, a door control fusion module, a multi-mode fusion module, a barrage reconstruction module, a barrage emotion prediction module and a loss calculation module;

the construction module is used for constructing a training sample set, and the training sample set comprises momentsBullet comment>Time of dayTo->Inward bullet comment->Surrounding video->And comment on bullet screen->Video surrounding barrage comment +.>；

The video coding module is used for coding the videoCoding and concatenating to obtain coded video feature ∈ ->；

The text coding module is used for commenting on the barrageAnd the video surrounding barrage comment +.>Coding to obtain the coded target barrage characteristic +.>And surrounding barrage feature->；

The gating fusion module is based on the target barrage characteristicsFor the surrounding barrage feature->After screening and filtering, connecting in series to obtain all surrounding barrage comments +.>；

The multi-mode fusion module is used for video features through the self-attention layer and the cross-attention layerTarget barrage feature->Comment on surrounding barrage->Processing to obtain enhanced video features->Enhanced target barrage feature->And reinforcing the surrounding barrage->；

The bullet screen reconstruction module is used for enhancing vision based on multi-layer multi-head attention layer pairsFrequency characteristicsEnhanced target barrage feature->Reinforcing surrounding barrage->Reconstructing to obtain reconstructed barrage comments, and constructing a barrage reconstructed loss function by using the reconstructed barrage comments and the real barrage comments by using cross entropy>；

The barrage emotion prediction module is used for enhancing video featuresEnhanced target barrage feature->Reinforcing surrounding barrage->Sequentially carrying out regularization and normalization operations, and outputting the barrage comment +.>Corresponding to predicted barrage emotion +.>；

The loss calculation module is used for predicting bullet screen emotion by using cross entropyAnd true barrage emotion->Construction of a loss function for emotion prediction>Loss function based on barrage reconstruction>And loss function of emotion prediction->Calculating the overall loss function->Updating parameters of the joint model based on the overall loss function and the back propagation algorithm until the performance of the joint model reaches a set expectation;

the surrounding barrage commentsThe calculation formula is as follows: