CN112199550B - Short video click rate prediction method based on emotion capsule network - Google Patents


Info

Publication number
CN112199550B
CN112199550B · Application CN202010937121.4A
Authority
CN
China
Prior art keywords
emotion
user
short video
module
sequence
Legal status
Active
Application number
CN202010937121.4A
Other languages
Chinese (zh)
Other versions
CN112199550A (en)
Inventor
吴健
顾盼
韩玉强
高维
Current Assignee
Shandong Industrial Technology Research Institute of ZJU
Original Assignee
Shandong Industrial Technology Research Institute of ZJU
Application filed by Shandong Industrial Technology Research Institute of ZJU
Priority to CN202010937121.4A
Publication of CN112199550A
Application granted
Publication of CN112199550B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 — Information retrieval of video data
    • G06F 16/73 — Querying
    • G06F 16/735 — Filtering based on additional data, e.g. user or group profiles
    • G06F 16/78 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/048 — Activation functions
    • G06N 3/08 — Learning methods
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention belongs to the technical field of internet services, and particularly relates to a short video click rate prediction method based on an emotion capsule network. The method comprises the following steps: S1, dividing a user behavior sequence into a block sequence; S2, extracting module features from the user block sequence and the target short video by adopting a gate mechanism; S3, using an emotion capsule network to obtain the emotion characterization of the user toward the target short video; S4, predicting the click rate of the user on the target short video according to the emotion characterizations; S5, designing a loss function according to the model characteristics; S6, updating the model parameters by adopting an Adam optimizer. The invention provides a short video click rate prediction method based on an emotion capsule network, which uses the positive feedback and negative feedback data of a user on a short video platform to judge the emotion of the user toward each module of a target short video and predicts, at a fine granularity, the click rate of the user on the current short video.

Description

Short video click rate prediction method based on emotion capsule network
Technical Field
The invention belongs to the technical field of internet service, and particularly relates to a short video click rate prediction method based on an emotion capsule network.
Background
Short videos are a new type of video with a shorter duration. Shooting a short video requires neither specialized equipment nor specialized skills; a user can conveniently shoot and upload one to a short video platform with a mobile phone, so the number of short videos on such platforms grows very rapidly. The demand for an effective short video recommendation system is therefore urgent: such a system can improve user experience and user stickiness, thereby bringing huge commercial value to the platform.
In recent years, many researchers have proposed personalized recommendation methods for videos. These methods can be divided into three categories: collaborative filtering, content-based recommendation, and hybrid recommendation methods. But short videos have characteristics different from ordinary videos: their descriptive text is of lower quality, their duration is shorter, and the user's interaction sequence over a period of time is longer. Short video recommendation is therefore a more challenging task, and researchers have proposed several approaches. For example, Wei et al. use a graph convolution structure to fuse the multimodal information of short videos, thereby better modeling user preferences; Chen et al. use a hierarchical attention mechanism to calculate the importance of both items and categories to obtain more accurate predictions.
Although these methods achieve good results, they use only the user's positive feedback information and ignore the negative feedback information. Negative feedback information refers to the behavior in which the user sees the cover of a short video but does not click to watch it. In recent work, Li et al. combine positive and negative feedback data and model them with a graph-based recurrent neural network to obtain user preferences. But they compute the user's preference for the short video as a whole; in fact a video has many modules, and the user holds different emotions and preferences toward the different modules of a video. Moreover, they directly take a weighted sum of the predictions from positive and negative feedback without discussing the specific effects of each at a fine granularity.
Disclosure of Invention
The invention aims to solve the technical problem of providing a short video click rate prediction method based on an emotion capsule network, which uses the positive feedback and negative feedback data of a user on a short video platform to judge the emotion of the user toward each module of a target short video and predicts the click rate of the user on the current short video at a fine granularity. To this end, the invention adopts the following technical scheme:
a short video click rate prediction method based on an emotion capsule network comprises the following steps:
s1, dividing a user behavior sequence into a block sequence;
s2, extracting module features from the user block sequence and the target short video by adopting a gate mechanism, wherein the module features are the user block sequence module features and the target short video module features respectively;
s3, obtaining emotion characterization of a user on a target short video by using an emotion capsule network, extracting emotion characteristics from low-level module characteristics through the capsule network, and analyzing emotion of the user on different modules of the short video to predict click rate, wherein the emotion is classified into positive, negative and neutral;
the positive emotion data is derived from a positive feedback sequence of a user, the negative emotion data is derived from a negative feedback sequence of the user, and the neutral emotion data is derived from a common part of the positive feedback sequence and the negative feedback sequence;
s4, predicting the click rate of the user on the target short video according to the emotion characteristics;
s5, designing a loss function according to the model characteristics;
s6, updating model parameters by adopting an Adam optimizer.
Wherein the short video is composed of finer-granularity modules (e.g., video scene, video theme, video emotion); the emotions are classified into positive, negative and neutral, denoted in the following formulas by the first three letters of the English words: pos, neg and neu.
On the basis of adopting the technical scheme, the invention can also adopt the following further technical scheme:
the step S1 further includes:
s11, using a window with length w to sequence clicking behaviors of a user
Figure GDA0004137750590000031
Figure GDA0004137750590000032
Dividing into m blocks, each block characterizing a meterThe calculation method is as follows:
Figure GDA0004137750590000033
wherein ,
Figure GDA0004137750590000034
preference characteristics at the kth block for the user;
s12, processing negative feedback information of the user in the same mode to obtain negative feedback block representation
Figure GDA0004137750590000035
The calculation method of the step S2 is as follows:

$$a_{k,i}^{+}=\sigma\left(W_{i,1}b_k^{+}+W_{i,2}q_i+b_i\right)\odot b_k^{+}$$

where $a_{k,i}^{+}$ represents the i-th module feature of the k-th block, $W_{i,1}$ and $W_{i,2}$ are the transfer matrices of the i-th module, $b_i$ is the bias vector of the i-th module, σ is the sigmoid activation function, and ⊙ is element-wise multiplication; $b_k^{+}$ is the preference characterization of the k-th block in the click sequence, and $q_i$ is the representation of the i-th module, shared by all users. The number of modules M of the short video is a hyperparameter.
The step S2 further includes:

S21, after the module vector representations of each block are obtained, aggregating the same module information in all blocks by average pooling:

$$a_i^{+}=\frac{1}{m}\sum_{k=1}^{m}a_{k,i}^{+}$$

where m is the number of blocks; this formula derives the M module features $\{a_1^{+},\dots,a_M^{+}\}$ from the positive feedback sequence;

S22, obtaining M module features $\{a_1^{-},\dots,a_M^{-}\}$ from the negative feedback sequence by the same method, and obtaining M module features $\{a_1^{new},\dots,a_M^{new}\}$ from the target short video.
The method extracts positive emotion from the click sequence and negative emotion from the no-click sequence. Further, some module features often appear in both the click and no-click sequences of a user, and the user holds a neutral sentiment toward such modules. Therefore, the step S3 further includes:

S31, pairing the module features extracted from the user sequence with the module features of the target short video one by one to form activation units:

$$u_i^{+}=g\left(a_i^{new}\odot a_i^{+}\right)$$

where $a_i^{new}$ is the i-th module feature of the target short video, $a_i^{+}$ is the i-th module feature of the user's positive feedback sequence, ⊙ is element-wise multiplication, and g is the activation function;

S32, obtaining the activation units $u_i^{-}$ of the negative feedback sequence by the same method;

S33, extracting emotion characterizations from the activation units extracted from positive feedback by the emotion capsule network:

$$\hat{u}_{s|i}^{+}=W_{i,s}^{+}u_i^{+},\quad s\in\{pos,neu\}$$

$$v_{pos}=g\left(\sum_i c_{i,pos}^{+}\hat{u}_{pos|i}^{+}\right)$$

where $W_{i,s}^{+}$ is the conversion matrix from the i-th activation unit of the positive feedback sequence to the emotion capsule s; the positive emotion capsule $v_{pos}$ is obtained by a weighted sum of the $\hat{u}_{pos|i}^{+}$; $c_{i,s}^{+}$ is the connection coefficient, representing the weight of $\hat{u}_{s|i}^{+}$, and is updated by a dynamic routing algorithm;

g is the vector activation function (squash activation) commonly used in capsule networks:

$$g(x)=\frac{\|x\|^{2}}{1+\|x\|^{2}}\cdot\frac{x}{\|x\|}$$

where ‖·‖ denotes the length of a vector;
s34, extracting emotion characteristics from an activation unit extracted by negative feedback by adopting an emotion capsule network:
negative emotion capsule v neg Equal to:
Figure GDA0004137750590000059
Figure GDA00041377505900000510
wherein s is { neg, neu };
s35, extracting from positive feedback and negative feedback sequences to obtain a neutral emotion capsule:
Figure GDA00041377505900000511
further, the connection coefficient is improved
Figure GDA00041377505900000512
The updating method of (1) comprises the following steps:
s301, increasing a temperature coefficient, and improving a dynamic routing coefficient by increasing the temperature coefficient
Figure GDA0004137750590000061
The formula is as follows: />
Figure GDA0004137750590000062
Wherein p is ∈ { +, - }, and s is { pos, neg, neu };
Figure GDA0004137750590000063
is the connection coefficient of the input capsule i to the output capsule s and is initialized to 0; τ is the temperature coefficient; when τ is 0 + Output emotion capsules tend to focus on only one input capsule; and when tau-infinity, the effect of the input capsule on the output emotion capsule tends to be consistent.
S302, considering importance degrees of different modules of the short video, the length of the activation unit can be used for describing the importance degrees of the modules. Thus, according toDynamic routing coefficient correction for importance degree of different short video modules
Figure GDA0004137750590000064
The formula is as follows:
Figure GDA0004137750590000065
wherein ,
Figure GDA0004137750590000066
is the length of the activation unit, which can account for the importance of the module, p e { +, - }, and s e { pos, neg, neu }.
Said step S4 further comprises: given the emotion capsules $v_s$, calculating the probability that the user clicks the target short video:

$$\hat{y}_s=W_{s,2}\left(W_{s,1}v_s+b_{s,1}\right)+b_{s,2}$$

$$\hat{y}=\sigma\left(\sum_{s}\|v_s\|\,\hat{y}_s+b_u\right)$$

where s ∈ {pos, neg, neu}, $W_{s,1}$ and $W_{s,2}$ are transfer matrices, $b_{s,1}$ is a bias vector, and $b_{s,2}$ is a bias scalar; σ is the sigmoid activation function, $b_u$ is the bias of the user dimension, and $\|v_s\|$ is the length of the vector, representing the confidence of the emotion.
The step S5 includes:

S51, for the predicted click rate $\hat{y}$ of the user on the target short video, calculating the error between the predicted value $\hat{y}$ and the true value y, and using the error to update the model parameters; the cross-entropy loss function (cross-entropy loss) is used to guide the updating of the model parameters:

$$L_{ctr}=-\sum_{(u,x)\in\mathcal{D}}\left[y\log\hat{y}+(1-y)\log(1-\hat{y})\right]$$

where y ∈ {0,1} is the true value representing whether the user clicked the target short video, and σ is the sigmoid function;

S52, in order to ensure that the emotion capsule network captures emotion accurately at a fine granularity, adding the edge loss function $L_{stm}$ and the disagreement loss function $L_{asp}$ as regularization terms; combining the cross-entropy loss function with the two regularization terms yields the complete loss function:

$$L=L_{ctr}+\lambda_s L_{stm}+\lambda_a L_{asp}$$
the edge loss function L stm The calculation formula is as follows:
Figure GDA0004137750590000073
wherein ,
Figure GDA0004137750590000074
representing all of the dataset<User, short video>Pairing; when the true value y=1, v s =v pos The method comprises the steps of carrying out a first treatment on the surface of the Otherwise, v s =v neg ;/>
Figure GDA0004137750590000075
Representing the reverse emotion of emotion capsule s;
the disagreement regularization (disagreement regularization) leads M module vectors q to trend disagreement, different module characteristics are extracted from the short video as much as possible, and the disagreement regularization term L asp The calculation formula is as follows:
Figure GDA0004137750590000076
where M is the number of module vectors q.
The beneficial technical effects of the invention are as follows:
(1) The invention provides a short video click rate prediction method based on an emotion capsule network, which uses a gate mechanism and an emotion capsule network to analyze, from the positive feedback and negative feedback information of a user, the different emotions of the user toward the different modules of a short video, so as to obtain more accurate predictions.
(2) The invention provides a novel emotion capsule network framework, which extracts the positive, negative and neutral emotions of a user from the different modules of the user's short video sequences, and improves the framework and the dynamic routing algorithm of the capsule network.
(3) A large number of experiments carried out on two public short video datasets show that the method performs better than the latest methods on the same data.
Drawings
FIG. 1 is a schematic flow chart of a short video click rate prediction method based on an emotion capsule network;
FIG. 2 is a model frame diagram of a short video click rate prediction method based on an emotion capsule network;
FIG. 3 is a diagram of the positive feedback and negative feedback information of a user of a short video click rate prediction method based on an emotion capsule network.
Detailed Description
In order to further the understanding of the present invention, the short video click rate prediction method based on an emotion capsule network provided by the present invention is specifically described below with reference to specific embodiments; however, the present invention is not limited thereto, and insubstantial improvements and adjustments made by those skilled in the art under the core guiding concept of the invention still fall within the protection scope of the invention.
The short video click rate prediction task is to build a model to predict the probability of a user clicking on a short video.
The historical sequence of the user is expressed as $X^{p}=\{x_1^{p},x_2^{p},\dots,x_l^{p}\}$, where p ∈ {+, −} represents click and no-click behavior, $x_j$ represents the j-th short video, and l is the length of the sequence. The whole sequence can be further divided into the click sequence $X^{+}$ and the no-click sequence $X^{-}$, i.e., the positive feedback and negative feedback information. Thus, the short video click rate prediction problem can be expressed as: given the user's click sequence $X^{+}$, no-click sequence $X^{-}$ and a target short video $x_{new}$ as input, predict the click rate of the user on the target short video $x_{new}$.
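To make the task contract concrete before the model is described, here is a minimal input/output sketch (NumPy; all names and shapes are illustrative assumptions rather than anything prescribed by the patent):

```python
import numpy as np

# Assumed data layout: each short video is represented by the d-dimensional
# feature vector of its cover image.
#   X_pos: [l_pos, d]  covers the user clicked (positive feedback)
#   X_neg: [l_neg, d]  covers shown but not clicked (negative feedback)
#   x_new: [d]         cover of the target short video

def click_rate(X_pos: np.ndarray, X_neg: np.ndarray, x_new: np.ndarray) -> float:
    """Predict P(click | X+, X-, x_new)."""
    raise NotImplementedError("assembled from the S100-S500 sketches below")
```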
Therefore, the invention provides a short video click rate prediction method based on an emotion capsule network. According to the positive feedback and negative feedback information of the user on short videos, the emotion of the user toward the different modules of a short video is mined, and the click rate of the user on the target short video is predicted. Positive feedback here means that the user clicks on a short video; negative feedback means that the platform shows the cover of a short video but the user does not click on it, indicating that the user is not interested in it. The method considers that the user has different emotions (sentiments) toward different modules (aspects) of a short video, so predicting the click rate of the user on the target short video should analyze the preference of the user for the different modules of the short video. As shown in fig. 3, a short video has three modules: video scene, video theme, and video emotion. From the clicking and non-clicking actions of the user, we can find that the user likes beauty-related video themes and positive video emotion, but dislikes animal-related themes and negative video emotion. Further, the user maintains a neutral attitude toward the video scene.
The method consists mainly of three parts, as shown in fig. 2. The first part extracts module features (aspects) from the user's positive feedback and negative feedback information using a gate mechanism, and extracts module features from the target short video by the same method. The second part pairs the two sets of module features one by one to form activation units, inputs them into the emotion capsule network, and predicts the emotion of the user toward the different modules of the short video. The third part predicts the click rate of the short video based on the emotion characterizations of the user toward the current short video.
As shown in fig. 1, according to one embodiment of the invention, the method comprises the steps of:
s100, dividing the user behavior sequence into block (block) sequences. Click behavior sequence for a user
Figure GDA00041377505900001010
Can be expressed as +.>
Figure GDA0004137750590000102
wherein />
Figure GDA0004137750590000103
Is the cover map feature vector of the short video, and d is the feature vector length. Because the short video duration is short, the behavior sequence of the user is longer. Therefore, the method uses a window with length w to divide the sequence X + The short video of user interactions in one block tends to be relatively similar, divided into m blocks.
At this time, each block feature characterization is calculated as follows:
Figure GDA0004137750590000104
wherein ,
Figure GDA0004137750590000105
is a preference feature of the user at the kth block. Although some of the information is lost by summing, the main information is preserved, which is what we need. The method adopts the same mode to process the negative feedback information of the user, and obtains the block representation +.>
Figure GDA0004137750590000106
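As a concrete illustration of S100, the following minimal sketch (NumPy; the function name, shapes, and the truncation to a multiple of w are our own assumptions, not prescribed by the patent) sums each window of w cover-feature vectors into one block characterization:

```python
import numpy as np

def to_blocks(x: np.ndarray, w: int) -> np.ndarray:
    """Divide a behavior sequence into blocks and sum-pool each block.

    x: [l, d] cover-image feature vectors of the interacted short videos.
    w: window length; the sequence is truncated to a multiple of w here.
    Returns b: [m, d], where b[k] = sum of the w videos in the k-th block.
    """
    l, d = x.shape
    m = l // w                       # number of blocks
    x = x[: m * w].reshape(m, w, d)
    return x.sum(axis=1)             # b_k = sum of x_j within block k

# applied separately to the click (positive) and no-click (negative) sequences:
# b_pos = to_blocks(X_pos, w); b_neg = to_blocks(X_neg, w)
```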
S200, extracting module features from the user block sequence and the target short video; the short video is composed of finer-granularity modules (e.g., video scene, video theme, video emotion).

The method adopts a gate mechanism to extract module features; the following formula extracts the i-th module of the k-th block:

$$a_{k,i}^{+}=\sigma\left(W_{i,1}b_k^{+}+W_{i,2}q_i+b_i\right)\odot b_k^{+}$$

where $W_{i,1}$ and $W_{i,2}$ are the transfer matrices of the i-th module and $b_i$ is the bias vector of the i-th module; σ is the sigmoid activation function and ⊙ is element-wise multiplication; $b_k^{+}$ is the preference characterization of the k-th block in the click sequence; $q_i$ is the representation of the i-th module and is shared by all users. The number of modules M of the short video is a hyperparameter; experimental verification in this method sets it to 5. After the module vector representations of each block are obtained, the same module information in all blocks is aggregated by an average pool (average pooling):

$$a_i^{+}=\frac{1}{m}\sum_{k=1}^{m}a_{k,i}^{+}$$

where m is the number of blocks. Finally, we can obtain M module features $\{a_1^{+},\dots,a_M^{+}\}$ from the positive feedback sequence. In the same way, M module features $\{a_1^{-},\dots,a_M^{-}\}$ can be obtained from the negative feedback sequence, and M module features $\{a_1^{new},\dots,a_M^{new}\}$ can be obtained from the target short video.
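A minimal sketch of the gated module extraction and average pooling of S200 (NumPy; the gate form follows the reconstruction above, and all parameter names and shapes are illustrative assumptions):

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def module_features(b: np.ndarray, q: np.ndarray, W1: np.ndarray,
                    W2: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Gate mechanism: M module features from m block characterizations.

    b: [m, d] block characterizations; q: [M, d] module vectors q_i
    shared by all users; W1, W2: [M, d, d] transfer matrices; bias: [M, d].
    Returns a: [M, d], the module features after average pooling over blocks.
    """
    M = q.shape[0]
    out = []
    for i in range(M):
        # gate of module i, applied to every block characterization b_k
        gate = sigmoid(b @ W1[i].T + q[i] @ W2[i].T + bias[i])  # [m, d]
        out.append((gate * b).mean(axis=0))   # average pool over the m blocks
    return np.stack(out)                      # [M, d]

# a_pos, a_neg from the user's block sequences; a_new from the target video
# (the target video is treated as a single "block" in the same way)
```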
S300, using the emotion capsule network to obtain the emotion characterization of the user toward the target short video. For a target short video, the click rate is predicted by analyzing the emotion of the user toward the different modules of the short video. The method adopts a capsule network to extract emotion characterizations from the low-level module features. There are three kinds of emotion: positive, negative, and neutral. The method extracts positive emotion from the click sequence and negative emotion from the no-click sequence. Further, some module features often appear in both the click and no-click sequences of a user, and the user holds a neutral sentiment toward such modules. Such emotions also play a role in prediction, especially when the negative and positive emotions are not apparent.
First, the module features extracted from the user sequence are paired one by one with the module features of the target short video to form activation units:

$$u_i^{+}=g\left(a_i^{new}\odot a_i^{+}\right)$$

where $a_i^{new}$ is the i-th module feature of the target short video, $a_i^{+}$ is the i-th module feature of the user's positive feedback sequence, and ⊙ is element-wise multiplication. g is the vector activation function (squash) commonly used in capsule networks:

$$g(x)=\frac{\|x\|^{2}}{1+\|x\|^{2}}\cdot\frac{x}{\|x\|}$$

where ‖·‖ denotes the length of a vector. By the same method, the activation units $u_i^{-}$ of the negative feedback sequence can be obtained.
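The squash non-linearity and the activation-unit pairing can be sketched as follows (NumPy; the eps term and the function names are our additions for numerical safety and illustration):

```python
import numpy as np

def squash(x: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """g(x) = (|x|^2 / (1 + |x|^2)) * (x / |x|), applied to the last axis."""
    norm2 = np.sum(x * x, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * x / np.sqrt(norm2 + eps)

def activation_units(a_user: np.ndarray, a_new: np.ndarray) -> np.ndarray:
    """Pair user module features with target-video module features.

    a_user, a_new: [M, d]. Returns u: [M, d], u_i = g(a_i_new ⊙ a_i_user).
    """
    return squash(a_new * a_user)

# u_pos = activation_units(a_pos, a_new); u_neg = activation_units(a_neg, a_new)
```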
The capsule network then extracts emotion characterizations from the activation units of positive feedback and negative feedback:

$$\hat{u}_{s|i}^{+}=W_{i,s}^{+}u_i^{+},\quad s\in\{pos,neu\}$$

$$v_{pos}=g\left(\sum_i c_{i,pos}^{+}\hat{u}_{pos|i}^{+}\right)$$

where $W_{i,s}^{+}$ is the transition matrix from the i-th activation unit of the positive feedback sequence to the emotion capsule s; as can be seen from the formula, positive and neutral emotion are obtained from positive feedback. The positive emotion capsule $v_{pos}$ is obtained by a weighted sum of the $\hat{u}_{pos|i}^{+}$. Again, g is the activation function (squash) commonly used in capsule networks, and $c_{i,s}^{+}$ is the connection coefficient representing the weight of $\hat{u}_{s|i}^{+}$; the coefficients are updated using a dynamic routing algorithm.

Likewise, the negative emotion capsule $v_{neg}$ equals:

$$\hat{u}_{s|i}^{-}=W_{i,s}^{-}u_i^{-},\quad s\in\{neg,neu\}$$

$$v_{neg}=g\left(\sum_i c_{i,neg}^{-}\hat{u}_{neg|i}^{-}\right)$$

And the neutral emotion capsule is extracted from both the positive feedback and negative feedback sequences:

$$v_{neu}=g\left(\sum_i c_{i,neu}^{+}\hat{u}_{neu|i}^{+}+\sum_i c_{i,neu}^{-}\hat{u}_{neu|i}^{-}\right)$$

Therefore, positive emotion data is derived from the positive feedback sequence of the user, negative emotion data is derived from the negative feedback sequence of the user, and neutral emotion data is derived from the common part of the positive feedback and negative feedback sequences. This structure is also an innovation of the method, unlike the full connectivity of traditional capsule networks. To increase the differentiation between emotion capsules, we further improve the updating of the connection coefficients $c_{i,s}^{p}$. The improvement has two points; the first is the introduction of a temperature coefficient:

$$c_{i,s}^{p}=\frac{\exp\left(\beta_{i,s}^{p}/\tau\right)}{\sum_{i'}\exp\left(\beta_{i',s}^{p}/\tau\right)}$$

where p ∈ {+, −} and s ∈ {pos, neg, neu}; $\beta_{i,s}^{p}$ is the connection coefficient (logit) from the input capsule i to the output capsule s and is initialized to 0. τ is the temperature coefficient; experiments show that τ = 0.8 works best on this data. When τ → 0⁺, the output emotion capsules tend to focus on only one input capsule; when τ → ∞, the influence of the input capsules on the output emotion capsules tends to be uniform.

The second point takes into account the importance of the different modules of the short video, which the length of an activation unit can describe. We use the length of the activation unit to correct the coefficients:

$$\tilde{c}_{i,s}^{p}=c_{i,s}^{p}\left\|u_i^{p}\right\|$$

where $\|u_i^{p}\|$ is the length of the activation unit, p ∈ {+, −}, and s ∈ {pos, neg, neu}.
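The capsule layer of S300 can be sketched end to end: the improved coefficients (temperature softmax over the input units, rescaled by unit length) followed by the squashed weighted sums that form the three capsules. NumPy, single routing pass; all names and shapes are illustrative assumptions, and squash() is the function from the previous sketch:

```python
import numpy as np

def routing_coefficients(beta: np.ndarray, u: np.ndarray, tau: float = 0.8) -> np.ndarray:
    """c~_{i,s}: temperature softmax over the M input units, scaled by ||u_i||.

    beta: [S, M] connection logits (initialized to 0) from unit i to capsule s.
    u:    [M, d] activation units; their lengths encode module importance.
    """
    logits = beta / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    c = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return c * np.linalg.norm(u, axis=-1)                 # importance correction

def emotion_capsules(u_pos, u_neg, W_pos, W_neg, c_pos, c_neg):
    """Form v_pos, v_neg, v_neu from the transformed activation units.

    u_pos, u_neg: [M, d]; W_pos, W_neg: [2, M, d, d] transforms toward
    capsules (pos, neu) and (neg, neu); c_pos, c_neg: [2, M] coefficients.
    """
    uhat_pos = np.einsum('simn,in->sim', W_pos, u_pos)    # u^_{s|i}, s in (pos, neu)
    uhat_neg = np.einsum('simn,in->sim', W_neg, u_neg)    # u^_{s|i}, s in (neg, neu)
    v_pos = squash(np.einsum('i,im->m', c_pos[0], uhat_pos[0]))
    v_neg = squash(np.einsum('i,im->m', c_neg[0], uhat_neg[0]))
    # the neutral capsule pools from BOTH feedback types (their common part)
    v_neu = squash(np.einsum('i,im->m', c_pos[1], uhat_pos[1])
                   + np.einsum('i,im->m', c_neg[1], uhat_neg[1]))
    return v_pos, v_neg, v_neu
```

In full dynamic routing, the logits β would be refined iteratively (e.g., by adding the agreement between each output capsule and its transformed inputs) and the coefficients recomputed between iterations; the sketch shows one pass.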
S400, predicting the click rate of the user on the target short video according to the emotion characterizations. Given the emotion capsules $v_s$, the probability that the user clicks the target short video is calculated as follows:

$$\hat{y}_s=W_{s,2}\left(W_{s,1}v_s+b_{s,1}\right)+b_{s,2}$$

$$\hat{y}=\sigma\left(\sum_{s}\|v_s\|\,\hat{y}_s+b_u\right)$$

where s ∈ {pos, neg, neu}, $W_{s,1}$ and $W_{s,2}$ are transfer matrices, $b_{s,1}$ is a bias vector, and $b_{s,2}$ is a bias scalar. σ is the sigmoid activation function, $b_u$ is the bias of the user dimension, and $\|v_s\|$ is the length of the vector, representing the confidence of the emotion.
S500, designing a loss function according to the model characteristics. Given the predicted click rate $\hat{y}$ of the user on the target short video, the error between the predicted value $\hat{y}$ and the true value y is calculated and then used to update the model parameters. We use the cross-entropy loss function (cross-entropy loss) to guide the updating of the model parameters:

$$L_{ctr}=-\sum_{(u,x)\in\mathcal{D}}\left[y\log\hat{y}+(1-y)\log(1-\hat{y})\right]$$

where y ∈ {0,1} is the true value representing whether the user clicked the target short video, and σ is the sigmoid function.
Meanwhile, in order to ensure that the emotion capsule network captures emotion correctly at a fine granularity, an edge loss function (margin loss) is added as a regularization term:

$$L_{stm}=\sum_{(u,x)\in\mathcal{D}}\left[\max\left(0,\,\epsilon-\|v_s\|\right)+\lambda\max\left(0,\,\|v_{\bar{s}}\|-(1-\epsilon)\right)\right]$$

where $\mathcal{D}$ represents all ⟨user, short video⟩ pairs in the dataset; according to the experimental results, we set the parameters ε = 0.8 and λ = 0.5. Notably, when the true value y = 1, $v_s=v_{pos}$; otherwise, $v_s=v_{neg}$. $v_{\bar{s}}$ represents the capsule of the emotion opposite to s.
Furthermore, the method introduces disagreement regularization (disagreement regularization) to drive the M module vectors q to diverge, so that module features as different as possible are extracted from the short video:

$$L_{asp}=\frac{1}{M^{2}}\sum_{i=1}^{M}\sum_{j=1}^{M}\frac{q_i^{\top}q_j}{\|q_i\|\,\|q_j\|}$$

Finally, combining the cross-entropy loss function with the two regularization terms yields the complete loss function:

$$L=L_{ctr}+\lambda_s L_{stm}+\lambda_a L_{asp}$$

where, in our experiments, $\lambda_s=0.1$ and $\lambda_a=0.1$. We update the model parameters using the Adam optimizer.
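For one ⟨user, short video⟩ pair, the complete objective can be sketched as follows (NumPy; ε, λ, λ_s, λ_a take the values reported in the text, and the margin form follows the reconstruction above, so it is an assumption rather than the verbatim patent formula):

```python
import numpy as np

def total_loss(y: int, y_hat: float, v_pos: np.ndarray, v_neg: np.ndarray,
               q: np.ndarray, eps=0.8, lam=0.5, lam_s=0.1, lam_a=0.1) -> float:
    """L = L_ctr + lam_s * L_stm + lam_a * L_asp for a single sample."""
    # cross-entropy between the prediction and the click label
    l_ctr = -(y * np.log(y_hat) + (1 - y) * np.log(1.0 - y_hat))
    # margin loss: lengthen the correct capsule, shorten the opposite one
    v_s, v_bar = (v_pos, v_neg) if y == 1 else (v_neg, v_pos)
    l_stm = max(0.0, eps - np.linalg.norm(v_s)) \
            + lam * max(0.0, np.linalg.norm(v_bar) - (1.0 - eps))
    # disagreement: mean pairwise cosine similarity of the M module vectors q
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)     # [M, d]
    l_asp = float((qn @ qn.T).mean())
    return float(l_ctr + lam_s * l_stm + lam_a * l_asp)
```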
The foregoing description of the embodiments is provided to facilitate the understanding and application of the invention by those skilled in the art. It will be apparent to those having ordinary skill in the art that various modifications to the above-described embodiments may readily be made, and that the generic principles described herein may be applied to other embodiments without inventive effort. Therefore, the present invention is not limited to the above-described embodiments, and improvements and modifications made by those skilled in the art based on the present disclosure fall within the protection scope of the present invention.

Claims (1)

1. A short video click rate prediction method based on an emotion capsule network is characterized by comprising the following steps:

S1, dividing a user behavior sequence into a block sequence;

the step S1 further includes:

S11, using a window with length w, dividing the click behavior sequence of a user, $X^{+}=\{x_1^{+},x_2^{+},\dots,x_l^{+}\}$, into m blocks, the characterization of each block being calculated as follows:

$$b_k^{+}=\sum_{j=(k-1)w+1}^{kw}x_j^{+}$$

where $b_k^{+}$ is the preference characterization of the user at the k-th block;

S12, processing the negative feedback information of the user in the same way to obtain the negative feedback block representations $b_k^{-}$;

S2, extracting module features from the user block sequence and the target short video by adopting a gate mechanism, the module features being the user block sequence module features and the target short video module features respectively;

the calculation method of the step S2 is as follows:

$$a_{k,i}^{+}=\sigma\left(W_{i,1}b_k^{+}+W_{i,2}q_i+b_i\right)\odot b_k^{+}$$

where $a_{k,i}^{+}$ represents the i-th module feature of the k-th block, $W_{i,1}$ and $W_{i,2}$ are the transfer matrices of the i-th module, $b_i$ is the bias vector of the i-th module, σ is the sigmoid activation function, and ⊙ is element-wise multiplication; $b_k^{+}$ is the preference characterization of the k-th block in the click sequence, and $q_i$ is the representation of the i-th module, shared by all users;

the step S2 further includes:

S21, after the module vector representations of each block are obtained, aggregating the same module information in all blocks by average pooling:

$$a_i^{+}=\frac{1}{m}\sum_{k=1}^{m}a_{k,i}^{+}$$

where m is the number of blocks; this formula derives the M module features $\{a_1^{+},\dots,a_M^{+}\}$ from the positive feedback sequence;

S22, obtaining M module features $\{a_1^{-},\dots,a_M^{-}\}$ from the negative feedback sequence by the same method, and obtaining M module features $\{a_1^{new},\dots,a_M^{new}\}$ from the target short video;

S3, obtaining the emotion characterization of the user toward the target short video by using an emotion capsule network: emotion characterizations are extracted from the low-level module features through the capsule network, and the emotion of the user toward the different modules of the short video is analyzed to predict the click rate, the emotion being classified into positive emotion, negative emotion and neutral emotion;

the positive emotion data is derived from the positive feedback sequence of the user, the negative emotion data is derived from the negative feedback sequence of the user, and the neutral emotion data is derived from the common part of the positive feedback sequence and the negative feedback sequence;

the step S3 further includes:

S31, pairing the module features extracted from the user sequence with the module features of the target short video one by one to form activation units:

$$u_i^{+}=g\left(a_i^{new}\odot a_i^{+}\right)$$

where $a_i^{new}$ is the i-th module feature of the target short video, $a_i^{+}$ is the i-th module feature of the user's positive feedback sequence, ⊙ is element-wise multiplication, and g is the activation function;

S32, obtaining the activation units $u_i^{-}$ of the negative feedback sequence by the same method;

S33, extracting emotion characterizations from the activation units extracted from positive feedback by the emotion capsule network:

$$\hat{u}_{s|i}^{+}=W_{i,s}^{+}u_i^{+},\quad s\in\{pos,neu\}$$

$$v_{pos}=g\left(\sum_i c_{i,pos}^{+}\hat{u}_{pos|i}^{+}\right)$$

where $W_{i,s}^{+}$ is the conversion matrix from the i-th activation unit of the positive feedback sequence to the emotion capsule s; the positive emotion capsule $v_{pos}$ is obtained by a weighted sum of the $\hat{u}_{pos|i}^{+}$; $c_{i,s}^{+}$ is the connection coefficient, representing the weight of $\hat{u}_{s|i}^{+}$, and is updated by a dynamic routing algorithm;

the activation function g is the vector activation function commonly used in capsule networks:

$$g(x)=\frac{\|x\|^{2}}{1+\|x\|^{2}}\cdot\frac{x}{\|x\|}$$

where ‖·‖ denotes the length of a vector;

S34, extracting emotion characterizations from the activation units extracted from negative feedback by the emotion capsule network; the negative emotion capsule $v_{neg}$ equals:

$$\hat{u}_{s|i}^{-}=W_{i,s}^{-}u_i^{-},\quad s\in\{neg,neu\}$$

$$v_{neg}=g\left(\sum_i c_{i,neg}^{-}\hat{u}_{neg|i}^{-}\right)$$

S35, extracting the neutral emotion capsule from both the positive feedback and negative feedback sequences:

$$v_{neu}=g\left(\sum_i c_{i,neu}^{+}\hat{u}_{neu|i}^{+}+\sum_i c_{i,neu}^{-}\hat{u}_{neu|i}^{-}\right)$$

the step S3 further comprises an improved updating method of the connection coefficients $c_{i,s}^{p}$, the method including:

S301, introducing a temperature coefficient, and improving the dynamic routing coefficients through the temperature coefficient:

$$c_{i,s}^{p}=\frac{\exp\left(\beta_{i,s}^{p}/\tau\right)}{\sum_{i'}\exp\left(\beta_{i',s}^{p}/\tau\right)}$$

where p ∈ {+, −} and s ∈ {pos, neg, neu}; $\beta_{i,s}^{p}$ is the connection coefficient (logit) from the input capsule i to the output capsule s and is initialized to 0; τ is the temperature coefficient;

S302, correcting the dynamic routing coefficients according to the importance of the different short video modules:

$$\tilde{c}_{i,s}^{p}=c_{i,s}^{p}\left\|u_i^{p}\right\|$$

where $\|u_i^{p}\|$ is the length of the activation unit, p ∈ {+, −}, and s ∈ {pos, neg, neu};

S4, predicting the click rate of the user on the target short video according to the emotion characterizations;

the step S4 further comprises: given the emotion capsules $v_s$, calculating the probability that the user clicks the target short video as follows:

$$\hat{y}_s=W_{s,2}\left(W_{s,1}v_s+b_{s,1}\right)+b_{s,2}$$

$$\hat{y}=\sigma\left(\sum_{s}\|v_s\|\,\hat{y}_s+b_u\right)$$

where s ∈ {pos, neg, neu}, $W_{s,1}$ and $W_{s,2}$ are transfer matrices, $b_{s,1}$ is a bias vector, and $b_{s,2}$ is a bias scalar; σ is the sigmoid activation function, $b_u$ is the bias of the user dimension, and $\|v_s\|$ is the length of the vector;

S5, designing a loss function according to the model characteristics;

the step S5 includes the following steps:

S51, for the predicted click rate $\hat{y}$ of the user on the target short video, calculating the error between the predicted value $\hat{y}$ and the true value y, and using the error to update the model parameters; the cross-entropy loss function is adopted to guide the updating of the model parameters:

$$L_{ctr}=-\sum_{(u,x)\in\mathcal{D}}\left[y\log\hat{y}+(1-y)\log(1-\hat{y})\right]$$

where y ∈ {0,1} is the true value representing whether the user clicked the target short video, and σ is the sigmoid function;

S52, adding the edge loss function $L_{stm}$ and the disagreement loss function $L_{asp}$ as regularization terms, the loss function being:

$$L=L_{ctr}+\lambda_s L_{stm}+\lambda_a L_{asp}$$

where $\lambda_s$ and $\lambda_a$ are the regularization parameters of the loss functions $L_{stm}$ and $L_{asp}$, respectively;

in the step S52, the edge loss function $L_{stm}$ is calculated as:

$$L_{stm}=\sum_{(u,x)\in\mathcal{D}}\left[\max\left(0,\,\epsilon-\|v_s\|\right)+\lambda\max\left(0,\,\|v_{\bar{s}}\|-(1-\epsilon)\right)\right]$$

where $\mathcal{D}$ represents all ⟨user, short video⟩ pairs in the dataset, and ε and λ are model parameters; when the true value y = 1, $v_s=v_{pos}$; otherwise, $v_s=v_{neg}$; $v_{\bar{s}}$ represents the capsule of the emotion opposite to s;

the disagreement regularization term $L_{asp}$ is calculated as:

$$L_{asp}=\frac{1}{M^{2}}\sum_{i=1}^{M}\sum_{j=1}^{M}\frac{q_i^{\top}q_j}{\|q_i\|\,\|q_j\|}$$

where M is the number of module vectors q;

S6, updating the model parameters by adopting an Adam optimizer.
CN202010937121.4A 2020-09-08 2020-09-08 Short video click rate prediction method based on emotion capsule network Active CN112199550B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010937121.4A CN112199550B (en) 2020-09-08 2020-09-08 Short video click rate prediction method based on emotion capsule network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010937121.4A CN112199550B (en) 2020-09-08 2020-09-08 Short video click rate prediction method based on emotion capsule network

Publications (2)

Publication Number Publication Date
CN112199550A CN112199550A (en) 2021-01-08
CN112199550B true CN112199550B (en) 2023-05-19

Family

ID=74005990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010937121.4A Active CN112199550B (en) 2020-09-08 2020-09-08 Short video click rate prediction method based on emotion capsule network

Country Status (1)

Country Link
CN (1) CN112199550B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765461A (en) * 2021-01-12 2021-05-07 中国计量大学 Session recommendation method based on multi-interest capsule network
CN112905887B (en) * 2021-02-22 2021-12-14 中国计量大学 Conversation recommendation method based on multi-interest short-term priority model


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189988A (en) * 2018-09-18 2019-01-11 北京邮电大学 A kind of video recommendation method
CN109948165A (en) * 2019-04-24 2019-06-28 吉林大学 Fine granularity feeling polarities prediction technique based on mixing attention network
CN111144130A (en) * 2019-12-26 2020-05-12 辽宁工程技术大学 Context-aware-based fine-grained emotion classification method for hybrid neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Capsule Network for Recommendation and Explaining What You Like and Dislike; Chenliang Li; 2019 Association for Computing Machinery; 2019-07-25; full text *

Also Published As

Publication number Publication date
CN112199550A (en) 2021-01-08

Similar Documents

Publication Publication Date Title
CN113905391B (en) Integrated learning network traffic prediction method, system, equipment, terminal and medium
CN108287904A (en) A kind of document context perception recommendation method decomposed based on socialization convolution matrix
CN112492396B (en) Short video click rate prediction method based on fine-grained multi-aspect analysis
CN108563755A (en) A kind of personalized recommendation system and method based on bidirectional circulating neural network
WO2023065859A1 (en) Item recommendation method and apparatus, and storage medium
CN112395504B (en) Short video click rate prediction method based on sequence capsule network
CN112199550B (en) Short video click rate prediction method based on emotion capsule network
CN112256916B (en) Short video click rate prediction method based on graph capsule network
CN112765461A (en) Session recommendation method based on multi-interest capsule network
CN112307258B (en) Short video click rate prediction method based on double-layer capsule network
CN111831895A (en) Network public opinion early warning method based on LSTM model
CN110188200A (en) A kind of depth microblog emotional analysis method using social context feature
CN112256918B (en) Short video click rate prediction method based on multi-mode dynamic routing
CN111125428A (en) Time-dependent movie recommendation method based on score prediction function fitting structure
CN108647364B (en) Prediction recommendation method based on mobile terminal application data
CN113051468B (en) Movie recommendation method and system based on knowledge graph and reinforcement learning
CN112307257B (en) Short video click rate prediction method based on multi-information node graph network
CN113704439B (en) Conversation recommendation method based on multi-source information heteromorphic graph
CN112559905B (en) Conversation recommendation method based on dual-mode attention mechanism and social similarity
CN112616072B (en) Short video click rate prediction method based on positive and negative feedback information of user
CN115470397B (en) Content recommendation method, device, computer equipment and storage medium
CN112765401B (en) Short video recommendation method based on non-local network and local network
CN114332723B (en) Video behavior detection method based on semantic guidance
Chen et al. Image Aesthetics Assessment with Emotion-Aware Multi-Branch Network
Zhao et al. Combining influence and sensitivity to factorize matrix for multi-context recommendation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant