CN112199550A - Short video click rate prediction method based on emotion capsule network - Google Patents

Short video click rate prediction method based on emotion capsule network

Info

Publication number
CN112199550A
Authority
CN
China
Prior art keywords
emotion
short video
user
module
capsule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010937121.4A
Other languages
Chinese (zh)
Other versions
CN112199550B (en)
Inventor
吴健 (Wu Jian)
顾盼 (Gu Pan)
韩玉强 (Han Yuqiang)
高维 (Gao Wei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Industrial Technology Research Institute of ZJU
Original Assignee
Shandong Industrial Technology Research Institute of ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Industrial Technology Research Institute of ZJU filed Critical Shandong Industrial Technology Research Institute of ZJU
Priority to CN202010937121.4A
Publication of CN112199550A
Application granted
Publication of CN112199550B
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70: Information retrieval of video data
    • G06F16/73: Querying
    • G06F16/735: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70: Information retrieval of video data
    • G06F16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the technical field of internet services, and particularly relates to a short video click rate prediction method based on an emotion capsule network. The method comprises the following steps: S1, dividing the user behavior sequence into block sequences; S2, extracting module features from the user block sequences and the target short video with a gate mechanism; S3, obtaining the user's emotion representation of the target short video with an emotion capsule network; S4, predicting the user's click rate on the target short video according to the emotion features; S5, designing a loss function according to the model characteristics; and S6, updating the model parameters with an Adam optimizer. The invention uses the positive and negative feedback data of a user on a short video platform to judge the user's emotion toward each module of the target short video and to predict the user's click rate on the current short video at a fine granularity.

Description

Short video click rate prediction method based on emotion capsule network
Technical Field
The invention belongs to the technical field of internet services, and particularly relates to a short video click rate prediction method based on an emotion capsule network.
Background
Short videos are a new type of video with short duration. Shooting a short video requires neither professional equipment nor professional skills: a user can conveniently shoot one with a mobile phone and upload it directly to a short video platform, so the number of short videos on such platforms grows very quickly. An effective short video recommendation system is therefore urgently needed; it can improve user experience and user stickiness, bringing huge commercial value to the platform.
In recent years, many researchers have proposed personalized video recommendation methods. These methods can be divided into three categories: collaborative filtering, content-based recommendation, and hybrid recommendation methods. However, short videos differ from ordinary videos: their descriptive text is of low quality, their duration is short, and a user accumulates a long interaction sequence within a period of time. Short video recommendation is therefore a more challenging task, and researchers have proposed several approaches. For example, Wei et al. adopt a graph convolution structure to fuse the multi-modal information of short videos and thereby better model user preference; Chen et al. use a hierarchical attention mechanism to calculate importance at both the item and category levels to obtain more accurate predictions.
Although these approaches achieve good results, they exploit only the user's positive feedback information and ignore the negative feedback information. Negative feedback information refers to the behavior in which a user sees the cover of a short video but does not click to watch it. In recent work, Li et al. combine positive and negative feedback data, model them with a graph-based recurrent neural network, and finally obtain the user preference. However, they compute the user's preference for the short video as a whole; in fact, a video consists of many modules, and a user holds different emotions and preferences toward different modules. Moreover, they directly compute a weighted sum of the positive-feedback and negative-feedback predictions, without a fine-grained treatment of the specific roles of positive and negative feedback.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a short video click rate prediction method based on an emotion capsule network, which uses the positive and negative feedback data of a user on a short video platform to judge the user's emotion toward each module of a target short video and to predict the user's click rate on the current short video at a fine granularity. To this end, the invention adopts the following technical scheme:
A short video click rate prediction method based on an emotion capsule network comprises the following steps:
S1, dividing the user behavior sequence into block sequences;
S2, extracting module features from the user block sequences and the target short video with a gate mechanism, obtaining the user block-sequence module features and the target short video module features respectively;
S3, obtaining the user's emotion representation of the target short video with an emotion capsule network, which extracts emotion features from the lower-layer module features; the click rate is predicted by analyzing the user's emotion toward different modules of the short video, the emotion being divided into positive, negative and neutral;
the data of positive emotion is derived from the positive feedback sequence of the user, the data of negative emotion from the negative feedback sequence of the user, and the data of neutral emotion from the common part of the positive and negative feedback sequences;
s4, predicting the click rate of the user on the target short video according to the emotional characteristics;
s5, designing a loss function according to the model characteristics;
and S6, updating the model parameters by adopting an Adam optimizer.
Here, a short video is composed of finer-grained modules (e.g., video scene, video theme, video emotion); the emotions are divided into positive, negative and neutral, denoted in the following formulas by pos, neg and neu respectively.
On the basis of the technical scheme, the invention can also adopt the following further technical scheme:
the step S1 further includes:
s11, using window with length w to sequence click action of a user
Figure BDA0002672346120000031
Dividing the block into m blocks, and calculating the characteristic representation of each block in the following way:
Figure BDA0002672346120000032
wherein ,
Figure BDA0002672346120000033
preference characteristics for the user at the kth block;
s12, processing the negative feedback information of the user in the same way to obtain the negative feedback block representation
Figure BDA0002672346120000034
The calculation of step S2 is as follows:
$a^+_{k,i} = \sigma\left(W_{i,1} u^+_k + W_{i,2} q_i + b_i\right) \odot u^+_k$
wherein $a^+_{k,i}$ denotes the i-th module feature of the k-th block, $W_{i,1}$ and $W_{i,2}$ are the transition matrices of the i-th module, $b_i$ is the offset vector of the i-th module, σ is the sigmoid activation function, ⊙ denotes element-wise multiplication, $u^+_k$ is the preference feature of the k-th block in the click sequence, and $q_i$ is the representation of the i-th module, shared by all users. The number of modules M of the short video is a hyper-parameter.
The step S2 further includes:
S21, after obtaining the vector representation of each module of each block, aggregating the same module information across all blocks with average pooling:
$a^+_i = \frac{1}{m} \sum_{k=1}^{m} a^+_{k,i}$
wherein m is the number of blocks; the formula derives M module features $\{a^+_1, \ldots, a^+_M\}$ from the positive feedback sequence;
S22, obtaining M module features $\{a^-_1, \ldots, a^-_M\}$ from the negative feedback sequence by the same method, and obtaining M module features $\{a^{new}_1, \ldots, a^{new}_M\}$ from the target short video.
The method extracts positive emotion from the click sequence and negative emotion from the non-click sequence. Further, some module features often appear in both the click and non-click sequences of a user, and toward such modules the user holds a neutral feeling. Therefore, the step S3 further includes:
S31, pairing the module features extracted from the user sequence with the module features of the target short video one by one to form activation units:
$e^+_i = g\left(a^{new}_i \odot a^+_i\right)$
wherein $a^{new}_i$ is the i-th module feature of the target short video, $a^+_i$ is the i-th module feature of the user's positive feedback sequence, ⊙ denotes element-wise multiplication, and g is the activation function;
S32, obtaining the activation units of the negative feedback sequence, $e^-_i$, by the same method;
S33, extracting emotion features from the activation units of positive feedback with the emotion capsule network:
$\hat{u}^+_{s|i} = W^+_{i,s} e^+_i, \qquad v_s = g\left(\sum_i c^+_{i,s} \hat{u}^+_{s|i}\right)$
wherein s ∈ {pos, neu} and $W^+_{i,s}$ is the transformation matrix from the i-th activation unit of the positive feedback sequence to emotion capsule s; the positive emotion capsule $v_{pos}$ is obtained as the weighted sum of the $\hat{u}^+_{pos|i}$; $c^+_{i,s}$ is a connection coefficient representing the weight of $\hat{u}^+_{s|i}$ and is updated with a dynamic routing algorithm;
g is the vector (squash) activation function commonly used in capsule networks:
$g(x) = \frac{\|x\|^2}{1 + \|x\|^2} \cdot \frac{x}{\|x\|}$
wherein ‖·‖ denotes the length of a vector;
S34, extracting emotion features from the activation units of negative feedback with the emotion capsule network; the negative emotion capsule $v_{neg}$ equals:
$\hat{u}^-_{s|i} = W^-_{i,s} e^-_i, \qquad v_{neg} = g\left(\sum_i c^-_{i,neg} \hat{u}^-_{neg|i}\right)$
wherein s ∈ {neg, neu};
S35, extracting the neutral emotion capsule from the positive and negative feedback sequences:
$v_{neu} = g\left(\sum_i c^+_{i,neu} \hat{u}^+_{neu|i} + \sum_i c^-_{i,neu} \hat{u}^-_{neu|i}\right)$
Further, the update of the connection coefficients $c^p_{i,s}$ is improved as follows:
S301, adding a temperature coefficient to improve the dynamic routing coefficient $c^p_{i,s}$:
$c^p_{i,s} = \frac{\exp\left(b^p_{i,s} / \tau\right)}{\sum_{s'} \exp\left(b^p_{i,s'} / \tau\right)}$
wherein p ∈ {+, −} and s ∈ {pos, neg, neu}; $b^p_{i,s}$ is the connection logit from input capsule i to output capsule s and is initialized to 0; τ is the temperature coefficient; as τ → 0⁺ the output emotion capsule tends to focus on only one input capsule, while as τ → ∞ the influence of the input capsules on the output emotion capsule tends to be uniform.
S302, considering the importance of the different modules of the short video: the length of an activation unit reflects the importance of its module, so the dynamic routing coefficient $c^p_{i,s}$ is corrected according to module importance:
$c^p_{i,s} \leftarrow c^p_{i,s} \cdot \left\|e^p_i\right\|$
wherein $\|e^p_i\|$ is the length of the activation unit, p ∈ {+, −} and s ∈ {pos, neg, neu}.
The step S4 further comprises: given the emotion capsules $v_s$, calculating the probability that the user clicks the target short video:
$\hat{y}_s = W_{s,2}\left(W_{s,1} v_s + b_{s,1}\right) + b_{s,2}, \qquad \hat{y} = \sigma\left(\sum_s \|v_s\| \, \hat{y}_s + b_u\right)$
wherein s ∈ {pos, neg, neu}, $W_{s,1}$ and $W_{s,2}$ are transition matrices, $b_{s,1}$ is an offset vector, $b_{s,2}$ is a bias scalar; σ is the sigmoid activation function, $b_u$ is the bias of the user dimension, and $\|v_s\|$, the length of the capsule vector, represents the confidence of the emotion.
The step S5 includes:
S51, given the predicted click rate $\hat{y}$ of the user on the target short video, computing the error between the predicted value $\hat{y}$ and the true value y and using it to update the model parameters; a cross-entropy loss is adopted to guide the update of the model parameters:
$L_{ctr} = -\sum_{(u, x_{new}) \in O} \left[\, y \log \hat{y} + (1 - y) \log\left(1 - \hat{y}\right) \right]$
wherein y ∈ {0, 1} is the true value indicating whether the user clicked the target short video;
S52, to ensure that the emotion capsule network correctly captures fine-grained emotion, adding a margin loss function $L_{stm}$ and a disagreement loss function $L_{asp}$ as regularization terms; combining the cross-entropy loss with the two regularization terms gives the complete loss function:
$L = L_{ctr} + \lambda_s L_{stm} + \lambda_a L_{asp}$
The margin loss $L_{stm}$ is calculated as:
$L_{stm} = \sum_{(u, x_{new}) \in O} \max\left(0,\; \epsilon - \|v_s\| + \lambda \|v_{\bar{s}}\|\right)$
wherein O denotes all <user, short video> pairs in the data set; when the true value y = 1, $v_s = v_{pos}$, otherwise $v_s = v_{neg}$; $v_{\bar{s}}$ denotes the capsule of the emotion opposite to s.
The disagreement regularization makes the M module vectors q mutually dissimilar, so that different module features are extracted from the short video as far as possible; the disagreement regularization term $L_{asp}$ is calculated as:
$L_{asp} = \sum_{i=1}^{M} \sum_{j \neq i} \frac{q_i^{\top} q_j}{\|q_i\| \, \|q_j\|}$
wherein M is the number of module vectors q.
The invention has the following beneficial technical effects:
(1) the invention provides a short video click rate prediction method based on an emotion capsule network.
(2) The invention provides a novel emotion capsule network framework, which extracts the user's positive, negative and neutral emotions toward the different modules of the user's short video sequences, and improves the capsule network framework and its dynamic routing algorithm.
(3) Extensive experiments on two short video data sets show that the method achieves better results on the same data than the latest methods.
Drawings
FIG. 1 is a schematic flow chart of a short video click rate prediction method based on an emotion capsule network according to the present invention;
FIG. 2 is a model framework diagram of a short video click rate prediction method based on an emotion capsule network according to the present invention;
FIG. 3 is a diagram of the user's positive and negative feedback information in the short video click rate prediction method based on the emotion capsule network.
Detailed Description
For further understanding of the present invention, the short video click rate prediction method based on an emotion capsule network provided by the invention is described in detail below with reference to specific embodiments. The invention is not limited to these embodiments; insubstantial improvements and modifications made by those skilled in the art within the spirit of the invention still fall within its protection scope.
The short video click rate prediction task is to establish a model to predict the probability of the user clicking on the short video.
The history sequence of a user is represented as $X^p = \{x^p_1, x^p_2, \ldots, x^p_l\}$, where p ∈ {+, −} denotes click and non-click behavior respectively, $x_j$ denotes the j-th short video, and l is the length of the sequence. The whole sequence can be further divided into the click sequence $X^+$ and the non-click sequence $X^-$, namely the positive and negative feedback information. The short video click rate prediction problem can thus be expressed as: given the user's click sequence $X^+$, non-click sequence $X^-$ and a target short video $x_{new}$, predict the user's click rate on the target short video $x_{new}$.
Therefore, the invention provides a short video click rate prediction method based on an emotion capsule network. The method mines the user's emotion toward the different modules of a short video from the user's positive and negative feedback information, and predicts the user's click rate on the target short video. Positive feedback here means that the user clicked the short video; negative feedback means that the platform showed the cover of the short video but the user did not click it, indicating that the user has no interest in that short video. The method considers that the user holds different sentiments toward different modules (aspects) of a short video; it predicts the user's click rate on the target short video and analyzes the user's preference for the different modules. As shown in FIG. 3, a short video has three modules: video scene, video theme and video emotion. From the user's click and non-click behavior we can find that the user likes beauty-related video themes as well as positive video emotions, but dislikes animal-related themes and negative video emotions; further, the user holds a neutral attitude toward the video scene.
As shown in FIG. 2, the method consists essentially of three parts. The first part uses a gate mechanism to extract module features (aspect features) from the user's positive and negative feedback information, and extracts module features from the target short video in the same way. The second part pairs the two sets of module features one by one to form activation units, feeds them into the emotion capsule network, and predicts the user's emotion toward the different modules of the short video. The third part predicts the short video click rate based on the user's emotion features toward the current short video.
As shown in FIG. 1, according to one embodiment of the present invention, the method comprises the following steps:
and S100, dividing the user behavior sequence into block (block) sequences. Click behavior sequence for a user
$X^+$ can be expressed as $\{x^+_1, x^+_2, \ldots, x^+_l\}$, wherein $x^+_j \in \mathbb{R}^d$ is the feature vector of the cover picture of the j-th short video and d is the feature vector length. Short videos have short durations, which results in long user behavior sequences; the method therefore uses a window of length w to divide the sequence $X^+$ into m blocks, the short videos a user interacts with inside one block being similar.
The feature representation of each block is then calculated as:
$u^+_k = \sum_{x^+_j \in \mathrm{block}_k} x^+_j$
wherein $u^+_k$ is the user's preference feature at the k-th block. Although summing may lose part of the information, it retains the main information, which is what we need. The method processes the negative feedback information of the user in the same way to obtain the negative feedback block representations $u^-_k$.
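By way of illustration only, this block division and sum pooling can be sketched in a few lines of Python (a minimal sketch under assumed shapes; the function and variable names are hypothetical and not taken from the patent):

    import numpy as np

    def to_blocks(x_seq: np.ndarray, w: int) -> np.ndarray:
        """Split a behavior sequence of shape (l, d) into m = ceil(l / w)
        blocks with window length w, summing the cover-feature vectors
        inside each block to obtain the block preference u_k."""
        l, d = x_seq.shape
        m = int(np.ceil(l / w))
        blocks = np.zeros((m, d))
        for k in range(m):
            blocks[k] = x_seq[k * w:(k + 1) * w].sum(axis=0)
        return blocks

    # usage: 100 clicked and 230 non-clicked covers with d = 64 (assumed sizes)
    u_pos = to_blocks(np.random.randn(100, 64), w=10)  # shape (10, 64)
    u_neg = to_blocks(np.random.randn(230, 64), w=10)  # shape (23, 64)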
S200, extracting module characteristics from the user block sequence and the target short video, wherein the short video is composed of modules with finer granularity (such as video scenes, video themes and video emotions).
The method extracts the module features with a gate mechanism; the following formula extracts the i-th module of the k-th block:
$a^+_{k,i} = \sigma\left(W_{i,1} u^+_k + W_{i,2} q_i + b_i\right) \odot u^+_k$
wherein $W_{i,1}$ and $W_{i,2}$ are the transition matrices of the i-th module and $b_i$ is the offset vector of the i-th module; σ is the sigmoid activation function and ⊙ denotes element-wise multiplication; $u^+_k$ is the preference feature of the k-th block in the click sequence; $q_i$ is the representation of the i-th module and is shared by all users. The number of modules M of the short video is a hyper-parameter, set to 5 in this method through experimental verification. After the vector representation of each module of each block is obtained, the same module information across all blocks is aggregated with average pooling:
$a^+_i = \frac{1}{m} \sum_{k=1}^{m} a^+_{k,i}$
wherein m is the number of blocks. Finally, M module features $\{a^+_1, \ldots, a^+_M\}$ can be obtained from the positive feedback sequence. By the same method, M module features $\{a^-_1, \ldots, a^-_M\}$ can be obtained from the negative feedback sequence, and M module features $\{a^{new}_1, \ldots, a^{new}_M\}$ from the target short video.
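The gate extraction and average pooling admit a compact sketch (PyTorch, illustrative only; parameter shapes and names are assumptions, and in practice W1, W2, b and q would be learned nn.Parameter tensors):

    import torch

    def gated_module_features(u, q, W1, W2, b):
        """Gate: a[k, i] = sigmoid(W1_i u_k + W2_i q_i + b_i) * u_k,
        then average-pool over the m blocks.
        u: (m, d) block features; q: (M, d) shared module vectors;
        W1, W2: (M, d, d) per-module transition matrices; b: (M, d)."""
        gate = torch.sigmoid(
            torch.einsum('iod,md->mio', W1, u)     # W1_i u_k -> (m, M, d)
            + torch.einsum('iod,id->io', W2, q)    # W2_i q_i -> (M, d), broadcast
            + b
        )
        a = gate * u.unsqueeze(1)                  # element-wise gating, (m, M, d)
        return a.mean(dim=0)                       # average pooling -> (M, d)

    # usage with assumed sizes: m = 10 blocks, M = 5 modules, d = 64
    m, M, d = 10, 5, 64
    q = torch.randn(M, d)
    W1, W2, b = torch.randn(M, d, d), torch.randn(M, d, d), torch.randn(M, d)
    a_pos = gated_module_features(torch.randn(m, d), q, W1, W2, b)   # (5, 64)
    a_new = gated_module_features(torch.randn(1, d), q, W1, W2, b)   # target video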
And S300, obtaining the user's emotion representation of the target short video with the emotion capsule network. For a target short video, the method predicts the click rate by analyzing the user's emotion toward the different modules of the short video, using a capsule network to extract emotion features from the lower-layer module features. There are three emotions: positive, negative and neutral. The method extracts positive emotion from the click sequence and negative emotion from the non-click sequence. Further, some module features often appear in both the click and non-click sequences of a user, and toward such modules the user holds a neutral feeling. This type of emotion also plays a role in prediction, especially when the negative and positive emotions are not pronounced.
Firstly, matching the module characteristics extracted from the user sequence and the module characteristics of the target short video one by one to form an activation unit:
$e^+_i = g\left(a^{new}_i \odot a^+_i\right)$
wherein $a^{new}_i$ is the i-th module feature of the target short video, $a^+_i$ is the i-th module feature of the user's positive feedback sequence, and ⊙ denotes element-wise multiplication. g is the vector (squash) activation function commonly used in capsule networks:
$g(x) = \frac{\|x\|^2}{1 + \|x\|^2} \cdot \frac{x}{\|x\|}$
wherein ‖·‖ denotes the length of a vector. In the same way, the activation units of the negative feedback sequence, $e^-_i$, are obtained.
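A short sketch of the activation units and the squash function g (illustrative; all names are assumed):

    import torch

    def squash(x, dim=-1, eps=1e-8):
        """g(x) = (|x|^2 / (1 + |x|^2)) * (x / |x|), applied vector-wise."""
        norm2 = (x * x).sum(dim=dim, keepdim=True)
        return (norm2 / (1.0 + norm2)) * x / torch.sqrt(norm2 + eps)

    def activation_units(a_user, a_new):
        """Pair user module features with target-video module features:
        e_i = g(a_new_i * a_user_i), an element-wise product per module."""
        return squash(a_new * a_user)

    # usage: M = 5 modules, d = 64 (assumed sizes)
    a_pos, a_neg, a_new = torch.randn(5, 64), torch.randn(5, 64), torch.randn(5, 64)
    e_pos = activation_units(a_pos, a_new)   # activation units, positive branch
    e_neg = activation_units(a_neg, a_new)   # activation units, negative branch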
Then, extracting emotional characteristics from the active units extracted by positive feedback and negative feedback by adopting a capsule network:
$\hat{u}^+_{s|i} = W^+_{i,s} e^+_i, \qquad v_s = g\left(\sum_i c^+_{i,s} \hat{u}^+_{s|i}\right)$
wherein s ∈ {pos, neu} and $W^+_{i,s}$ is the transformation matrix from the i-th activation unit of the positive feedback sequence to emotion capsule s; as the formulas show, the positive and neutral emotions are obtained from the positive feedback sequence. The positive emotion capsule $v_{pos}$ is obtained as the weighted sum of the $\hat{u}^+_{pos|i}$.
Also, g here is the squash activation commonly used in capsule networks, and $c^+_{i,s}$ is a connection coefficient representing the weight of $\hat{u}^+_{s|i}$; the coefficients are updated with a dynamic routing algorithm.
Likewise, the negative emotion capsule $v_{neg}$ equals:
$\hat{u}^-_{s|i} = W^-_{i,s} e^-_i, \qquad v_{neg} = g\left(\sum_i c^-_{i,neg} \hat{u}^-_{neg|i}\right)$
where s ∈ {neg, neu}.
And the neutral emotion capsule is extracted from both the positive and negative feedback sequences:
$v_{neu} = g\left(\sum_i c^+_{i,neu} \hat{u}^+_{neu|i} + \sum_i c^-_{i,neu} \hat{u}^-_{neu|i}\right)$
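The asymmetric wiring of the three capsules ($v_{pos}$ sees only positive activation units, $v_{neg}$ only negative ones, $v_{neu}$ both) can be sketched as follows (illustrative; the routing coefficients are taken as given here and are computed as in the routing sketch after step S302 below):

    import torch

    def squash(x, dim=-1, eps=1e-8):
        norm2 = (x * x).sum(dim=dim, keepdim=True)
        return (norm2 / (1.0 + norm2)) * x / torch.sqrt(norm2 + eps)

    def emotion_capsules(e_pos, e_neg, W_pos, W_neg, c_pos, c_neg):
        """Asymmetric capsule wiring: v_pos from positive activation units,
        v_neg from negative ones, v_neu from both.
        e_*: (M, d); W_*: (2, M, d, d) for target capsules {pos, neu} and
        {neg, neu}; c_*: (M, 2), column 0 -> pos/neg, column 1 -> neu."""
        u_pos = torch.einsum('siod,id->sio', W_pos, e_pos)  # predictions (2, M, d)
        u_neg = torch.einsum('siod,id->sio', W_neg, e_neg)
        v_pos = squash((c_pos[:, 0:1] * u_pos[0]).sum(dim=0))
        v_neg = squash((c_neg[:, 0:1] * u_neg[0]).sum(dim=0))
        v_neu = squash((c_pos[:, 1:2] * u_pos[1]).sum(dim=0)
                       + (c_neg[:, 1:2] * u_neg[1]).sum(dim=0))
        return v_pos, v_neg, v_neu

    # usage with assumed sizes M = 5, d = 64 and placeholder coefficients
    M, d = 5, 64
    v_pos, v_neg, v_neu = emotion_capsules(
        torch.randn(M, d), torch.randn(M, d),
        torch.randn(2, M, d, d), torch.randn(2, M, d, d),
        torch.full((M, 2), 0.5), torch.full((M, 2), 0.5))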
thus, data for positive emotions are derived from the positive feedback sequence of the user, data for negative emotions are derived from the negative feedback sequence of the user, and data for neutral emotions are derived from a common portion of the positive feedback sequence and the negative feedback sequence. This structure is different from the full connectivity of the traditional capsule network and is an innovation of the method. To increase the distinctiveness between affective capsules, we further improve
the update method of the connection coefficients $c^p_{i,s}$. The improvement has two points; the first is to add a temperature coefficient:
$c^p_{i,s} = \frac{\exp\left(b^p_{i,s} / \tau\right)}{\sum_{s'} \exp\left(b^p_{i,s'} / \tau\right)}$
wherein p ∈ {+, −} and s ∈ {pos, neg, neu}; $b^p_{i,s}$ is the connection logit from input capsule i to output capsule s, initialized to 0; τ is the temperature coefficient, and experiments show that τ = 0.8 performs best on our data. As τ → 0⁺ the output emotion capsule tends to focus on only one input capsule, while as τ → ∞ the influence of the input capsules on the output emotion capsule tends to be uniform.
The second point considers the importance of the different modules of the short video: the length of an activation unit reflects how important its module is, so we use it to correct $c^p_{i,s}$:
$c^p_{i,s} \leftarrow c^p_{i,s} \cdot \left\|e^p_i\right\|$
wherein $\|e^p_i\|$ is the length of the activation unit, p ∈ {+, −} and s ∈ {pos, neg, neu}.
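A sketch of the improved routing for one feedback branch follows (illustrative; the agreement update b += û · v and the iteration count follow standard capsule-network routing, which the text does not spell out, so they are assumptions; τ = 0.8 is the value the text reports as best):

    import torch

    def squash(x, dim=-1, eps=1e-8):
        norm2 = (x * x).sum(dim=dim, keepdim=True)
        return (norm2 / (1.0 + norm2)) * x / torch.sqrt(norm2 + eps)

    def route(u_hat, e_len, tau=0.8, iters=3):
        """Dynamic routing with temperature and module-importance correction.
        u_hat: (S, M, d) predictions from M activation units to S capsules;
        e_len: (M,) lengths of the activation units.
        Returns coefficients c: (M, S) and capsules v: (S, d)."""
        S, M, d = u_hat.shape
        b = torch.zeros(M, S)                      # connection logits, init 0
        for _ in range(iters):
            c = torch.softmax(b / tau, dim=1)      # temperature softmax (S301)
            c = c * e_len.unsqueeze(1)             # length correction (S302)
            v = squash(torch.einsum('ms,smd->sd', c, u_hat))
            b = b + torch.einsum('smd,sd->ms', u_hat, v)   # agreement update
        return c, v

    # usage for the positive branch: S = 2 capsules {pos, neu}, M = 5, d = 64
    e_pos = torch.randn(5, 64)
    u_hat_pos = torch.randn(2, 5, 64)
    c_pos, v = route(u_hat_pos, torch.linalg.norm(e_pos, dim=1))
    v_pos, v_neu_part = v[0], v[1]   # the neu part is later merged with its
                                     # counterpart from the negative branch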
And S400, predicting the click rate of the user on the target short video according to the emotion features. Given the emotion capsules $v_s$, the probability that the user clicks the target short video is calculated as:
$\hat{y}_s = W_{s,2}\left(W_{s,1} v_s + b_{s,1}\right) + b_{s,2}, \qquad \hat{y} = \sigma\left(\sum_s \|v_s\| \, \hat{y}_s + b_u\right)$
wherein s ∈ {pos, neg, neu}, $W_{s,1}$ and $W_{s,2}$ are transition matrices, $b_{s,1}$ is an offset vector, $b_{s,2}$ is a bias scalar; σ is the sigmoid activation function, $b_u$ is the bias of the user dimension, and $\|v_s\|$, the length of the capsule vector, represents the confidence of the emotion.
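This prediction head can be sketched as follows (illustrative; the exact form of the two reconstructed formulas, including the hidden size h and the absence of an inner non-linearity, is an assumption):

    import torch

    def predict_ctr(caps, W1, W2, b1, b2, b_u):
        """Weight each per-emotion score by the capsule length (its
        confidence), sum over emotions, and squash through a sigmoid.
        caps: dict s -> (d,) capsule; W1[s]: (h, d), W2[s]: (h,),
        b1[s]: (h,), b2[s]: scalar; b_u: user-dimension bias."""
        logit = b_u
        for s, v in caps.items():
            y_s = W2[s] @ (W1[s] @ v + b1[s]) + b2[s]     # per-emotion score
            logit = logit + torch.linalg.norm(v) * y_s    # confidence weighting
        return torch.sigmoid(logit)                       # click probability

    # usage with assumed sizes d = 64, h = 32
    d, h = 64, 32
    caps = {s: torch.randn(d) for s in ('pos', 'neg', 'neu')}
    W1 = {s: torch.randn(h, d) for s in caps}
    W2 = {s: torch.randn(h) for s in caps}
    b1 = {s: torch.randn(h) for s in caps}
    b2 = {s: torch.randn(()) for s in caps}
    y_hat = predict_ctr(caps, W1, W2, b1, b2, b_u=torch.tensor(0.0))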
And S500, designing the loss function according to the model characteristics. Given the predicted click rate $\hat{y}$ of the user on the target short video, the error between the predicted value $\hat{y}$ and the true value y is used to update the model parameters. We use a cross-entropy loss to guide the update of the model parameters:
$L_{ctr} = -\sum_{(u, x_{new}) \in O} \left[\, y \log \hat{y} + (1 - y) \log\left(1 - \hat{y}\right) \right]$
wherein y ∈ {0, 1} is the true value indicating whether the user clicked the target short video.
Meanwhile, to ensure that the emotion capsule network correctly captures fine-grained emotion, a margin loss is added as a regularization term:
$L_{stm} = \sum_{(u, x_{new}) \in O} \max\left(0,\; \epsilon - \|v_s\| + \lambda \|v_{\bar{s}}\|\right)$
wherein O denotes all <user, short video> pairs in the data set; according to the experimental results we set ε = 0.8 and λ = 0.5. Note that when the true value y = 1, $v_s = v_{pos}$; otherwise $v_s = v_{neg}$; $v_{\bar{s}}$ denotes the capsule of the emotion opposite to s.
Further, the method also introduces a disagreement regularization to make the M module vectors q mutually dissimilar, so that different module features are extracted from the short video as far as possible:
$L_{asp} = \sum_{i=1}^{M} \sum_{j \neq i} \frac{q_i^{\top} q_j}{\|q_i\| \, \|q_j\|}$
Finally, the cross-entropy loss and the two regularization terms are combined into the complete loss function:
$L = L_{ctr} + \lambda_s L_{stm} + \lambda_a L_{asp}$
wherein in our experiments $\lambda_s = 0.1$ and $\lambda_a = 0.1$. We update the model parameters with the Adam optimizer.
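The complete loss and one Adam step can be sketched as follows (illustrative, single-sample form; batching and the real forward pass are omitted, and the stand-in y_hat exists only to make the example runnable):

    import torch

    def total_loss(y_hat, y, v_pos, v_neg, q,
                   eps=0.8, lam=0.5, lam_s=0.1, lam_a=0.1):
        """L = L_ctr + lam_s * L_stm + lam_a * L_asp for one
        <user, short video> pair. y_hat: predicted click probability;
        y: 0/1 label; v_pos, v_neg: emotion capsules; q: (M, d) modules."""
        # cross-entropy click loss
        l_ctr = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat))
        # margin loss: lengthen the correct capsule, shorten the opposite one
        v_s, v_bar = (v_pos, v_neg) if y == 1 else (v_neg, v_pos)
        l_stm = torch.relu(eps - torch.linalg.norm(v_s)
                           + lam * torch.linalg.norm(v_bar))
        # disagreement: off-diagonal cosine similarity between module vectors
        qn = q / torch.linalg.norm(q, dim=1, keepdim=True)
        cos = qn @ qn.T
        l_asp = cos.sum() - cos.diagonal().sum()
        return l_ctr + lam_s * l_stm + lam_a * l_asp

    # one Adam update over the model parameters (stand-in forward pass)
    params = [torch.randn(5, 64, requires_grad=True)]
    opt = torch.optim.Adam(params, lr=1e-3)
    y_hat = torch.sigmoid(params[0].sum() * 0.001)
    loss = total_loss(y_hat, 1, torch.randn(64), torch.randn(64), params[0])
    opt.zero_grad(); loss.backward(); opt.step()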
The foregoing description of the embodiments is provided to facilitate understanding and application of the invention by those skilled in the art. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without inventive effort. Therefore, the invention is not limited to the above embodiments; improvements and modifications made by those skilled in the art based on this disclosure fall within the protection scope of the invention.

Claims (9)

1. A short video click rate prediction method based on an emotion capsule network, characterized by comprising the following steps:
S1, dividing the user behavior sequence into block sequences;
S2, extracting module features from the user block sequences and the target short video with a gate mechanism, obtaining the user block-sequence module features and the target short video module features respectively;
S3, obtaining the user's emotion representation of the target short video with an emotion capsule network, which extracts emotion features from the lower-layer module features; the click rate is predicted by analyzing the user's emotion toward different modules of the short video, the emotion being divided into positive, negative and neutral;
the data of positive emotion is derived from the positive feedback sequence of the user, the data of negative emotion from the negative feedback sequence of the user, and the data of neutral emotion from the common part of the positive and negative feedback sequences;
S4, predicting the click rate of the user on the target short video according to the emotion features;
S5, designing a loss function according to the model characteristics;
and S6, updating the model parameters with an Adam optimizer.
2. The method for predicting short video click rate based on emotion capsule network as recited in claim 1, wherein said step S1 further comprises:
S11, dividing the click behavior sequence of a user, $X^+ = \{x^+_1, x^+_2, \ldots, x^+_l\}$, into m blocks with a window of length w, the feature representation of each block being calculated as
$u^+_k = \sum_{x^+_j \in \mathrm{block}_k} x^+_j$
wherein $u^+_k$ is the preference feature of the user at the k-th block;
S12, processing the negative feedback information of the user in the same way to obtain the negative feedback block representations $u^-_k$.
3. The method for predicting short video click rate based on emotion capsule network as claimed in claim 1, wherein the calculation of step S2 is as follows:
$a^+_{k,i} = \sigma\left(W_{i,1} u^+_k + W_{i,2} q_i + b_i\right) \odot u^+_k$
wherein $a^+_{k,i}$ denotes the i-th module feature of the k-th block, $W_{i,1}$ and $W_{i,2}$ are the transition matrices of the i-th module, $b_i$ is the offset vector of the i-th module, σ is the sigmoid activation function, ⊙ denotes element-wise multiplication, $u^+_k$ is the preference feature of the k-th block in the click sequence, and $q_i$ is the representation of the i-th module, shared by all users.
4. The method for predicting short video click rate based on emotion capsule network as recited in claim 3, wherein said step S2 further comprises:
S21, after obtaining the vector representation of each module of each block, aggregating the same module information across all blocks with average pooling:
$a^+_i = \frac{1}{m} \sum_{k=1}^{m} a^+_{k,i}$
wherein m is the number of blocks; the formula derives M module features $\{a^+_1, \ldots, a^+_M\}$ from the positive feedback sequence;
S22, obtaining M module features $\{a^-_1, \ldots, a^-_M\}$ from the negative feedback sequence by the same method, and obtaining M module features $\{a^{new}_1, \ldots, a^{new}_M\}$ from the target short video.
5. The method for predicting short video click rate based on emotion capsule network as recited in claim 1, wherein said step S3 further comprises:
S31, pairing the module features extracted from the user sequence with the module features of the target short video one by one to form activation units:
$e^+_i = g\left(a^{new}_i \odot a^+_i\right)$
wherein $a^{new}_i$ is the i-th module feature of the target short video, $a^+_i$ is the i-th module feature of the user's positive feedback sequence, ⊙ denotes element-wise multiplication, and g is the activation function;
S32, obtaining the activation units of the negative feedback sequence, $e^-_i$, by the same method;
S33, extracting emotion features from the activation units of positive feedback with the emotion capsule network:
$\hat{u}^+_{s|i} = W^+_{i,s} e^+_i, \qquad v_s = g\left(\sum_i c^+_{i,s} \hat{u}^+_{s|i}\right)$
wherein s ∈ {pos, neu} and $W^+_{i,s}$ is the transformation matrix from the i-th activation unit of the positive feedback sequence to emotion capsule s; the positive emotion capsule $v_{pos}$ is obtained as the weighted sum of the $\hat{u}^+_{pos|i}$; $c^+_{i,s}$ is a connection coefficient representing the weight of $\hat{u}^+_{s|i}$ and is updated with a dynamic routing algorithm;
the activation function g is the vector (squash) activation function commonly used in capsule networks:
$g(x) = \frac{\|x\|^2}{1 + \|x\|^2} \cdot \frac{x}{\|x\|}$
wherein ‖·‖ denotes the length of a vector;
S34, extracting emotion features from the activation units of negative feedback with the emotion capsule network; the negative emotion capsule $v_{neg}$ equals:
$\hat{u}^-_{s|i} = W^-_{i,s} e^-_i, \qquad v_{neg} = g\left(\sum_i c^-_{i,neg} \hat{u}^-_{neg|i}\right)$
wherein s ∈ {neg, neu};
S35, extracting the neutral emotion capsule from the positive and negative feedback sequences:
$v_{neu} = g\left(\sum_i c^+_{i,neu} \hat{u}^+_{neu|i} + \sum_i c^-_{i,neu} \hat{u}^-_{neu|i}\right)$
6. The method for predicting short video click rate based on emotion capsule network as recited in claim 5, wherein said step S3 further comprises improving the update of the connection coefficients $c^p_{i,s}$, comprising the following steps:
S301, adding a temperature coefficient to improve the dynamic routing coefficient $c^p_{i,s}$:
$c^p_{i,s} = \frac{\exp\left(b^p_{i,s} / \tau\right)}{\sum_{s'} \exp\left(b^p_{i,s'} / \tau\right)}$
wherein p ∈ {+, −} and s ∈ {pos, neg, neu}; $b^p_{i,s}$ is the connection coefficient of the input capsule i to the output capsule s and is initialized to 0; τ is the temperature coefficient;
S302, correcting the dynamic routing coefficient $c^p_{i,s}$ according to the importance of the different short video modules:
$c^p_{i,s} \leftarrow c^p_{i,s} \cdot \left\|e^p_i\right\|$
wherein $\|e^p_i\|$ is the length of the activation unit, p ∈ {+, −} and s ∈ {pos, neg, neu}.
7. The method for predicting short video click rate based on emotion capsule network as claimed in claim 1, wherein said step S4 further comprises: given the emotion capsules $v_s$, calculating the probability that the user clicks the target short video:
$\hat{y}_s = W_{s,2}\left(W_{s,1} v_s + b_{s,1}\right) + b_{s,2}, \qquad \hat{y} = \sigma\left(\sum_s \|v_s\| \, \hat{y}_s + b_u\right)$
wherein s ∈ {pos, neg, neu}, $W_{s,1}$ and $W_{s,2}$ are transition matrices, $b_{s,1}$ is an offset vector, $b_{s,2}$ is a bias scalar; σ is the sigmoid activation function, $b_u$ is the bias of the user dimension, and $\|v_s\|$ is the length of the capsule vector.
8. The method for predicting short video click rate based on emotion capsule network as recited in claim 1, wherein said step S5 comprises:
S51, given the predicted click rate $\hat{y}$ of the user on the target short video, computing the error between the predicted value $\hat{y}$ and the true value y and using it to update the model parameters; a cross-entropy loss is adopted to guide the update of the model parameters:
$L_{ctr} = -\sum_{(u, x_{new}) \in O} \left[\, y \log \hat{y} + (1 - y) \log\left(1 - \hat{y}\right) \right]$
wherein y ∈ {0, 1} is the true value indicating whether the user clicked the target short video;
S52, adding a margin loss function $L_{stm}$ and a disagreement loss function $L_{asp}$ as regularization terms, the loss function being:
$L = L_{ctr} + \lambda_s L_{stm} + \lambda_a L_{asp}$
wherein $\lambda_s$ and $\lambda_a$ are the regularization parameters of the loss functions $L_{stm}$ and $L_{asp}$ respectively.
9. The method for predicting short video click rate based on emotion capsule network as recited in claim 8, wherein in step S52 the margin loss function $L_{stm}$ is calculated as:
$L_{stm} = \sum_{(u, x_{new}) \in O} \max\left(0,\; \epsilon - \|v_s\| + \lambda \|v_{\bar{s}}\|\right)$
wherein O denotes all <user, short video> pairs in the data set, and ε and λ are model parameters; when the true value y = 1, $v_s = v_{pos}$, otherwise $v_s = v_{neg}$; $v_{\bar{s}}$ denotes the capsule of the emotion opposite to s;
the disagreement regularization term $L_{asp}$ is calculated as:
$L_{asp} = \sum_{i=1}^{M} \sum_{j \neq i} \frac{q_i^{\top} q_j}{\|q_i\| \, \|q_j\|}$
wherein M is the number of module vectors q.
CN202010937121.4A 2020-09-08 2020-09-08 Short video click rate prediction method based on emotion capsule network Active CN112199550B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010937121.4A CN112199550B (en) 2020-09-08 2020-09-08 Short video click rate prediction method based on emotion capsule network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010937121.4A CN112199550B (en) 2020-09-08 2020-09-08 Short video click rate prediction method based on emotion capsule network

Publications (2)

Publication Number Publication Date
CN112199550A (en) 2021-01-08
CN112199550B CN112199550B (en) 2023-05-19

Family

ID=74005990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010937121.4A Active CN112199550B (en) 2020-09-08 2020-09-08 Short video click rate prediction method based on emotion capsule network

Country Status (1)

Country Link
CN (1) CN112199550B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189988A (en) * 2018-09-18 2019-01-11 Beijing University of Posts and Telecommunications Video recommendation method
CN109948165A (en) * 2019-04-24 2019-06-28 Jilin University Fine-grained sentiment polarity prediction method based on a hybrid attention network
CN111144130A (en) * 2019-12-26 2020-05-12 Liaoning Technical University Context-aware fine-grained sentiment classification method based on a hybrid neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHENLIANG LI: "A Capsule Network for Recommendation and Explaining What You Like and Dislike", 2019, Association for Computing Machinery *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765461A (en) * 2021-01-12 2021-05-07 China Jiliang University Session recommendation method based on multi-interest capsule network
CN112905887A (en) * 2021-02-22 2021-06-04 China Jiliang University Conversation recommendation method based on multi-interest short-term priority model

Also Published As

Publication number Publication date
CN112199550B (en) 2023-05-19

Similar Documents

Publication Publication Date Title
CN112492396B (en) Short video click rate prediction method based on fine-grained multi-aspect analysis
CN108563755A (en) A kind of personalized recommendation system and method based on bidirectional circulating neural network
CN110381524B (en) Bi-LSTM-based large scene mobile flow online prediction method, system and storage medium
CN112395504B (en) Short video click rate prediction method based on sequence capsule network
CN112256916B (en) Short video click rate prediction method based on graph capsule network
CN112199550B (en) Short video click rate prediction method based on emotion capsule network
CN112307258B (en) Short video click rate prediction method based on double-layer capsule network
CN112765461A (en) Session recommendation method based on multi-interest capsule network
CN110889759A (en) Credit data determination method, device and storage medium
CN112256918B (en) Short video click rate prediction method based on multi-mode dynamic routing
CN116566842A (en) Centralized cloud edge cooperative wireless communication traffic prediction method
CN114282077A (en) Session recommendation method and system based on session data
Shu et al. Privileged multi-task learning for attribute-aware aesthetic assessment
CN113806633A (en) Digital business intelligent cross-domain recommendation method integrating similarity of user portrait and social relation
CN113297936A (en) Volleyball group behavior identification method based on local graph convolution network
CN113704439B (en) Conversation recommendation method based on multi-source information heteromorphic graph
CN112307257B (en) Short video click rate prediction method based on multi-information node graph network
CN112559905B (en) Conversation recommendation method based on dual-mode attention mechanism and social similarity
CN113051468B (en) Movie recommendation method and system based on knowledge graph and reinforcement learning
CN112765401B (en) Short video recommendation method based on non-local network and local network
CN112616072B (en) Short video click rate prediction method based on positive and negative feedback information of user
CN115545834B (en) Personalized service recommendation method based on graphic neural network and metadata
CN114036400B (en) Hypergraph-based collaborative session recommendation method
CN115470397B (en) Content recommendation method, device, computer equipment and storage medium
CN116489464B (en) Medical information recommendation method based on heterogeneous double-layer network in 5G application field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant