CN112199550A

CN112199550A - Short video click rate prediction method based on emotion capsule network

Info

Publication number: CN112199550A
Application number: CN202010937121.4A
Authority: CN
Inventors: 吴健; 顾盼; 韩玉强; 高维
Original assignee: Shandong Industrial Technology Research Institute of ZJU
Current assignee: Shandong Industrial Technology Research Institute of ZJU
Priority date: 2020-09-08
Filing date: 2020-09-08
Publication date: 2021-01-08
Anticipated expiration: 2040-09-08
Also published as: CN112199550B

Abstract

The invention belongs to the technical field of internet services, and particularly relates to a short video click rate prediction method based on an emotion capsule network. A short video click rate prediction method based on an emotion capsule network comprises the following steps: s1, dividing the user behavior sequence into block sequences; s2, extracting module characteristics from the user block sequence and the target short video by adopting a door mechanism; s3, obtaining emotion representation of the target short video by the user by utilizing an emotion capsule network; s4, predicting the click rate of the user on the target short video according to the emotional characteristics; s5, designing a loss function according to the model characteristics; and S6, updating the model parameters by adopting an Adam optimizer. The invention provides a short video click rate prediction method based on an emotion capsule network, which is used for judging the emotion of a user to each module of a target short video and predicting the click rate of the user to the current short video in a fine-grained manner by using positive feedback and negative feedback data of the user in a short video platform.

Description

Short video click rate prediction method based on emotion capsule network

Technical Field

The invention belongs to the technical field of internet services, and particularly relates to a short video click rate prediction method based on an emotion capsule network.

Background

Short video is a new type of video with a short time. The shooting of the short video does not need to use professional equipment and professional skills. The user can conveniently shoot and upload to the short video platform directly through the mobile phone, so that the short video frequency quantity of the short video platform is increased very quickly. The requirement on the effective short video recommendation system is very urgent, and the effective short video recommendation system can improve the user experience and the user viscosity, so that huge commercial value is brought to the platform.

In recent years, many researchers have proposed personalized recommendation methods based on videos. These methods can be divided into three categories: collaborative filtering, content-based recommendations, and hybrid recommendation methods. But short video has different characteristics compared to video: the descriptive text is of low quality, short duration and the user has a long sequence of interactions over a period of time. Therefore, short video recommendations are a more challenging task and some approaches have been proposed by researchers. For example, Wei et al adopt a graph convolution structure to fuse multi-modal information of short videos, thereby better simulating the preference of users; chen et al uses a hierarchical attention mechanism to calculate the importance of both the item and category levels to obtain more accurate predictions.

Although these approaches achieve good results, these architectures only exploit the positive feedback information of the user and ignore the negative feedback information. The negative feedback information refers to the behavior that the user looks at the cover of the short video but does not click to watch. In recent research results, Li et al combine positive and negative feedback data and use a graph-based recurrent neural network to model, and finally obtain the user preference. But they calculate the user's preference for the short video as a whole, in fact, there are many modules of the video and the user has different emotions and preferences for different modules of the video. And they directly weight and sum the prediction results of positive feedback and negative feedback without fine-grained discussion of the specific roles of positive feedback and negative feedback.

Disclosure of Invention

The invention aims to solve the technical problem of providing a short video click rate prediction method based on an emotion capsule network, which utilizes positive feedback and negative feedback data of a user in a short video platform to judge the emotion of the user on each module of a target short video and predict the click rate of the user on the current short video in a fine-grained manner. Therefore, the invention adopts the following technical scheme:

a short video click rate prediction method based on an emotion capsule network comprises the following steps:

s1, dividing the user behavior sequence into block sequences;

s2, extracting module characteristics from the user block sequence and the target short video by adopting a door mechanism, wherein the module characteristics are the user block sequence module characteristics and the target short video module characteristics respectively;

s3, obtaining emotion representation of the user on the target short video by using an emotion capsule network, extracting emotion characteristics from module characteristics of a lower layer through the capsule network, and analyzing emotion of the user on different modules of the short video to predict click rate, wherein the emotion is divided into positive, negative and neutral;

the data of the positive emotion is derived from a positive feedback sequence of the user, the data of the negative emotion is derived from a negative feedback sequence of the user, and the data of the neutral emotion is derived from a common part of the positive feedback sequence and the negative feedback sequence;

s4, predicting the click rate of the user on the target short video according to the emotional characteristics;

s5, designing a loss function according to the model characteristics;

and S6, updating the model parameters by adopting an Adam optimizer.

Wherein, the short video is composed of finer grained modules (e.g., video scene, video theme, video mood); the emotions are divided into positive (positive), negative (negative) and neutral (neutral), and are demonstrated in the following formulas with the first 3 letters of English for the three emotions, respectively.

Furthermore, the data of positive emotion comes from a positive feedback sequence of the user, the data of negative emotion comes from a negative feedback sequence of the user, and the data of neutral emotion comes from a common part of the positive feedback sequence and the negative feedback sequence.

On the basis of the technical scheme, the invention can also adopt the following further technical scheme:

the step S1 further includes:

s11, using window with length w to sequence click action of a user

Dividing the block into m blocks, and calculating the characteristic representation of each block in the following way:

wherein ,

preference characteristics for the user at the kth block;

s12, processing the negative feedback information of the user in the same way to obtain the negative feedback block representation

The calculation method of step S2 is as follows:

wherein ,

represents the ith module characteristic of the kth block, W_i,1 and W_i,2Is the transition matrix for the ith module,

is the offset vector for the ith module, σ is the sigmoid activation function, as an element-level multiplication,

is a preference feature of the kth block in the click sequence, q_iIs a characterization of the ith module and q_iShared for all users. Wherein, the number M of modules of the short video is a hyper-parameter.

The step S2 further includes:

s21, after obtaining each module vector representation of each block, adopting an average pool to aggregate the same module information in all blocks:

where M is the number of blocks, the formula derives M module characteristics from the positive feedback sequence

S22, obtaining M module characteristics from negative feedback sequence by the same method

And obtaining M module characteristics from the target short video

The method extracts positive emotions from click sequences and negative emotions from non-click sequences. Further, some module features often appear in both click and no-click sequences of a user, and the user then has a neutral feeling with respect to such modules. Therefore, the step S3 further includes:

s31, matching the module features extracted from the user sequence with the module features of the target short video one by one to form an activation unit:

wherein ,

is the ith module feature of the target short video,

is the ith module feature of the user positive feedback sequence, is the multiplication of the element dimension, g is the activation function;

s32, obtaining the activation unit of negative feedback sequence by the same method

S33, extracting emotion characteristics from the activation units extracted by positive feedback by adopting an emotion capsule network:

where s is in the range of { pos, neu },

is the transformation matrix from the ith activation unit of the positive feedback sequence to the emotion capsule s; positive emotion capsule v_posBy passing

Obtaining a weighted sum of;

is a connection coefficient, represents

The weight of (a) is determined,

the parameters are updated by adopting a dynamic routing algorithm;

g is a vector activation function (square activation) commonly used in capsule networks, and the formula is as follows:

wherein, | | · | | represents the length of the vector;

s34, extracting emotion characteristics from the activation units extracted by negative feedback by adopting an emotion capsule network:

negative emotion capsule v_negEqual to:

wherein s belongs to { neg, neu };

s35, extracting neutral emotion capsules from the positive feedback and negative feedback sequences:

further, the connection coefficient is improved

The updating method comprises the following steps:

s301, increasing the temperature coefficient, and improving the dynamic routing coefficient by increasing the temperature coefficient

The formula is as follows:

wherein p ∈ { +, - } and s ∈ { pos, neg, neu };

is the connection coefficient of the input capsule i to the output capsule s and is initialized to 0; τ is the temperature coefficient; when tau → 0⁺Output emotion capsules tend to focus on only one input capsule; while when τ → ∞, the effect of the input capsule on the output emotion capsule tends to be consistent.

S302, considering the importance degree of different modules of the short video, the length of the activation unit can explain the importance degree of the modules. Therefore, the dynamic routing coefficient is corrected according to the importance degree of different short video modules

The formula is as follows:

wherein ,

is the length of the activation unit, which can account for the importance of the module, p ∈ { +, - } and s ∈ { pos, neg, neu }.

The step S4 further comprises giving an emotion capsule v_sAnd calculating the probability of the user clicking the target short video:

wherein s belongs to { pos, neg, neu },

and

is a matrix of transitions that is,

is an offset vector, b_s,2Is a bias scalar; σ is a sigmoid activation function, b_uIs the offset of the user dimension, | | v_sAnd | is the length of the vector, representing the confidence of the emotion.

The step S5 includes:

s51, predicting value of click rate of target short video through user

Calculating a predicted value

And the true value y to use the error to update the model parameters; a cross-entropy loss function (cross-entropy loss) is adopted to guide the updating process of the model parameters:

wherein y is an actual value and represents whether the user clicks the target short video or not, wherein y belongs to {0,1 }; σ is a sigmoid function;

s52, in order to ensure that the emotion capsule network can correctly capture fine-grained emotion, an edge loss function L is added_stmAnd inconsistent loss function L_aspAs a regular term, the cross entropy loss function is combined with two regular terms to obtain a complete loss function:

the edge loss function L_stmThe calculation formula is as follows:

wherein ,

representing all in the data set<User, short video>Carrying out pairing; when the true value y is 1, v_s＝v_pos(ii) a Otherwise, v_s＝v_neg；

Representing the negative emotion of the emotion capsule s;

inconsistent regularization (disagreement) makes M module vectors q tend to be inconsistent, different module features are extracted from a short video as much as possible, and an inconsistent regularization term L_aspThe calculation formula is as follows:

where M is the number of module vectors q.

The invention has the following beneficial technical effects:

(1) the invention provides a short video click rate prediction method based on an emotion capsule network.

(2) The invention provides a novel emotion capsule network framework, which extracts positive, negative and neutral emotions of a user from different modules of a short video sequence of the user. And improvements are made to the framework and dynamic routing algorithms of the capsule network.

(3) A large number of experiments are carried out based on two pieces of short video data, and the result proves that the method can obtain better effect in the same data than the latest method.

Drawings

FIG. 1 is a schematic flow chart of a short video click rate prediction method based on an emotion capsule network according to the present invention;

FIG. 2 is a model framework diagram of a short video click rate prediction method based on an emotion capsule network according to the present invention;

FIG. 3 is a user positive feedback and negative feedback information diagram of the short video click rate prediction method based on the emotion capsule network.

Detailed Description

For further understanding of the present invention, the short video click rate prediction method based on emotion capsule network provided by the present invention is described in detail below with reference to specific embodiments, but the present invention is not limited thereto, and those skilled in the art can make insubstantial improvements and modifications within the spirit of the present invention, and still fall within the scope of the present invention.

The short video click rate prediction task is to establish a model to predict the probability of the user clicking on the short video.

The history sequence of the user is represented as

Where p ∈ { +, - } represents click and no-click behavior, respectively, x_jRepresenting the jth short video, l is the length of the sequence. The entire sequence may be further subdivided into click sequences

And non-click sequences

Namely positive feedback and negative feedback information. Thus, the short video click-through rate prediction problem can be expressed as: entering user click sequences

Non-clicked sequence

And target short video x_newTo predict the userFor target short video x_newThe click rate of (c).

Therefore, the invention provides a short video click rate prediction method based on an emotion capsule network. According to the method, the emotion of the user on different modules of the short video is mined according to the positive feedback and negative feedback information of the user on the short video, and the click rate of the user on the target short video is predicted. The positive feedback here means that the user clicks the short video; negative feedback means that the platform shows the cover of the short video, but the user does not click on the short video, which indicates that the user has no interest in the short video. The method considers that the user has different feelings (sentiments) on different modules (aspect) of the short video, predicts the click rate of the user on the target short video, and analyzes the preference of the user on different modules of the short video. As shown in fig. 3, a short video has three modules: video scenes, video themes, video emotions. From the user's click and no click behavior, we can find that the user likes a beauty-related video theme as well as positive video emotions, but does not like an animal-related theme as well as negative video emotions. Further, the user maintains a neutral attitude with respect to the video scene.

The method consists essentially of three parts, as shown in FIG. 2. The first part is to adopt a door mechanism to extract module features (aspect features) from positive feedback and negative feedback information of a user, and simultaneously adopt the same method to extract the module features from a target short video. And the second part is to pair the two module characteristics one by one to form an activation unit, input the activation unit into the emotion capsule network and predict the emotion of the user on different modules of the short video. And the third part is used for predicting the short video click rate based on the emotional characteristics of the user on the current short video.

As shown in fig. 1, according to one embodiment of the present invention, the method comprises the steps of:

and S100, dividing the user behavior sequence into block (block) sequences. Click behavior sequence for a user

Can be expressed as

wherein

Is the feature vector of the cover picture of the short video, and d is the feature vector length. The short video has a short duration, which results in a long sequence of user actions. Therefore, the method uses a window of length w to divide the sequence X⁺The short video frequency of the interaction of the user in one block is similar.

At this time, the calculation manner of each block feature characterization is as follows:

wherein ,

the user's preference characteristics at the kth block. Although the way of summing may lose a part of the information, the main information is retained, which is what we need. The method adopts the same mode to process the negative feedback information of the user to obtain the negative feedback block representation

S200, extracting module characteristics from the user block sequence and the target short video, wherein the short video is composed of modules with finer granularity (such as video scenes, video themes and video emotions).

The method adopts the characteristics of a door mechanism extraction module, and the following formula is an ith module for extracting a kth block:

wherein ,W_i,1 and W_i,2Is the transition matrix for the ith module,

is the offset vector for the ith module. σ is a sigmoid activation function, which is an element-level multiplication.

Is a preference feature of the kth block in the click sequence, q_iIs a characterization of the ith module and q_iShared for all users. The number M of modules of the short video is a hyper-parameter, and the number is set to 5 through experimental verification in the method. After each module vector representation of each block is obtained, the patent adopts an average pool (averaging pool) to aggregate the same module information in all blocks:

where m is the number of blocks. Finally, we can obtain M from the positive feedback sequence

And (5) module characteristics. By the same method, M module characteristics can be obtained from the negative feedback sequence

And M module characteristics can be obtained from the target short video

And S300, obtaining the emotion representation of the user on the target short video by utilizing the emotion capsule network. For a target short video, the method predicts the click rate by analyzing the emotion of the user on different modules of the short video. The method adopts a capsule network to extract the emotional characteristics from the module characteristics of the lower layer. There are three emotions, which are: positive (positive), negative (negative) and neutral (neutral). The method extracts positive emotions from click sequences and negative emotions from non-click sequences. Further, some module features often appear in both click and no-click sequences of a user, and the user then has a neutral feeling with respect to such modules. This type of emotion also plays a role in prediction, especially when negative and positive emotions are not apparent.

Firstly, matching the module characteristics extracted from the user sequence and the module characteristics of the target short video one by one to form an activation unit:

wherein ,

is the ith module feature of the target short video,

is the ith module feature of the user positive feedback sequence, which is a multiplication of the element dimension (element-wise). g is a vector activation function (square) commonly used in capsule networks, and the formula is as follows:

where | l | · | |, represents the length of the vector. In the same way, an activation unit with negative feedback sequence can be obtained

Then, extracting emotional characteristics from the active units extracted by positive feedback and negative feedback by adopting a capsule network:

where s is in the range of { pos, neu },

is the conversion matrix from the ith activation unit of the positive feedback sequence to the emotion capsule s, and as can be seen from the formula, positive and neutral emotion are obtained from the positive feedback sequence. Positive emotion capsule v_posBy passing

Is obtained.

Also, g here is the activation function (square) commonly used in capsule networks, and

is a connection coefficient, represents

The weight of (a) is determined,

the parameters are updated using a dynamic routing algorithm.

Likewise, negative emotion capsule v_negEqual to:

where s is ∈ { neg, neu }.

And the neutral emotion capsule is extracted from a positive feedback sequence and a negative feedback sequence:

thus, data for positive emotions are derived from the positive feedback sequence of the user, data for negative emotions are derived from the negative feedback sequence of the user, and data for neutral emotions are derived from a common portion of the positive feedback sequence and the negative feedback sequence. This structure is different from the full connectivity of the traditional capsule network and is an innovation of the method. To increase the distinctiveness between affective capsules, we further improve

The update method of (1). The modification is carried out by two points, wherein the first point is to increase the temperature coefficient:

wherein p ∈ { +, - } and s ∈ { pos, neg, neu }.

Is the connection coefficient of the input capsule i to the output capsule s and is initialized to 0.τ is the temperature coefficient, and experiments show that τ of 0.8 is the best in the data. When tau → 0⁺Output emotion capsules tend to focus on only one input capsule; while when τ → ∞, the effect of the input capsule on the output emotion capsule tends to be consistent.

The second point is how important the modules can be accounted for by the length of the activation unit, taking into account the importance of the different modules of the short video. We use the length of the active cell to correct

wherein ,

is the length of the activation unit, p ∈ { +, - } and s ∈ { pos, neg, neu }.

And S400, predicting the click rate of the user on the target short video according to the emotional characteristics. Given emotion capsule v_sCalculating the probability of clicking the target short video by the user as follows:

wherein s belongs to { pos, neg, neu },

and

is a matrix of transitions that is,

is an offset vector, b_s,2Is a bias scalar. σ is a sigmoid activation function, b_uIs the offset of the user dimension, | | v_sAnd | is the length of the vector, representing the confidence of the emotion.

And S500, designing a loss function according to the model characteristics. Predicting value of click rate of target short video through user

Calculating a predicted value

And the true value y, and the error is used to update the model parameters. We use cross-entropy loss function (cross-entropy loss) to guide the update process of model parameters:

wherein y ∈ {0,1} is a true value representing whether the user clicked on the target short video. σ is a sigmoid function.

Meanwhile, in order to ensure that the emotion capsule network can correctly capture fine-grained emotion, an edge loss function (margin loss) is added as a regular term:

wherein ,

representing all in the data set<User, short video>In pairs, and according to the experimental results, we set the parameters to e 0.8 and λ 0.5. It is to be noted that when the true value y is 1, v is_s＝v_pos(ii) a Otherwise, v_s＝v_neg。

Representing the negative emotion of the emotion capsule s.

Further, the method also introduces inconsistency regularization (disagreement) to make M module vectors q tend to be inconsistent, and extracts different module features from the short video as much as possible:

and finally, combining the cross entropy loss function and two regular terms to obtain a complete loss function as follows:

wherein in our experiments λ_s0.1 and λ_a0.1. We update the model parameters using Adam optimizer.

The foregoing description of the embodiments is provided to facilitate understanding and application of the invention by those skilled in the art. It will be readily apparent to those skilled in the art that various modifications to the above-described embodiments may be made, and the generic principles defined herein may be applied to other embodiments without the use of inventive faculty. Therefore, the present invention is not limited to the above embodiments, and those skilled in the art should make improvements and modifications to the present invention based on the disclosure of the present invention within the protection scope of the present invention.

Claims

1. A short video click rate prediction method based on an emotion capsule network is characterized by comprising the following steps:

s1, dividing the user behavior sequence into block sequences;

s5, designing a loss function according to the model characteristics;

and S6, updating the model parameters by adopting an Adam optimizer.

2. The method for predicting short video click rate based on emotion capsule network as recited in claim 1, wherein said step S1 further comprises:

s11, using window with length w to sequence click action of a user

wherein ,

preference characteristics for the user at the kth block;

3. The method for predicting short video click rate based on emotion capsule network as claimed in claim 1, wherein the calculation method of step S2 is as follows:

wherein ,

represents the ith module characteristic of the kth block, W_i，1 and W_i，2Is the transition matrix for the ith module,

is a preference feature of the kth block in the click sequence, q_iIs a characterization of the ith module and q_iShared for all users.

4. The method for predicting short video click rate based on emotion capsule network as recited in claim 3, wherein said step S2 further comprises:

And obtaining M module characteristics from the target short video

5. The method for predicting short video click rate based on emotion capsule network as recited in claim 1, wherein said step S3 further comprises:

wherein ,

is the ith module feature of the target short video,

is the ith module feature of the user positive feedback sequence, isMultiplication of element dimensions, g being an activation function;

where s is in the range of { pos, neu },

The weighted sum of (a);

is a connection coefficient, represents

The weight of (a) is determined,

the parameters are updated by adopting a dynamic routing algorithm;

the activation function g is a vector activation function commonly used in capsule networks:

wherein, | | · | | represents the length of the vector;

negative emotion capsule v_negEqual to:

wherein s belongs to { neg, neu };

6. the method for predicting short video click rate based on emotion capsule network as recited in claim 5, wherein said step S3 further comprises improving connection coefficient

The method comprises the following steps:

The formula is as follows:

wherein p ∈ { +, - } and s ∈ { pos, neg, neu };

is the connection coefficient of the input capsule i to the output capsule s and is initialized to 0; τ is the temperature coefficient;

s302, correcting the dynamic routing coefficient according to the importance degree of different short video modules

The formula is as follows:

wherein ,

7. The method for predicting short video click rate based on emotion capsule network as claimed in claim 1, wherein said step S4 further comprises giving emotion capsule v_sCalculating the probability of clicking the target short video by the user as follows:

wherein s belongs to { pos, neg, neu },

and

is a matrix of transitions that is,

is an offset vector, b_s，2Is a bias scalar; σ is a sigmoid activation function, b_uIs the offset of the user dimension, | | v_sAnd | is the length of the vector.

8. The method for predicting short video click rate based on emotion capsule network as recited in claim 1, wherein said step S5 comprises:

s51, predicting value of click rate of target short video through user

Calculating a predicted value

And the true value y to use the error to update the model parameters; a cross-entropy loss function (cross-entropy) is adopted to guide the updating process of the model parameters:

s52, increasing the edge loss function L_stmAnd inconsistent loss function L_aspAs a regularization term, the loss function is:

wherein ,λ_s and λ_aAre respectively a loss function L_stm and L_aspThe regularization parameter of (1).

9. The method for predicting short video click rate based on emotion capsule network as recited in claim 8, wherein in step S52, said edge loss function L_stmThe calculation formula is as follows:

wherein O represents all of the data set<User, short video>And e and λ are model parameters; when the true value y is 1, v_s＝v_pos(ii) a Otherwise, v_s＝v_neg；

Representing the negative emotion of the emotion capsule s;

the non-uniform regularization term L_aspThe calculation formula is as follows:

where M is the number of module vectors q.