CN112765401B

CN112765401B - Short video recommendation method based on non-local network and local network

Info

Publication number: CN112765401B
Application number: CN202110034609.0A
Authority: CN
Inventors: 顾盼
Original assignee: China Jiliang University
Current assignee: Beijing Xindong Internet Information Technology Co.,Ltd.; Guangzhou Runxin Intellectual Property Operation Co ltd
Priority date: 2021-01-12
Filing date: 2021-01-12
Publication date: 2021-11-12
Anticipated expiration: 2041-01-12
Also published as: CN112765401A

Abstract

The invention discloses a short video recommendation method based on a non-local network and a local network. The method derives a user interest representation from the user's multi-behavior interaction sequence on short videos and then predicts the user's click-through rate on a target short video. Existing sequential recommendation methods target single-behavior sequences and cannot be applied to the representation of multi-behavior interaction sequences, whereas a user's short video interaction sequence contains both "click" and "positive" behaviors. The invention therefore combines a non-local network with a local network. The method consists of three parts: the first part uses an attention mechanism to obtain the influence of the "positive" behaviors in the user's short video interaction sequence on each "click" behavior; the second part uses a recurrent neural network based on a non-local attention mechanism to generate the user interest representation; and the third part predicts the user's click-through rate on the target short video from the user interest representation.

Description

Short video recommendation method based on non-local network and local network

Technical Field

The invention belongs to the technical field of internet service, and particularly relates to a short video recommendation method based on a non-local network and a local network.

Background

Short videos are a new type of video with a short duration. Shooting a short video requires neither professional equipment nor professional skills: a user can shoot a video on a mobile phone and upload it directly to a short video platform, so the number of short videos on such platforms grows very quickly. An effective short video recommendation system is therefore urgently needed; it improves user experience and user stickiness and thereby brings substantial commercial value to the platform.

In recent years, many researchers have proposed personalized video recommendation methods. These methods fall into three categories: collaborative filtering, content-based recommendation, and hybrid recommendation. Short videos, however, differ from ordinary videos: their descriptive text is of low quality, their duration is short, and a user accumulates a long interaction sequence within a given period of time. Short video recommendation is therefore a more challenging task. In addition, users interact with short videos in several ways, including "click", "like", and "favorite", and different interaction behaviors represent different degrees of preference. A "click" indicates that the user is willing to watch the short video, but the emotion is not strong. "Like" and "favorite" both express strong, definite emotion: a "like" indicates that the user likes the short video and is willing to watch videos of the same type, and a "favorite" indicates that the user not only likes the short video now but also wants to watch it again later. Short videos that a user has "liked" or "favorited" have also been "clicked" by that user. "Like" and "favorite" can be grouped into one type of behavior, namely "positive" behavior, so the user's interaction sequence contains two kinds of behavior: "click" and "positive". Researchers have proposed several methods for the short video recommendation problem. For example, Chen et al. use a hierarchical attention mechanism to compute importance at both the item level and the category level and thereby obtain more accurate predictions. Li et al. model the user's preferences with a graph-based recurrent neural network.

The method of Chen et al. uses only the user's click behavior and ignores the user's other behaviors. Li et al. apply a sequential recommendation method to the "click" and "positive" behavior sequences separately, and their experiments show that the user interest representation based on the "positive" behavior sequence does not noticeably improve recommendation quality. There are two reasons: the time intervals within a user's "positive" behavior sequence are long, so its sequential pattern is weak; and modeling the "positive" behavior sequence separately ignores the influence of "positive" behaviors on subsequent "click" behaviors. The present method proposes a multi-behavior interaction sequence modeling approach in which the "click" and "positive" behaviors are processed as a single behavior sequence to generate the user interest vector representation. The "click" behaviors are treated as ordered, while the "positive" behaviors are treated as unordered because the events are widely spaced in time. The method combines a non-local network and a local network: the non-local network uses an attention mechanism to learn the influence of the "positive" behaviors over a past period of time on the "click" behaviors, and the local network uses a gated recurrent unit (GRU) network to learn the sequential pattern of the "click" behaviors. The resulting model is a recurrent neural network based on a non-local attention mechanism; it modifies the structure of the original recurrent unit so that the network learns the influence of "positive" behaviors on "click" behaviors and the influence of "click" behaviors on "click" behaviors at the same time.

Disclosure of Invention

The technical problem to be solved by the invention is to predict a user's click-through rate on a target short video from the user's multi-behavior interaction sequence on short videos. Users interact with short videos in several ways, including "click", "like", and "favorite", and different interaction behaviors represent different degrees of preference. A "click" indicates that the user is willing to watch the short video, but the emotion is not strong. "Like" and "favorite" both express strong, definite emotion: a "like" indicates that the user likes the short video and is willing to watch videos of the same type, and a "favorite" indicates that the user not only likes the short video now but also wants to watch it again later. Short videos that a user has "liked" or "favorited" have also been "clicked" by that user. "Like" and "favorite" can be grouped into one type of behavior, namely "positive" behavior, so the user's interaction sequence contains two kinds of behavior: "click" and "positive". Existing sequential recommendation methods, however, all target a sequence containing a single type of interaction behavior. The invention therefore adopts the following technical solution:

A short video recommendation method based on a non-local network and a local network comprises the following steps:

An attention mechanism is used to obtain the influence of the "positive" behaviors on each "click" behavior in the user's multi-behavior short video interaction sequence. The user's interaction sequence can be represented as X = [x_1, …, x_l], where x_j ∈ R^d is the feature vector of the cover picture of the j-th short video and d is the feature vector length. The "positive" behavior sequence is denoted X^*, and X^* is a subset of X. The "click" behavior sequence is X itself, i.e., X = [x_1, …, x_l]. The influence of the "positive" behavior sequence on the "click" behavior is obtained with the attention mechanism of the non-local network. The last-clicked short video in the sequence is typically used to represent the user's current click interest, so the attention mechanism is conditioned on the last-clicked short video.

In the attention computation, the two attention parameters are learned during training; x_t is the vector of the last short video in the click sequence; the "positive" inputs are the short video vectors of the "positive" sequence X^* within the current "click" sequence; and σ is the sigmoid function. Each attention weight gives the degree of importance of the i-th short video vector of the "positive" sequence. The attention output is the influence of the "positive" behaviors, within the "click" behavior sequence ending at x_t, on the current click interest.
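The attention step described above can be sketched in plain Python. The exact attention formula is not reproduced in this text, so the form used here is an assumption: an additive attention with a sigmoid, where the names `W1`, `W2`, and `q` and the unnormalized weighted sum are hypothetical choices for illustration only.

```python
import math
import random


def sigmoid(a):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-a))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def positive_influence(x_t, positives, W1, W2, q):
    """Attention over "positive" items, conditioned on the last click x_t.

    Assumed form (hypothetical, not taken from the patent text):
        alpha_i = q . sigmoid(W1 x_i + W2 x_t)   for each x_i in positives
        p_t     = sum_i alpha_i * x_i
    Returns (alphas, p_t).
    """
    query = matvec(W2, x_t)  # depends only on the last click
    alphas = []
    for x_i in positives:
        key = matvec(W1, x_i)
        hidden = [sigmoid(k + g) for k, g in zip(key, query)]
        alphas.append(sum(qj * hj for qj, hj in zip(q, hidden)))
    # Weighted sum of the "positive" vectors; zero vector if there are none.
    p_t = [sum((a * x_i[j] for a, x_i in zip(alphas, positives)), 0.0)
           for j in range(len(x_t))]
    return alphas, p_t


def random_vector(d, rng):
    return [rng.uniform(-1.0, 1.0) for _ in range(d)]


rng = random.Random(0)
d = 4
W1 = [random_vector(d, rng) for _ in range(d)]
W2 = [random_vector(d, rng) for _ in range(d)]
q = random_vector(d, rng)
clicks = [random_vector(d, rng) for _ in range(5)]  # X = [x_1, ..., x_5]
positives = [clicks[0], clicks[2]]                   # X* is a subset of X
alphas, p_t = positive_influence(clicks[-1], positives, W1, W2, q)
print(alphas, p_t)
```

In this sketch the last click `clicks[-1]` plays the role of x_t, and `p_t` is the "positive" influence vector that a later recurrent step could consume.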

A user interest representation is generated with a recurrent neural network based on a non-local attention mechanism. The original gated recurrent unit (GRU) network can only handle single-behavior sequences. Its structure is:

z_t = σ(W_xz·x_t + W_hz·h_{t-1})

r_t = σ(W_xr·x_t + W_hr·h_{t-1})

where r_t is the reset gate and z_t is the update gate; these two gating vectors determine which information is passed to the output of the gated recurrent unit. The candidate state is the current memory content, and x_t is the node input of the current layer. W_xz and W_hz are the parameters controlling the update gate z_t, and W_xr and W_hr are the parameters controlling the reset gate r_t. A further pair of weight matrices controls the candidate memory content. ⊙ denotes element-wise multiplication, and σ is the sigmoid function.
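For reference, one step of a standard GRU cell can be sketched as follows. The two gate equations match the ones given above; the candidate-state and interpolation lines follow one common textbook form, and the names `W_xh` and `W_hh` are assumptions, since those equations are not reproduced in this text. Conventions for the final interpolation differ between references.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(x_t, h_prev, params):
    """One step of a standard GRU cell (biases omitted, as in the text).

    z_t and r_t follow the two gate equations in the text. The candidate
    state and the final interpolation use a common textbook form; the
    names W_xh and W_hh are assumed here.
    """
    z = [sigmoid(a + b) for a, b in
         zip(matvec(params["W_xz"], x_t), matvec(params["W_hz"], h_prev))]
    r = [sigmoid(a + b) for a, b in
         zip(matvec(params["W_xr"], x_t), matvec(params["W_hr"], h_prev))]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]  # r_t ⊙ h_{t-1}
    h_cand = [math.tanh(a + b) for a, b in
              zip(matvec(params["W_xh"], x_t), matvec(params["W_hh"], gated))]
    return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


input_size, hidden_size = 3, 2
params = {name: zeros(hidden_size, input_size) for name in ("W_xz", "W_xr", "W_xh")}
params.update({name: zeros(hidden_size, hidden_size) for name in ("W_hz", "W_hr", "W_hh")})

# With all-zero weights, z_t = 0.5 and the candidate is zero, so h = 0.5 * h_prev.
h = gru_step([1.0, 2.0, 3.0], [1.0, -2.0], params)
print(h)  # [0.5, -1.0]
```

The cell reads only x_t and h_{t-1}, which is why it cannot, by itself, take a second behavior type into account.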

The gated recurrent network, however, is not suited to multi-behavior sequences. To handle them, the method modifies the original gated recurrent unit so that, when selecting information, the unit considers not only the current short video in the sequence and the state of the previous unit but also the influence of the "positive" behaviors.

In the modified unit, z_t is the update gate and r_t is the reset gate; these two gating vectors determine which information is passed to the output of the gated recurrent unit. The candidate state is the current memory content, x_t is the node input of the current layer, and the attention-derived term is the influence of the "positive" behaviors. Separate weight matrices control the update gate z_t, the reset gate r_t, and the candidate memory content. ⊙ denotes element-wise multiplication, and σ is the sigmoid function. The hidden state h_t output by the last layer of the gated recurrent network is the user interest representation v.
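One way such a modified unit could look is sketched below. The modified equations are not reproduced in this text, so this is only an illustration under a stated assumption: the "positive" influence vector, called `p_t` here, enters the update gate, the reset gate, and the candidate state through its own weight matrices. The names `W_pz`, `W_pr`, `W_ph`, `W_xh`, and `W_hh` are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add3(a, b, c):
    """Element-wise sum of three vectors."""
    return [x + y + w for x, y, w in zip(a, b, c)]


def nonlocal_gru_step(x_t, p_t, h_prev, P):
    """GRU step whose gates and candidate also see the "positive" influence p_t.

    Assumption: p_t enters each of the three pre-activations through its own
    weight matrix (W_pz, W_pr, W_ph). These names are hypothetical.
    """
    z = [sigmoid(a) for a in add3(matvec(P["W_xz"], x_t),
                                  matvec(P["W_pz"], p_t),
                                  matvec(P["W_hz"], h_prev))]
    r = [sigmoid(a) for a in add3(matvec(P["W_xr"], x_t),
                                  matvec(P["W_pr"], p_t),
                                  matvec(P["W_hr"], h_prev))]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]  # r_t ⊙ h_{t-1}
    h_cand = [math.tanh(a) for a in add3(matvec(P["W_xh"], x_t),
                                         matvec(P["W_ph"], p_t),
                                         matvec(P["W_hh"], gated))]
    return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def encode(clicks, influences, P, hidden_size):
    """Run the cell over the click sequence; the last hidden state is v."""
    h = [0.0] * hidden_size
    for x_t, p_t in zip(clicks, influences):
        h = nonlocal_gru_step(x_t, p_t, h, P)
    return h


# Scalar toy example: only W_pz is non-zero, so p_t alone drives the update gate.
names = ("W_xz", "W_pz", "W_hz", "W_xr", "W_pr", "W_hr", "W_xh", "W_ph", "W_hh")
P = {name: [[0.0]] for name in names}
P["W_pz"] = [[1.0]]

no_influence = nonlocal_gru_step([1.0], [0.0], [1.0], P)
strong_influence = nonlocal_gru_step([1.0], [50.0], [1.0], P)
print(no_influence, strong_influence)
```

In the toy example a large `p_t` saturates the update gate, so the previous state is discarded in favor of the candidate; with `p_t = 0` the unit behaves like the plain cell.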

The user's click-through rate on the target short video x_new is predicted from the user interest representation. Here v is the user interest representation and x_new is the target short video. The output is the predicted click-through rate of the user on the target short video. The prediction uses two transition matrices and a bias vector; b₂ is a bias scalar, and σ is the sigmoid activation function.
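A prediction layer consistent with this description (two transition matrices, a bias vector, a bias scalar, sigmoid activation) might look like the sketch below. The concatenation of v and x_new, the sigmoid hidden activation, and the names `W1`, `b1`, `w2` are assumptions; only `b2` corresponds to a symbol that survives in the text.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def predict_ctr(v, x_new, W1, b1, w2, b2):
    """Two-layer scorer on the concatenation [v ; x_new].

    Assumed form (hypothetical):
        hidden = sigmoid(W1 [v ; x_new] + b1)
        y_hat  = sigmoid(w2 . hidden + b2)
    """
    features = list(v) + list(x_new)
    hidden = [sigmoid(sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


v = [0.2, -0.1]            # user interest representation
x_new = [0.5, 0.3, -0.4]   # target short video features
W1 = [[0.1, 0.2, -0.3, 0.4, 0.0],
      [0.0, -0.2, 0.1, 0.3, 0.5]]
b1 = [0.0, 0.1]
w2 = [0.7, -0.6]           # second transition matrix, here a single row
b2 = 0.05
y_hat = predict_ctr(v, x_new, W1, b1, w2, b2)
print(round(y_hat, 4))
```

Because the final activation is a sigmoid, the output always lies strictly between 0 and 1 and can be read as a click probability.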

A loss function is designed according to the model characteristics. The error between the predicted click-through rate of the user on the target short video and the true value y is computed and used to update the model parameters. A cross-entropy loss function guides the update of the model parameters. Here y ∈ {0,1} is the true value indicating whether the user clicked the target short video, and σ is the sigmoid function. The model parameters are updated with the Adam optimizer.
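The training signal can be illustrated with binary cross-entropy and a hand-written Adam update. This is a minimal sketch, not the patented model: the scorer is a toy logistic model, the data are made up for illustration, and the hyperparameters (`lr`, `beta1`, `beta2`, `eps`) are assumed defaults.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def bce(y, y_hat, eps=1e-12):
    """Binary cross-entropy for one example, with clipping for stability."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))


def mean_loss(theta, data):
    return sum(bce(y, sigmoid(sum(t * f for t, f in zip(theta, feats))))
               for feats, y in data) / len(data)


def gradient(theta, data):
    """Gradient of the mean loss for the toy scorer y_hat = sigmoid(theta . feats)."""
    grad = [0.0] * len(theta)
    for feats, y in data:
        err = sigmoid(sum(t * f for t, f in zip(theta, feats))) - y
        for j, f in enumerate(feats):
            grad[j] += err * f / len(data)
    return grad


def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new (theta, m, v)."""
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    theta = [th - lr * mh / (math.sqrt(vh) + eps)
             for th, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v


# Toy data: (features, clicked?) pairs; the last feature is a constant bias input.
data = [([1.0, 0.2, 1.0], 1), ([0.9, 0.1, 1.0], 1),
        ([0.1, 0.8, 1.0], 0), ([0.2, 0.9, 1.0], 0)]
theta = [0.0, 0.0, 0.0]
m, v = [0.0] * 3, [0.0] * 3
initial = mean_loss(theta, data)
for t in range(1, 201):
    theta, m, v = adam_step(theta, gradient(theta, data), m, v, t)
final = mean_loss(theta, data)
print(initial, final)
```

In a full implementation the gradient would come from backpropagation through the attention, recurrent, and prediction layers rather than from the closed-form expression used for this toy scorer.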

The invention has the following beneficial technical effects:

(1) The invention provides a multi-behavior sequence representation method. Unlike earlier single-behavior sequence representation methods, it processes the "click" and "positive" behaviors as a single behavior sequence to generate the user interest vector representation. The "click" behaviors are treated as ordered, while the "positive" behaviors are treated as unordered because the events are widely spaced in time.

(2) The invention combines a non-local network and a local network. The non-local network uses an attention mechanism to learn the influence of all "positive" behaviors over a past period of time on the "click" behaviors; the local network uses a gated recurrent unit (GRU) network to learn the influence of recent "click" behaviors on the "click" behavior.

(3) The invention is a recurrent neural network based on a non-local attention mechanism. By modifying the structure of the original recurrent unit, it enables the network to learn the influence of "positive" behaviors on "click" behaviors and the influence of "click" behaviors on "click" behaviors at the same time.

Drawings

FIG. 1 is a schematic flow chart of a short video recommendation method based on a non-local network and a local network according to the present invention;

FIG. 2 is a model framework diagram of the short video recommendation method based on a non-local network and a local network according to the present invention.

Detailed Description

To aid understanding of the present invention, the short video recommendation method based on a non-local network and a local network is described in detail below with reference to specific embodiments. The invention is not limited to these embodiments; insubstantial improvements and modifications made by those skilled in the art under the core teaching of the invention still fall within its scope of protection.

The short video click-through rate prediction task is to build a model that predicts the probability that a user clicks on a short video. The user's historical short video interaction sequence is represented as X = [x_1, …, x_l], where x_j represents the j-th short video and l is the length of the sequence. Users interact with short videos in several ways, including "click", "like", and "favorite", and different interaction behaviors represent different degrees of preference. A "click" indicates that the user is willing to watch the short video, but the emotion is not strong. "Like" and "favorite" both express strong, definite emotion: a "like" indicates that the user likes the short video and is willing to watch videos of the same type, and a "favorite" indicates that the user not only likes the short video now but also wants to watch it again later. Short videos that a user has "liked" or "favorited" have also been "clicked" by that user. Thus, the short video click-through rate prediction problem can be stated as: given the user's multi-behavior interaction sequence X and a target short video x_new, predict the user's click-through rate on x_new.
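The grouping of "like" and "favorite" into one "positive" behavior can be illustrated with a small preprocessing sketch. The `Interaction` record and the log format are hypothetical; the text does not specify how interaction data are stored.

```python
from dataclasses import dataclass

POSITIVE_BEHAVIORS = frozenset({"like", "favorite"})


@dataclass(frozen=True)
class Interaction:
    """One row of a hypothetical interaction log."""
    video_id: str
    behaviors: frozenset  # e.g. frozenset({"click"}) or frozenset({"click", "like"})


def split_sequences(history):
    """Return (click sequence, "positive" sequence) as lists of video ids."""
    clicks = [e.video_id for e in history if "click" in e.behaviors]
    positives = [e.video_id for e in history
                 if "click" in e.behaviors and e.behaviors & POSITIVE_BEHAVIORS]
    return clicks, positives


history = [
    Interaction("v1", frozenset({"click"})),
    Interaction("v2", frozenset({"click", "like"})),
    Interaction("v3", frozenset({"click"})),
    Interaction("v4", frozenset({"click", "favorite"})),
    Interaction("v5", frozenset({"click", "like", "favorite"})),
]
clicks, positives = split_sequences(history)
print(clicks)     # ['v1', 'v2', 'v3', 'v4', 'v5']
print(positives)  # ['v2', 'v4', 'v5']
```

Video ids stand in for the feature vectors here; in the method each element of the sequence is the feature vector of the video's cover picture, and the "positive" sequence is a subset of the click sequence.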

The invention therefore provides a short video recommendation method based on a non-local network and a local network. The method predicts a user's click-through rate on a target short video from the user's multi-behavior interaction sequence on short videos. The behaviors considered are the user's "click", "like", and "favorite" behaviors. In the method, "like" and "favorite" are grouped into one type of behavior, namely "positive" behavior, so the user's interaction sequence contains two kinds of behavior: "click" and "positive". Existing sequential recommendation methods target a sequence containing a single type of interaction behavior. Li et al. applied a sequential recommendation method to the "click" behavior sequence and the "positive" behavior sequence separately, and their experiments showed that the user interest representation based on the "positive" behavior sequence does not noticeably improve recommendation quality. There are two reasons: the time intervals within a user's "positive" behavior sequence are long, so its sequential pattern is weak; and modeling the "positive" behavior sequence separately ignores the influence of "positive" behaviors on subsequent "click" behaviors. The present method proposes a multi-behavior interaction sequence modeling approach in which the "click" and "positive" behaviors are processed as a single behavior sequence to generate the user interest vector representation. The "click" behaviors are treated as ordered, while the "positive" behaviors are treated as unordered because the events are widely spaced in time.

The method combines a non-local network and a local network: the non-local network uses an attention mechanism to learn the influence of the "positive" behaviors over a past period of time on the "click" behaviors, and the local network uses a gated recurrent unit (GRU) network to learn the sequential pattern of the "click" behaviors. The resulting model is a recurrent neural network based on a non-local attention mechanism; it modifies the structure of the original recurrent unit so that the network learns the influence of "positive" behaviors on "click" behaviors and the influence of "click" behaviors on "click" behaviors at the same time.

The method consists of three main parts, as shown in FIG. 2. The first part uses an attention mechanism to obtain the influence of the "positive" behaviors on each "click" behavior in the user's multi-behavior short video interaction sequence. The second part uses a recurrent neural network based on a non-local attention mechanism to generate the user interest representation. The third part predicts the user's click-through rate on the target short video from the user interest representation.

As shown in fig. 1, according to one embodiment of the present invention, the method comprises the steps of:

S100: An attention mechanism is used to obtain the influence of the "positive" behaviors on each "click" behavior in the user's multi-behavior short video interaction sequence. The user's interaction sequence can be represented as X = [x_1, …, x_l], and the "positive" behavior sequence X^* is a subset of X. The "click" behavior sequence is X itself, i.e., X = [x_1, …, x_l]. The influence of the "positive" behavior sequence on the "click" behavior is obtained with the attention mechanism of the non-local network. The last-clicked short video in the sequence is typically used to represent the user's current click interest, so the attention mechanism is conditioned on the last-clicked short video. In the attention computation, the two attention parameters are learned during training, and each attention weight gives the degree of importance of the corresponding short video vector of the "positive" sequence.

S200: A user interest representation is generated with a recurrent neural network based on a non-local attention mechanism. The original gated recurrent unit (GRU) network can only handle single-behavior sequences. Its structure is:

z_t = σ(W_xz·x_t + W_hz·h_{t-1})

r_t = σ(W_xr·x_t + W_hr·h_{t-1})

where r_t is the reset gate and z_t is the update gate; these two gating vectors determine which information is passed to the output of the gated recurrent unit. Separate weight matrices control the update gate z_t, the reset gate r_t, and the candidate memory content. ⊙ denotes element-wise multiplication, and σ is the sigmoid function.
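For orientation, a widely used formulation of the full GRU recurrence is written out below. The first two lines restate the gate equations given above. The last two lines are a common textbook form and are offered as an assumption, since the corresponding equations are not reproduced in this text; the symbols \tilde{h}_t, W_{xh}, and W_{hh} are illustrative names, and some references swap the roles of z_t and 1 - z_t in the final line.

```latex
\begin{aligned}
z_t &= \sigma\left(W_{xz}\, x_t + W_{hz}\, h_{t-1}\right) \\
r_t &= \sigma\left(W_{xr}\, x_t + W_{hr}\, h_{t-1}\right) \\
\tilde{h}_t &= \tanh\left(W_{xh}\, x_t + W_{hh}\,(r_t \odot h_{t-1})\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Read this way, the reset gate r_t decides how much of the previous state feeds the candidate \tilde{h}_t, and the update gate z_t interpolates between the previous state and that candidate.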

The gated recurrent network, however, is not suited to multi-behavior sequences. To handle them, the method modifies the original gated recurrent unit so that, when selecting information, the unit considers not only the current short video in the sequence and the state of the previous unit but also the influence of the "positive" behaviors. In the modified unit, the attention-derived term is the influence of the "positive" behaviors, and separate weight matrices control the update gate, the reset gate, and the candidate memory content.

S300: The user's click-through rate on the target short video x_new is predicted from the user interest representation. Here v is the user interest representation and x_new is the target short video; the prediction uses two transition matrices.

S400: A loss function is designed according to the model characteristics. The error between the predicted click-through rate of the user on the target short video and the true value is computed and used to update the model parameters.

The foregoing description of the embodiments is provided to help those skilled in the art understand and apply the invention. Those skilled in the art can readily make various modifications to the above embodiments and apply the general principles described here to other embodiments without inventive effort. The present invention is therefore not limited to the above embodiments, and improvements and modifications made by those skilled in the art on the basis of this disclosure shall fall within the protection scope of the present invention.

Claims

1. A short video recommendation method based on a non-local network and a local network, characterized by comprising:

using an attention mechanism to obtain the influence of "positive" behaviors on each "click" behavior in a user's multi-behavior short video interaction sequence; the user's interaction sequence can be represented as X = [x_1, …, x_l], where x_j ∈ R^d is the feature vector of the cover picture of the j-th short video and d is the length of the feature vector; the "positive" behavior sequence is denoted X^*, and X^* is a subset of X; the "click" behavior sequence is X = [x_1, …, x_l]; the influence of the "positive" behavior sequence on the "click" behavior is obtained with the attention mechanism of the non-local network; the last-clicked short video in the sequence is used to represent the user's current click interest, so the attention mechanism is conditioned on the last-clicked short video;

wherein the two attention parameters are parameters to be trained by the model; x_t represents the vector of the last short video in the click sequence; the "positive" inputs are the short video vectors of the "positive" sequence within the current "click" sequence; σ is the sigmoid function; each attention weight gives the degree of importance of the i-th short video vector of the "positive" sequence; and the attention output is the influence of the "positive" behaviors, within the "click" behavior sequence ending at x_t, on the current click interest;

generating a user interest representation with a recurrent neural network based on a non-local attention mechanism; when selecting information, the gated recurrent unit of this network considers not only the current short video in the sequence and the state of the previous gated recurrent unit but also the influence of the "positive" behaviors;

wherein z_t is the update gate and r_t is the reset gate, and these two gating vectors determine which information is passed to the output of the gated recurrent unit; the candidate state is the current memory content; x_t is the node input of the current layer, and the attention-derived term is the influence of the "positive" behaviors; separate weight matrices control the update gate z_t, the reset gate r_t, and the candidate memory content; ⊙ denotes element-wise multiplication, and σ is the sigmoid function; the hidden state h_t output by the last layer of the gated recurrent network is the user interest representation v;

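predicting the user's click-through rate on the target short video x_new from the user interest representation; wherein v is the user interest representation and x_new is the target short video; the output is the predicted click-through rate of the user on the target short video; the prediction uses two transition matrices and a bias vector, b₂ is a bias scalar, and σ is the sigmoid activation function;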

designing a loss function according to the model characteristics; the error between the predicted click-through rate of the user on the target short video and the true value y is computed and then used to update the model parameters; a cross-entropy loss function guides the update of the model parameters; wherein y ∈ {0,1} is the true value indicating whether the user clicked the target short video; σ is the sigmoid function; and the model parameters are updated with the Adam optimizer.