CN112084450A

CN112084450A - Click rate prediction method and system based on convolutional attention network deep session sequence

Info

Publication number: CN112084450A
Application number: CN202010950087.4A
Authority: CN
Inventors: 李平; 雷晓华
Original assignee: Changsha University of Science and Technology
Current assignee: Changsha University of Science and Technology
Priority date: 2020-09-09
Filing date: 2020-09-09
Publication date: 2020-12-15

Abstract

The application relates to a click-through rate prediction method and system based on a convolutional attention network deep session sequence. The method comprises: obtaining click data generated by a user's historical behaviors; inputting the click data into a pre-trained convolutional neural network to extract the user's long-term interaction features; converting the click data into a session sequence according to its time information, and inputting the session sequence into a pre-trained bidirectional long short-term memory network to extract the user's short-term interaction features; fusing the long-term and short-term interaction features to obtain fused features; and inputting the fused features into a pre-trained fully-connected neural network model to obtain the user's current interest type, and predicting the click-through rate according to the interest type. The method can improve the accuracy of click prediction.

Description

Click rate prediction method and system based on convolutional attention network deep session sequence

Technical Field

The application relates to the technical field of computers, in particular to a click rate prediction method and system based on a convolutional attention network deep session sequence.

Background

Recommendation Systems (RSs) are becoming increasingly indispensable for helping users find their favorite items in Web-scale applications such as Amazon and Taobao. A recommendation system matches items of interest to a user. It does so according to the user's attributes (such as gender, age, education, region, and occupation), the user's past behaviors in the system (such as browsing, clicking, searching, purchasing, and favoriting), and the user's current context (such as network, mobile device, and time). It then recommends items the user is likely to be interested in, such as e-commerce products, news in recommendation feeds, and apps in app stores.

Generally, a recommendation system has two phases: candidate generation and candidate ranking. The candidate generation phase uses a simple but efficient recommendation algorithm (e.g., item-based collaborative filtering) to select, from the large overall item set, a relatively small set of items for ranking.

Although relatively mature frameworks exist, recommendation systems still face many problems. Examples include the cold-start problem for new users, noise in user data, the large amount of sparse historical behavior data for many users, and the scarcity of information about users' individual interests. All of these reduce the effectiveness of recommendations. The present method addresses the problem that current recommendations for users are not sufficiently accurate.

Disclosure of Invention

Therefore, to solve the above technical problems, it is necessary to provide a click rate prediction method and system based on a convolutional attention network deep session sequence that addresses the low accuracy of current recommendations for users.

A click-through rate prediction method based on a convolutional attention network deep session sequence, the method comprising:

acquiring click data generated by historical behaviors of a user;

inputting the click data into a pre-trained convolutional neural network, and extracting long-term interaction characteristics of the user;

converting the click data into a session sequence according to the time information of the click data, inputting the session sequence into a pre-trained bidirectional long short-term memory network, and extracting the user's short-term interaction features;

fusing the long-term interactive features and the short-term interactive features to obtain fused features;

inputting the fusion characteristics into a pre-trained fully-connected neural network model to obtain the current interest type of the user, and predicting the click rate according to the interest type.

In one embodiment, the method further comprises the following steps: acquiring sparse data generated by a user session; and vectorizing the sparse data to obtain click data in the form of low-dimensional dense vectors.

In one embodiment, the method further comprises the following steps: the click data is processed through the self-attention pooling layer as follows:

where $\alpha_i$ denotes the self-attention coefficient, computed from the self-attention score vector together with a weight matrix $W_j$ and a bias $b_j$. The self-attention coefficient is normalized as:

where $\alpha'_i$ denotes the normalized result and $x'_i \in X_j$ denotes the sub-session data in the click data. The click data is pooled according to the normalized result to obtain the self-attention click data.

In one embodiment, the method further comprises the following steps: constructing a two-dimensional interaction feature according to an outer product between the self-attention click data; constructing a three-dimensional tensor according to all the two-dimensional interactive features in the click data; and inputting the three-dimensional tensor into a pre-trained convolutional neural network, and extracting the long-term interaction characteristics of the user.

In one embodiment, the method further comprises the following steps: converting the click data into a session sequence according to time information; and converting the session sequence into short-term click data according to a preset time interval.

In one embodiment, the method further comprises the following steps: setting a bias encoding according to the position information of the click data in the session sequence; updating the short-term click data with the bias encoding to obtain biased short-term click data; and pooling the biased short-term click data with a multi-head self-attention mechanism, inputting the pooled result into a pre-trained bidirectional long short-term memory network, and extracting the user's short-term interaction features.

In one embodiment, the method further comprises: fusing the long-term and short-term interaction features by concatenation and smoothing to obtain the fused features.

A click-through rate prediction system based on a convolutional attention network deep session sequence, the system comprising:

the data acquisition module is used for acquiring click data generated by historical behaviors of the user;

the long-term feature extraction module is used for inputting the click data into a pre-trained convolutional neural network and extracting the long-term interaction features of the user;

the short-term feature extraction module is used for converting the click data into a session sequence according to the time information of the click data, inputting the session sequence into a pre-trained bidirectional long short-term memory network, and extracting the user's short-term interaction features;

the feature fusion module is used for fusing the long-term interactive features and the short-term interactive features to obtain fusion features;

and the click prediction module is used for inputting the fusion characteristics into a pre-trained fully-connected neural network model to obtain the current interest type of the user and predicting the click rate according to the interest type.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

acquiring click data generated by historical behaviors of a user;

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

acquiring click data generated by historical behaviors of a user;

The click rate prediction method, system, computer device, and storage medium described above are based on a convolutional attention network deep session sequence. The convolutional neural network extracts the user's long-term interaction features from the click data. In addition, a session sequence is constructed, and the bidirectional long short-term memory network extracts the user's short-term interaction features from it. The user's short-term dynamic interests and long-term latent interests are thus extracted more effectively, which can improve prediction accuracy.

Drawings

FIG. 1 is a flowchart illustrating a click-through rate prediction method based on a convolutional attention network deep session sequence according to an embodiment;

FIG. 2 is a flowchart illustrating the step of extracting long-term interaction features in one embodiment;

FIG. 3 is a schematic block diagram of a convolutional neural network in one embodiment;

FIG. 4 is a flow diagram illustrating the processing steps of a convolutional layer in one embodiment;

FIG. 5 is a flowchart illustrating the short-term interaction feature extraction step in one embodiment;

FIG. 6 is a block diagram of a click-through rate prediction system based on a convolutional attention network deep session sequence in one embodiment;

FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In one embodiment, as shown in fig. 1, a click-through rate prediction method based on a convolutional attention network deep session sequence is provided, which includes the following steps:

Step 102: acquire click data generated by the user's historical behaviors.

Historical behavior refers to the interaction data generated when the user clicks content on the Internet, on a local area network, or locally. The user's interest preferences or tendencies may be implicit in this interaction data.

The click data is data generated by historical behaviors of the user, and the interest of the user can be extracted through analysis of the click data.

Step 104: input the click data into a pre-trained convolutional neural network and extract the user's long-term interaction features.

The pre-trained convolutional neural network is trained on a training set constructed from previously acquired click data. When click data is input, it can therefore extract the user's long-term interaction features.

A convolutional neural network (CNN) is a deep learning model. By applying convolution, pooling, and similar operations to the click data, it can effectively extract the user's long-term interaction features.

Step 106: convert the click data into a session sequence according to its time information, input the session sequence into a pre-trained bidirectional long short-term memory (Bi-LSTM) network, and extract the user's short-term interaction features.

The session sequence carries time information. A bidirectional long short-term memory network retains information from both preceding and following time steps, so it can accurately extract the user's short-term interaction features.

Step 108: fuse the long-term and short-term interaction features to obtain fused features.

Feature fusion may be performed by concatenation, addition, or similar operations.

Step 110: input the fused features into a pre-trained fully-connected neural network model to obtain the user's current interest type, and predict the click-through rate according to the interest type.

Note that the click-through rate prediction may be integrated into the output layer of the fully-connected neural network model; this is not specifically limited here.

In this click rate prediction method based on a convolutional attention network deep session sequence, the convolutional neural network extracts the user's long-term interaction features from the click data. Meanwhile, the bidirectional long short-term memory network extracts the user's short-term interaction features from the constructed session sequence. The user's short-term dynamic interests and long-term latent interests are thus extracted more effectively, which can improve prediction accuracy.

In one embodiment, click data may be obtained as follows: acquire sparse data generated by a user session, and vectorize the sparse data to obtain click data in the form of low-dimensional dense vectors.

Specifically, the sparse data X is vectorized through an embedding layer. This converts the large-scale sparse data into click data $V_i$ consisting of low-dimensional dense vectors, where $V_i \in \mathbb{R}^d$ and d is the embedding size.
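The embedding step can be illustrated with a minimal pure-Python sketch. It assumes each sparse feature is identified by an integer ID and that the embedding table is a `vocab_size × d` matrix. The names `make_embedding_table`, `embed`, and `one_hot_times_table`, the vocabulary size of 1000, and d = 8 are hypothetical and do not come from the source. The sketch shows why a table lookup is equivalent to multiplying a one-hot (sparse) vector by the table.

```python
import random


def make_embedding_table(vocab_size, d, seed=0):
    """Hypothetical embedding table: one d-dimensional row per sparse feature ID."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(d)] for _ in range(vocab_size)]


def embed(feature_ids, table):
    """Look up the dense vector V_i in R^d for each sparse feature ID."""
    return [list(table[i]) for i in feature_ids]


def one_hot_times_table(i, table):
    """Explicit one-hot vector times the table, for comparison only.

    Multiplying a one-hot row by the table selects row i, which is why
    embed() can use a direct lookup instead of a sparse matrix product.
    """
    vocab_size, d = len(table), len(table[0])
    one_hot = [1.0 if j == i else 0.0 for j in range(vocab_size)]
    return [sum(one_hot[j] * table[j][c] for j in range(vocab_size)) for c in range(d)]


# Usage: three clicked feature IDs from a vocabulary of 1000, with d = 8.
table = make_embedding_table(vocab_size=1000, d=8)
V = embed([3, 512, 999], table)
print(len(V), len(V[0]))  # 3 8
```

In a trained model, the table rows would be learned jointly with the rest of the network rather than drawn at random.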

In another embodiment, the click data is further processed as follows: the click data is processed through the self-attention pooling layer as follows:

where $\alpha_i$ denotes the self-attention coefficient, computed from the self-attention score vector together with a weight matrix $W_j$ and a bias $b_j$. The self-attention coefficient is normalized as:

where $\alpha'_i$ denotes the normalized result and $x_i \in X_j$ denotes the sub-session data in the click data. The click data is then pooled according to the normalized result to obtain the self-attention click data.

In this embodiment, the input data is highly sparse, so the click data $V_i$ is large in scale. This increases the time complexity of the model, so a self-attention pooling layer is used to extract valid features more effectively.

Specifically, pooling the click data according to the normalization result, and obtaining the self-attention click data may be:

where $e_j$ denotes the self-attention click data.
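The exact scoring equation for self-attention pooling is not preserved in this text. The sketch below assumes a common additive form consistent with the symbols named above: $\alpha_i = u^{\top}\tanh(W_j x_i + b_j)$, where `u` stands in for the self-attention score vector, whose original symbol is lost. The coefficients are softmax-normalized and then used as weights in a sum over the sub-session vectors, which gives $e_j$. The function names and all numeric values are hypothetical.

```python
import math


def matvec(W, x):
    """Matrix-vector product of nested lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def self_attention_pool(X, W, b, u):
    """Pool the vectors x_i of one sub-session X_j into a single vector e_j.

    Assumed scoring form (the source equation is not preserved):
        alpha_i  = u . tanh(W x_i + b)      # raw self-attention coefficient
        alpha'_i = softmax(alpha)_i          # normalized coefficient
        e_j      = sum_i alpha'_i * x_i      # pooled self-attention click data
    Returns (e_j, normalized coefficients).
    """
    scores = []
    for x in X:
        hidden = [math.tanh(h + bias) for h, bias in zip(matvec(W, x), b)]
        scores.append(sum(ui * hi for ui, hi in zip(u, hidden)))
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    e_j = [sum(a * x[c] for a, x in zip(weights, X)) for c in range(len(X[0]))]
    return e_j, weights


# Usage: three 2-dimensional click vectors, identity W, zero bias (hypothetical values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
e_j, weights = self_attention_pool(X, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 1.0])
print([round(a, 3) for a in weights])
```

With a zero score vector, every coefficient is equal and the pooling reduces to a plain average. The learned score vector is what lets informative clicks dominate $e_j$.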

In yet another embodiment, as shown in fig. 2, the step of extracting the long-term interaction feature comprises:

Step 202: construct two-dimensional interaction features from the outer products between the self-attention click data.

Step 204: construct a three-dimensional tensor from all the two-dimensional interaction features in the click data.

That is, every pair of click data can interact. In this step, all the two-dimensional interaction features in the click data are computed, so that a three-dimensional tensor can be constructed in a predetermined order.

Step 206: input the three-dimensional tensor into a pre-trained convolutional neural network and extract the user's long-term interaction features.

Specifically, for step 202, when the user's click data $e_i$ and $e_j$ interact, the resulting two-dimensional interaction feature is their outer product:

$$M_{i,j} = e_i \otimes e_j = e_i e_j^{\top}$$

The two-dimensional interaction feature is a $d \times d$ matrix. It can be regarded as a two-dimensional "image" that contains both the interaction signal and the correlations between embedding dimensions. If the click data contains p feature fields, a total of $p(p-1)/2$ "images" is generated.

For step 204, the constructed three-dimensional tensor is:

$$C = [M_{1,2}, M_{1,3}, \ldots, M_{i,j}, \ldots, M_{p-1,p}]$$

As the formula shows, all of the above "images" are stacked together to form the 3D tensor C.
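Steps 202 and 204 can be sketched as follows, assuming each $e_i$ is a plain list of d floats. Every pair i < j produces the outer product $e_i e_j^{\top}$. The pairs are stacked in the order $M_{1,2}, M_{1,3}, \ldots, M_{p-1,p}$, which gives $p(p-1)/2$ matrices. The helper names are hypothetical; a production implementation would use a tensor library rather than nested lists.

```python
from itertools import combinations


def outer(a, b):
    """Outer product of two d-dimensional vectors: a d x d matrix."""
    return [[ai * bj for bj in b] for ai in a]


def interaction_tensor(E):
    """Stack M_{i,j} = outer(e_i, e_j) for all field pairs i < j.

    combinations() yields the pairs (1,2), (1,3), ..., (p-1,p) in that order
    (0-based here), matching C = [M_{1,2}, M_{1,3}, ..., M_{p-1,p}].
    The result holds p(p-1)/2 matrices of size d x d.
    """
    return [outer(E[i], E[j]) for i, j in combinations(range(len(E)), 2)]


# Usage with the sizes used in the text: p = 10 fields, d = 64.
E = [[float(k + 1)] * 64 for k in range(10)]
C = interaction_tensor(E)
print(len(C), len(C[0]), len(C[0][0]))  # 45 64 64
```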

For step 206, as shown in FIG. 3, a three-dimensional convolutional neural network is used to process the three-dimensional tensor. The input sparse data is first encoded in the encoding layer. It then passes through self-attention pooling to obtain the self-attention click data, which is processed by the feature-crossing layer. Finally, the network outputs the user's long-term interaction features.

The three-dimensional convolution proceeds as follows. Assuming an embedding size d of 64 and p = 10 feature fields, the three-dimensional tensor has size 64 × 64 × 45, since 10 × 9 / 2 = 45. FIG. 4 illustrates the structure of the stacked three-dimensional convolutional neural network. It has 6 hidden layers with 32 channels each and performs convolution along all three directions.
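The source does not give the kernel size, stride, or padding of the 3D convolutional layers. The sketch below shows a single-channel 3D convolution whose kernel slides along all three axes. It also computes the output shapes of six stacked layers under assumed settings: kernel 3, stride 2, padding 1, with the tensor laid out as 45 × 64 × 64 (pairs first). Under those assumptions, six layers reduce each axis to length 1, so 32 output channels would yield a 32-dimensional long-term feature vector. This is an illustration, not the patented configuration.

```python
def conv_out_len(n, k, s, p):
    """Output length along one axis for kernel k, stride s, zero padding p."""
    return (n + 2 * p - k) // s + 1


def conv3d(x, w, stride=1, pad=0, bias=0.0):
    """Single-channel 3D convolution (cross-correlation) with zero padding.

    x is a [D][H][W] nested list and w a [kd][kh][kw] kernel; the kernel
    slides along all three axes. A multi-channel layer repeats this per
    (input, output) channel pair and sums over input channels.
    """
    D, H, W = len(x), len(x[0]), len(x[0][0])
    kd, kh, kw = len(w), len(w[0]), len(w[0][0])

    def at(z, r, c):
        # Positions outside the volume read as zero (zero padding).
        if 0 <= z < D and 0 <= r < H and 0 <= c < W:
            return x[z][r][c]
        return 0.0

    oD = conv_out_len(D, kd, stride, pad)
    oH = conv_out_len(H, kh, stride, pad)
    oW = conv_out_len(W, kw, stride, pad)
    out = [[[0.0] * oW for _ in range(oH)] for _ in range(oD)]
    for z in range(oD):
        for r in range(oH):
            for c in range(oW):
                acc = bias
                for a in range(kd):
                    for b in range(kh):
                        for e in range(kw):
                            acc += w[a][b][e] * at(z * stride - pad + a,
                                                   r * stride - pad + b,
                                                   c * stride - pad + e)
                out[z][r][c] = acc
    return out


def stack_shapes(shape, layers=6, k=3, s=2, p=1):
    """Spatial shape after each of `layers` stacked conv layers (assumed k, s, p)."""
    shapes = [tuple(shape)]
    for _ in range(layers):
        shapes.append(tuple(conv_out_len(n, k, s, p) for n in shapes[-1]))
    return shapes


# Usage: the 45 x 64 x 64 tensor (pairs x d x d) through 6 assumed layers.
print(stack_shapes((45, 64, 64))[-1])  # (1, 1, 1)
```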

In one embodiment, the session sequence is constructed as follows. The click data is converted into a session sequence according to the time information, and the session sequence is converted into short-term click data according to a preset time interval.

Specifically, the session sequence is $S = [b_1; \ldots; b_i; \ldots; b_N] \in \mathbb{R}^{N\times d}$, where N is the number of click records and $b_i$ denotes the i-th click data. To extract the user's session interests more accurately, the user behavior sequence S is divided into sessions Q. The k-th session is $Q_k = [b_1; \ldots; b_i; \ldots; b_T] \in \mathbb{R}^{T\times d}$, where T is the number of behaviors retained in the session and $b_i$ is the user's i-th behavior in the session.

Specifically, a session boundary is placed between any two adjacent behaviors that are separated by more than 30 minutes.
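Session division by the 30-minute rule can be sketched as follows, assuming each click record is a `(timestamp, item)` pair. A new session starts only when the gap to the previous click strictly exceeds 30 minutes. The source says that T behaviors are retained per session but not which ones. `fix_length` therefore uses a hypothetical policy: it keeps the most recent T behaviors and left-pads shorter sessions.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)


def split_sessions(clicks, gap=SESSION_GAP):
    """Split (timestamp, item) click records into sessions Q_1, ..., Q_K.

    Clicks are sorted by time; a new session starts whenever two adjacent
    clicks are separated by strictly more than `gap`.
    """
    sessions = []
    for ts, item in sorted(clicks, key=lambda click: click[0]):
        if not sessions or ts - sessions[-1][-1][0] > gap:
            sessions.append([])
        sessions[-1].append((ts, item))
    return sessions


def fix_length(items, T, pad=0):
    """Hypothetical policy: keep the T most recent behaviours, left-pad shorter sessions."""
    kept = items[-T:] if T > 0 else []
    return [pad] * (T - len(kept)) + kept


# Usage: the 35-minute (b -> c) and 70-minute (d -> e) gaps each open a new session.
t0 = datetime(2020, 9, 9, 10, 0)
clicks = [(t0, "a"), (t0 + timedelta(minutes=10), "b"),
          (t0 + timedelta(minutes=45), "c"), (t0 + timedelta(minutes=50), "d"),
          (t0 + timedelta(minutes=120), "e")]
print([[item for _, item in s] for s in split_sessions(clicks)])
# [['a', 'b'], ['c', 'd'], ['e']]
```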

In one embodiment, as shown in fig. 5, the step of extracting short-term interaction features is as follows:

Step 502: set a bias encoding according to the position information of the click data in the session sequence;

Step 504: update the short-term click data with the bias encoding to obtain biased short-term click data;

Step 506: pool the biased short-term click data with a multi-head self-attention mechanism, input the pooled result into a pre-trained bidirectional long short-term memory network, and extract the user's short-term interaction features.

Specifically, for steps 502 and 504, the self-attention mechanism exploits the order of the sequence by adding a positional encoding to the input embeddings. In addition, the order relationships of the sessions and the biases that exist in different representation subspaces also need to be captured. A bias encoding $BE \in \mathbb{R}^{K\times T\times d_{model}}$ is therefore proposed on the basis of the positional encoding. Each element $BE_{(k,t,c)}$ is defined from three bias vectors:

- a bias vector for the session, where k is the index of the session;
- a bias vector for the position within the session, where t is the index of the data in the session;
- a bias vector for the unit position within the behavior embedding, where c is the index of the unit in the data embedding.

After the bias encoding is added, the user's behavior session Q is updated as follows:

$$Q = Q + BE$$
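The definition of each element of BE is not preserved in this text. The sketch assumes the additive form used in the Deep Session Interest Network (DSIN) model: $BE_{(k,t,c)} = w^K_k + w^T_t + w^C_c$. Here `w_K`, `w_T`, and `w_C` stand for the session, in-session position, and embedding-unit bias vectors described above. These names and the numeric values are hypothetical.

```python
def bias_encoding(w_K, w_T, w_C):
    """BE[k][t][c] = w_K[k] + w_T[t] + w_C[c], a K x T x d_model nested list.

    Additive form assumed from the DSIN model; the source equation is not preserved.
    """
    return [[[wk + wt + wc for wc in w_C] for wt in w_T] for wk in w_K]


def add_bias(Q, BE):
    """Q = Q + BE, element-wise over (session k, position t, embedding unit c)."""
    return [[[q + be for q, be in zip(q_row, be_row)]
             for q_row, be_row in zip(q_sess, be_sess)]
            for q_sess, be_sess in zip(Q, BE)]


# Usage: K = 2 sessions, T = 3 positions, d_model = 2 (hypothetical values).
w_K, w_T, w_C = [0.0, 10.0], [0.0, 1.0, 2.0], [0.0, 0.5]
BE = bias_encoding(w_K, w_T, w_C)
Q = [[[1.0, 1.0] for _ in range(3)] for _ in range(2)]
Q = add_bias(Q, BE)
print(Q[1][2])  # [13.0, 13.5]
```

The additive form needs only K + T + d_model parameters instead of a full K × T × d_model table.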

For step 506, the multi-head self-attention mechanism can capture relationships in different representation subspaces. Computationally, let $Q_k = [Q_{k1}; \ldots; Q_{kh}; \ldots; Q_{kH}]$, where $Q_{kh} \in \mathbb{R}^{T\times d_h}$ is the h-th head of $Q_k$, H is the number of heads, and $d_h = d_{model}/H$. The output of $\mathrm{head}_h$ is computed as follows:

where $W^Q$, $W^K$, and $W^V$ are linear matrices. The vectors of the different heads are concatenated and then fed into a feed-forward network:

where FFN(·) is a feed-forward network and $W^O$ is a linear matrix. A residual connection and layer normalization are then applied in turn. The interest $I_k$ of the user's k-th session is computed as:

where Avg(·) denotes average pooling. The weights are shared among the self-attention mechanisms of the different sessions.
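A minimal pure-Python sketch of the per-session computation follows. It implements multi-head scaled dot-product self-attention followed by the average pooling Avg(·). The feed-forward network, residual connection, and layer normalization are omitted for brevity. Three choices are assumptions, since the head equation is not preserved here: splitting $Q_k$ into heads before projecting, sharing $W^Q$, $W^K$, $W^V$ across heads, and scaling by $\sqrt{d_h}$. All function names are hypothetical.

```python
import math


def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(Q_k, W_Q, W_K, W_V, W_O, H):
    """Multi-head scaled dot-product self-attention over one session Q_k (T x d_model).

    Q_k is split column-wise into H heads Q_kh of width d_h = d_model / H;
    each head computes softmax(Q_kh W_Q (Q_kh W_K)^T / sqrt(d_h)) Q_kh W_V
    (W_Q, W_K, W_V are d_h x d_h and shared across heads, an assumption),
    and the concatenated heads are multiplied by W_O (d_model x d_model).
    """
    T, d_model = len(Q_k), len(Q_k[0])
    if d_model % H:
        raise ValueError("d_model must be divisible by the number of heads H")
    d_h = d_model // H
    heads = []
    for h in range(H):
        Q_kh = [row[h * d_h:(h + 1) * d_h] for row in Q_k]
        q, k, v = matmul(Q_kh, W_Q), matmul(Q_kh, W_K), matmul(Q_kh, W_V)
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_h) for kj in k] for qi in q]
        heads.append(matmul([softmax(r) for r in scores], v))
    concat = [[x for h in range(H) for x in heads[h][t]] for t in range(T)]
    return matmul(concat, W_O)


def average_pool(rows):
    """Avg(.) over the T positions of a session, giving the interest vector I_k."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Usage: one session with T = 3 behaviours, d_model = 4, H = 2 heads.
Q_k = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.1, 0.9], [0.3, 0.3, 0.3, 0.3]]
I_k = average_pool(multi_head_self_attention(
    Q_k, identity(2), identity(2), identity(2), identity(4), H=2))
print(len(I_k))  # 4
```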

In addition, the user's session interests have sequential relationships with their context, and modeling their dynamic changes can enrich the representation of session interests. A bidirectional long short-term memory network (Bi-LSTM) captures sequential relationships effectively and is applied to model the interaction between session interests. The LSTM memory cell is implemented as follows:

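$$i_t = \sigma(W_{xi} I_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$$

$$f_t = \sigma(W_{xf} I_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$$

$$c_t = f_t c_{t-1} + i_t \tanh(W_{xc} I_t + W_{hc} h_{t-1} + b_c)$$

$$o_t = \sigma(W_{xo} I_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$$

$$h_t = o_t \tanh(c_t)$$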

where σ(·) is the logistic (sigmoid) function, and $i_t$, $f_t$, and $o_t$ are the input gate, forget gate, and output gate, respectively. The subscripts of each weight matrix indicate its shape. Bidirectional means that there is both a forward and a backward RNN. The hidden state $H_t$ is computed by combining $\overrightarrow{h_t}$, the hidden state of the forward LSTM, with $\overleftarrow{h_t}$, the hidden state of the backward LSTM.

The bidirectional long short-term memory network then outputs the short-term interaction features through an activation layer.
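The memory-cell equations above describe an LSTM with peephole terms $W_{ci}$, $W_{cf}$, $W_{co}$. They can be sketched in pure Python as below. Two choices are assumptions: the peephole weights are treated as diagonal and stored as vectors, and the forward and backward hidden states are combined by concatenation, because the combination formula is not preserved. Parameter names mirror the subscripts in the equations, and `zero_params` is a hypothetical helper used only to build example weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mv(W, x):
    """Matrix-vector product of nested lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def lstm_step(I_t, h_prev, c_prev, p):
    """One step of the peephole LSTM: returns (h_t, c_t).

    p holds full matrices W_x*, W_h*, biases b_*, and peephole weights
    w_ci, w_cf, w_co stored as vectors (diagonal peepholes, an assumption).
    """
    def pre(g):
        # W_xg I_t + W_hg h_{t-1} for gate/candidate g in {"i", "f", "c", "o"}.
        return [a + b for a, b in zip(mv(p["W_x" + g], I_t), mv(p["W_h" + g], h_prev))]

    i = [sigmoid(z + w * c + b) for z, w, c, b in zip(pre("i"), p["w_ci"], c_prev, p["b_i"])]
    f = [sigmoid(z + w * c + b) for z, w, c, b in zip(pre("f"), p["w_cf"], c_prev, p["b_f"])]
    c_t = [ft * cp + it * math.tanh(z + b)
           for ft, cp, it, z, b in zip(f, c_prev, i, pre("c"), p["b_c"])]
    # The output gate's peephole reads the new cell state c_t, as in the equations.
    o = [sigmoid(z + w * c + b) for z, w, c, b in zip(pre("o"), p["w_co"], c_t, p["b_o"])]
    h_t = [ot * math.tanh(ct) for ot, ct in zip(o, c_t)]
    return h_t, c_t


def run_lstm(seq, p):
    n = len(p["b_i"])
    h, c = [0.0] * n, [0.0] * n
    states = []
    for I_t in seq:
        h, c = lstm_step(I_t, h, c, p)
        states.append(h)
    return states


def bi_lstm(seq, p_fwd, p_bwd):
    """H_t from forward and backward LSTMs, combined by concatenation (assumed)."""
    fwd = run_lstm(seq, p_fwd)
    bwd = run_lstm(seq[::-1], p_bwd)[::-1]  # re-align backward states to time steps
    return [hf + hb for hf, hb in zip(fwd, bwd)]


def zero_params(n_in, n):
    """Hypothetical helper: all-zero weights for an LSTM with n_in inputs, n units."""
    p = {}
    for g in "ifco":
        p["W_x" + g] = [[0.0] * n_in for _ in range(n)]
        p["W_h" + g] = [[0.0] * n for _ in range(n)]
        p["b_" + g] = [0.0] * n
    for g in "ifo":
        p["w_c" + g] = [0.0] * n
    return p


# Usage: a 3-step sequence of session-interest vectors I_t (2-dimensional), 1 unit.
p = zero_params(n_in=2, n=1)
p["b_c"] = [1.0]
H = bi_lstm([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], p, p)
print(len(H), len(H[0]))  # 3 2
```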

In one embodiment, the fused features are obtained by fusing the long-term and short-term interaction features through concatenation and smoothing.

Specifically, the fully-connected neural network model may be as follows:

where D is the training dataset, and $X = [X^U, X^I, S]$ is the input to the network. $y \in \{0,1\}$ indicates whether the user clicked the item, and p(·) is the final output of the network, i.e., the predicted probability that the user clicks the item.
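The final stage can be sketched as follows, under three assumptions. First, fusion is plain concatenation. Second, the fully-connected model is reduced to a single sigmoid output unit, since hidden-layer sizes are not given. Third, training minimizes the standard log loss over D. That loss is consistent with y ∈ {0,1} and with p(·) being a click probability, but the text does not confirm it because the equation is not preserved. All names and values are hypothetical.

```python
import math


def fuse(long_term, short_term):
    """Fusion by concatenation of the CNN (long-term) and Bi-LSTM (short-term) features."""
    return list(long_term) + list(short_term)


def predict_ctr(x, w, b):
    """One fully-connected output unit with a numerically stable sigmoid: p(X)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow in exp(-z) for very negative z
    return ez / (1.0 + ez)


def log_loss(D, w, b, eps=1e-12):
    """L = -(1/|D|) * sum over (X, y) in D of [y log p(X) + (1 - y) log(1 - p(X))]."""
    total = 0.0
    for x, y in D:
        p = min(max(predict_ctr(x, w, b), eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(D)


# Usage with hypothetical feature values: 2 long-term + 1 short-term feature.
x = fuse([0.2, -0.1], [0.7])
print(predict_ctr(x, [0.5, 0.5, 0.5], 0.0))
```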

It should be understood that, although the steps in the flowcharts of FIGS. 1, 2, and 5 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 1, 2, and 5 may include multiple sub-steps or stages. These are not necessarily completed at the same time and may be performed at different times. Nor are they necessarily performed sequentially: they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 6, there is provided a click-through rate prediction system based on a convolutional attention network deep session sequence, including: a data acquisition module 602, a long-term feature extraction module 604, a short-term feature extraction module 606, a feature fusion module 608, and a click prediction module 610, wherein:

a data obtaining module 602, configured to obtain click data generated by historical behaviors of a user;

a long-term feature extraction module 604, configured to input the click data into a pre-trained convolutional neural network, and extract long-term interaction features of the user;

a short-term feature extraction module 606, configured to convert the click data into a session sequence according to time information of the click data, input the session sequence into a pre-trained bidirectional long-term and short-term memory network, and extract a short-term interaction feature of the user;

a feature fusion module 608, configured to fuse the long-term interaction feature and the short-term interaction feature to obtain a fusion feature;

and the click prediction module 610 is used for inputting the fusion characteristics into a pre-trained fully-connected neural network model to obtain the current interest type of the user, and predicting the click rate according to the interest type.

In one embodiment, the data obtaining module 602 is further configured to obtain sparse data generated by a user session, and to vectorize the sparse data to obtain click data in the form of low-dimensional dense vectors.

In one embodiment, the data obtaining module 602 is further configured to process the click data through the self-attention pooling layer as follows:

where $\alpha_i$ denotes the self-attention coefficient, computed from the self-attention score vector together with a weight matrix $W_j$ and a bias $b_j$;

normalizing the self-attention coefficient to:

where $\alpha'_i$ denotes the normalized result and $x'_i \in X_j$ denotes the sub-session data in the click data;

and pooling the click data according to the normalization result to obtain self-attention click data.

In one embodiment, the long-term feature extraction module 604 is further configured to construct two-dimensional interactive features according to an outer product between the self-attention click data; constructing a three-dimensional tensor according to all the two-dimensional interactive features in the click data; and inputting the three-dimensional tensor into a pre-trained convolutional neural network, and extracting the long-term interaction characteristics of the user.

In one embodiment, the short-term feature extraction module 606 is further configured to convert the click data into a session sequence according to time information, and to convert the session sequence into short-term click data according to a preset time interval.

In one embodiment, the short-term feature extraction module 606 is further configured to: set a bias encoding according to the position information of the click data in the session sequence; update the short-term click data with the bias encoding to obtain biased short-term click data; and pool the biased short-term click data with a multi-head self-attention mechanism, input the pooled result into a pre-trained bidirectional long short-term memory network, and extract the user's short-term interaction features.

In one embodiment, the feature fusion module 608 is further configured to fuse the long-term interactive feature and the short-term interactive feature in a concatenation and smoothing manner to obtain a fusion feature.

For specific limitations of the click rate prediction system based on the convolutional attention network deep session sequence, see the above limitations on the click rate prediction method based on the convolutional attention network deep session sequence, which are not described herein again. The various modules in the above described click-through rate prediction system based on a convolutional attention network deep session sequence may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a terminal; its internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a click-through rate prediction method based on a convolutional attention network deep session sequence. The display screen of the computer device may be a liquid crystal display or an electronic ink display. The input device of the computer device may be a touch layer covering the display screen; a key, trackball, or touchpad on the housing of the computer device; or an external keyboard, touchpad, or mouse.

Those skilled in the art will appreciate that the architecture shown in FIG. 7 is merely a block diagram of part of the structure related to the disclosed solution. It does not limit the computer devices to which the disclosed solution applies. A particular computer device may include more or fewer components than shown, combine certain components, or arrange components differently.

In an embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the method in the above embodiments when the processor executes the computer program.

In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method in the above-mentioned embodiments.

Those skilled in the art will understand that all or part of the processes of the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A click rate prediction method based on a convolutional attention network deep session sequence is characterized by comprising the following steps:

acquiring click data generated by historical behaviors of a user;

2. The method of claim 1, wherein obtaining click data generated by historical user behavior comprises:

acquiring sparse data generated by a user session;

and vectorizing the sparse data to obtain click data in the form of low-dimensional dense vectors.

3. The method of claim 2, wherein prior to inputting the click data into a pre-trained convolutional neural network to extract long-term interaction features of the user, the method further comprises:

the click data is processed through the self-attention pooling layer as follows:

where $\alpha_i$ denotes the self-attention coefficient;

normalizing the self-attention coefficient to:

4. The method of claim 3, wherein inputting the click data into a pre-trained convolutional neural network to extract long-term interaction features of the user comprises:

constructing a two-dimensional interaction feature according to an outer product between the self-attention click data;

constructing a three-dimensional tensor according to all the two-dimensional interactive features in the click data;

and inputting the three-dimensional tensor into a pre-trained convolutional neural network, and extracting the long-term interaction characteristics of the user.

5. The method of claim 1, wherein converting the click data into a session sequence according to the time information of the click data comprises:

converting the click data into a session sequence according to time information;

and converting the session sequence into short-term click data according to a preset time interval.

6. The method of claim 5, wherein inputting the session sequence into a pre-trained bidirectional long short-term memory network to extract short-term interaction features of the user comprises:

setting a bias encoding according to the position information of the click data in the session sequence;

updating the short-term click data with the bias encoding to obtain biased short-term click data;

and pooling the biased short-term click data with a multi-head self-attention mechanism, inputting the pooled result into a pre-trained bidirectional long short-term memory network, and extracting the short-term interaction features of the user.

7. The method according to any one of claims 1 to 6, wherein fusing the long-term interaction feature and the short-term interaction feature to obtain a fused feature comprises:

and fusing the long-term and short-term interaction features by concatenation and smoothing to obtain the fused features.

8. A system for click rate prediction based on a sequence of convolutional attention network deep sessions, the system comprising:

the data acquisition module is used for acquiring click data generated by historical behaviors;

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.