CN115098786A - News recommendation method and system based on gating multi-head self-attention - Google Patents
News recommendation method and system based on gating multi-head self-attention
- Publication number
- CN115098786A (application number CN202210867135.2A)
- Authority
- CN
- China
- Prior art keywords
- news
- candidate
- user
- attention
- characteristic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a news recommendation method and system based on gated multi-head self-attention, belonging to the field of personalized news recommendation. The method comprises: obtaining historical click news and candidate news, and encoding each with the pre-trained model BERT to obtain historical click news features and candidate news features; capturing the relevance among the historical click news features with multi-head self-attention, and filtering the features using the candidate news features to obtain user features; and predicting the probability that the user browses each candidate news item by combining the historical click news features, the candidate news features, and the user features, then recommending candidate news to the user based on the predicted probabilities. The invention adjusts the user's interests with a gated multi-head self-attention mechanism so that candidate news can be matched more accurately against the user's specific interests, and applies the pre-trained model BERT, with its rich language knowledge, to news recommendation to enhance the text representation of news and improve recommendation accuracy.
Description
Technical Field
The invention belongs to the field of personalized news recommendation, and particularly relates to a news recommendation method and system based on gated multi-head self-attention.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
News recommendation is an important branch of recommender-system research; it aims to help users find news matching their interest preferences, as far as possible, from news content and user information. Different words and different news articles carry different amounts of information when representing news and users. Attention mechanisms can assign different weights to different words and news items so as to capture key news semantics and important clues about user interest. For example, An et al. propose personalized word-level and news-level attention to model the influence of different words and news on users; Wu et al. propose an attentive multi-view learning mechanism to learn news representations from the various components of a news item (title, category, body); Qi et al. further apply multi-head self-attention to capture long-distance correlations between words and between news items. In these methods, attention mechanisms effectively improve the performance of news recommendation.
However, modeling user interest only from the relations among the news a user has browsed may not be optimal, because user interests are broad. For example, if candidate news is not considered while learning the user interest, the learned interest representation may contain much information irrelevant to the candidate news, making it difficult to accurately match candidate news against the user's specific interests.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a news recommendation method and system based on gated multi-head self-attention, which adjust a user's interests with a gated multi-head self-attention mechanism so that candidate news can be matched more accurately against the user's specific interests, and which apply the pre-trained model BERT, with its rich language knowledge, to news recommendation to enhance news text representations and improve recommendation accuracy.
To achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
The first aspect of the invention provides a news recommendation method based on gated multi-head self-attention.
A news recommendation method based on gated multi-head self-attention comprises the following steps:
acquiring historical click news and candidate news, and encoding each with the pre-trained model BERT to obtain historical click news features and candidate news features;
capturing the relevance among the historical click news features with multi-head self-attention, and performing feature filtering with the candidate news features to obtain user features;
and predicting the probability that the user browses each candidate news item by combining the historical click news features, the candidate news features, and the user features, and recommending candidate news to the user based on the predicted probabilities.
Further, the news encoding comprises the following specific steps:
extracting a text representation of the news with the pre-trained model BERT;
capturing bidirectional semantic dependencies of the text representation with a Bi-LSTM;
and, based on the bidirectional semantic dependencies, aggregating the output of the Bi-LSTM with an attention network to obtain news features containing rich contextual semantic information.
Further, capturing the relevance among the historical click news features comprises the following specific steps:
computing the query, key, and value information of each feature in the historical click news features;
performing multiple rounds of scaled dot-product attention over the query, key, and value information, and storing the result of each round;
and concatenating all the results and applying one linear transformation to obtain the enhanced historical click news features.
Further, performing feature filtering with the candidate news features comprises the following specific steps:
computing the news-internal information and the channel adjustment gate information from the enhanced historical click news features, the query information, and the candidate news features;
performing an element-wise product of the news-internal information and the channel adjustment gate information to obtain reconstructed news features;
and performing attention-weighted aggregation of the reconstructed news features to generate the user features.
Further, predicting the probability that the user browses each candidate news item and recommending candidate news based on the predicted probabilities comprises the following specific steps:
constructing a training library of positive and negative samples, and training a probability prediction model;
inputting the candidate news features to be predicted into the trained model to obtain the click probabilities of the candidate news;
and ranking the click probabilities of a group of candidate news and recommending the top-ranked candidates to the user.
Further, the inputs of the probability prediction model are the candidate news features and the user features, and its output is the click probability of the candidate news.
Further, the click probability is the inner product of the user feature and the candidate news feature.
The invention provides a news recommending system based on gating multi-head self-attention in a second aspect.
A news recommendation system based on gated multi-head self-attention comprises a news encoding module, a user encoding module, and a probability prediction module:
the news encoding module is configured to: acquire historical click news and candidate news, and encode each with the pre-trained model BERT to obtain historical click news features and candidate news features;
the user encoding module is configured to: capture the relevance among the historical click news features with multi-head self-attention, and perform feature filtering with the candidate news features to obtain user features;
the probability prediction module is configured to: predict the probability that the user browses each candidate news item by combining the historical click news features, the candidate news features, and the user features, and recommend candidate news to the user based on the predicted probabilities.
A third aspect of the present invention provides a computer-readable storage medium, on which a program is stored, which, when being executed by a processor, carries out the steps of a method for recommending news based on gated multi-head self-attention according to the first aspect of the present invention.
A fourth aspect of the present invention provides an electronic device, including a memory, a processor, and a program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps in the news recommendation method based on gated multi-head self-attention according to the first aspect of the present invention.
The above one or more technical solutions have the following beneficial effects:
the invention provides a news recommending method based on gating multi-head self-attention, which filters out information irrelevant to candidate news in history browsing news by using the candidate news through a gating mechanism so as to realize accurate matching of the candidate news and specific interests of a user.
The invention provides a multi-head self-attention gated news recommendation method, which adopts a pre-training model to mine deep semantics of a text, further strengthens semantic representation of news and greatly improves expressive force of the model.
The invention provides a news recommending method based on gating multi-head self-attention, which utilizes the correlation between historical click news of a user and candidate news to capture more relevant user interests and further strengthen the interest expression of the user, thereby providing an improved compromise between accuracy and speed.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and do not limit it.
Fig. 1 is an example of a user's news reading behavior.
Fig. 2 is a flow chart of the method of the first embodiment.
Fig. 3 is a system configuration diagram of the second embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for describing particular embodiments only and is not intended to limit example embodiments according to the present disclosure. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and "comprising", and any variations thereof, cover a non-exclusive inclusion: a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to it.
The embodiments and features of the embodiments of the invention may be combined with each other without conflict.
In fact, a user's interests are diverse, and when a candidate news item matches the user, it typically matches only a small fraction of those interests. As shown in fig. 1, the grey segments mark the important text spans; from the news the user browsed it can be inferred that the user is interested in news in the animal, health, sports, and politics domains. The fourth candidate news item concerns a political topic: it matches only the political news the user browsed and is less relevant to the news of other topics (animals and sports) the user browsed. When learning the user's interest, self-attention emphasizes building relations between news of related topics; but because self-attention is learned without reference to the candidate news, it over-retains information irrelevant to the candidate news, and this acts as noise when matching candidate news against the user's specific interests.
If candidate news information is not considered during user modeling, candidate news can be difficult to match accurately. The invention therefore provides a news recommendation method based on gated multi-head self-attention: a neural news recommendation framework, PGRec, enhanced with a pre-trained language model and equipped with gated multi-head self-attention. To understand news semantics more effectively, pre-trained BERT is introduced to enhance news representations; clicked news is also weighted by its relevance to the candidate news, to capture the user interests most relevant for matching. During model training, a negative sampling technique is applied and the click score is predicted over K+1 news items: the candidates consist of one positive sample of the user and K randomly selected negative samples of that user, and the click scores of the positive news and the K negative news are predicted jointly.
Example one
The embodiment discloses a news recommendation method based on gating multi-head self-attention;
As shown in fig. 2, a news recommendation method based on gated multi-head self-attention mainly comprises three steps: news encoding, candidate-aware user encoding, and click prediction.
Step 1: obtain historical click news and candidate news, and encode each with the pre-trained model BERT to obtain historical click news features and candidate news features.
step 101, obtaining historical click news and candidate news
A group of historical click news browsed by a given user is obtained and recorded as D_u = [D_1, D_2, …, D_N], where N is the number of news items the user has historically clicked. The goal is to compute the probability ŷ that the given user clicks each item of a set of candidate news D_C = [D_c1, D_c2, …, D_cM], where M is the number of candidate news, and then to rank the candidate news by these probabilities and recommend the best news.
Step 102: news encoding, with the following specific steps:
(1) extracting text representation of news by using a pre-training model BERT;
News encoding aims at learning deep semantic representations of news. Previously, pre-trained word embeddings such as Word2vec and GloVe were typically used to initialize the model's embedding layer, but these embeddings are mostly context-free, which leaves the semantic information captured by the model insufficient.
Since the pre-trained model BERT (Devlin et al., 2019) has 12 Transformer layers and a large number of parameters that enable it to model complex contextual information in text, the invention adopts BERT as the news encoder.
Denote the input news as a sequence of T tokens [w_1, w_2, …, w_T]. The news text is input into the BERT model, and after several Transformer layers the hidden-layer token representations obtained are denoted [e_1, e_2, …, e_T].
(2) Capture bidirectional semantic dependencies of the text representation with a Bi-LSTM.
To further strengthen the semantic relations, the hidden-layer output of the BERT model is fed into a Bi-LSTM, which further extracts the bidirectional semantic dependencies of the text representation.
(3) Based on the bidirectional semantic dependencies, aggregate the output of the Bi-LSTM with an attention network to obtain news features containing rich contextual semantic information.
The output of the Bi-LSTM is connected to an attention network, which aggregates the text representations with their bidirectional semantic dependencies into a news feature h containing rich contextual semantic information. The user's historical click news features and the candidate news features are denoted [h_1, h_2, …, h_N] and h_c, respectively.
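The aggregation step above can be sketched as a small additive-attention pooling layer. This is a minimal NumPy illustration, not the patent's implementation: the projection `W`, the query vector `q`, and the random stand-ins for Bi-LSTM outputs are assumptions made only to give a runnable example.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(token_reps, W, q):
    """Aggregate T token representations (e.g. Bi-LSTM outputs) into one
    news vector h via additive attention: a_t = softmax(q . tanh(W e_t))."""
    scores = np.tanh(token_reps @ W.T) @ q      # (T,) unnormalized scores
    weights = softmax(scores)                   # attention weights over tokens
    return weights @ token_reps                 # (dim,) weighted sum = news feature h

rng = np.random.default_rng(0)
T, dim, att = 6, 8, 4
tokens = rng.normal(size=(T, dim))   # stand-in for Bi-LSTM hidden states
W = rng.normal(size=(att, dim))      # illustrative attention projection
q = rng.normal(size=(att,))          # illustrative attention query
h = attention_pool(tokens, W, q)
print(h.shape)  # (8,)
```

The same pooling is applied once per news item, producing [h_1, …, h_N] for the clicked news and h_c for each candidate.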
Step 2: capture the relevance among the historical click news features with multi-head self-attention, and perform feature filtering with the candidate news features to obtain user features.
The user encoder learns a representation of the user from the historical click news the user browsed, in two steps:
Step 201: capture the relevance among the historical click news features with a news-level multi-head self-attention network to obtain a deeper representation of the user, with the following specific steps:
(1) calculating query information, key information and value information of each feature in historical click news features;
The historical click news feature matrix is H = [h_1, h_2, …, h_N], where h_i is the feature vector of the i-th news item the user historically clicked. These features are converted into query, key, and value information as follows:
H_query = W_query · H  (1)
H_key = W_key · H  (2)
H_value = W_value · H  (3)
where H_query, H_key, H_value ∈ R^{N×dim} respectively denote the query, key, and value information of the historical click news features; N is the number of historical click news items; dim is the unified dimension of the historical click news features; and W_query, W_key, W_value are trainable weight matrices.
(2) Perform multiple rounds of scaled dot-product attention over the query, key, and value information, storing the result of each round.
Scaled dot-product attention is computed as
Attention(Q, K, V) = softmax(QK^T / √d_k) V  (4)
where d_k is the number of hidden units of the neural network.
After a linear transformation of H_query, H_key, H_value, scaled dot-product attention yields the result head_i:
head_i = Attention(H_query W_i^Q, H_key W_i^K, H_value W_i^V)  (5)
Here h rounds of scaled dot-product attention are performed, and the linear-transformation parameters W applied to H_query, H_key, H_value differ in each round.
(3) Concatenate all the results and apply one linear transformation to obtain the enhanced historical click news features.
The results of the h rounds are concatenated and passed through one linear transformation; the resulting value is the multi-head-attention-reconstructed feature:
Z_news = MultiHead(H_query, H_key, H_value) = Concat(head_1, head_2, …, head_h) W^o  (6)
where W^o is a trainable parameter matrix.
The result is recorded as
Z_news = [Z_1, Z_2, …, Z_N]  (7)
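Formulas (1)-(6) can be sketched end to end in NumPy. This is a simplified illustration, not the patent's code: for compactness each head reads a slice of shared projection matrices rather than carrying its own per-round W matrices, which the text says differ per round.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(H, Wq, Wk, Wv, Wo, n_heads):
    """H: (N, dim) historical click-news features.
    Projects to queries/keys/values (formulas (1)-(3)), runs scaled
    dot-product attention per head (formulas (4)-(5)), then concatenates
    the heads and applies the output projection W_o (formula (6))."""
    N, dim = H.shape
    d_k = dim // n_heads
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    heads = []
    for i in range(n_heads):
        s = slice(i * d_k, (i + 1) * d_k)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_k)   # scaled dot product
        heads.append(softmax(scores) @ V[:, s])       # head_i
    return np.concatenate(heads, axis=1) @ Wo          # Z_news

rng = np.random.default_rng(1)
N, dim, n_heads = 5, 8, 2
H = rng.normal(size=(N, dim))
Wq, Wk, Wv, Wo = (rng.normal(size=(dim, dim)) for _ in range(4))
Z_news = multi_head_self_attention(H, Wq, Wk, Wv, Wo, n_heads)
print(Z_news.shape)  # (5, 8)
```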
Step 202: perform feature filtering with the candidate news features, based on a gating adjustment unit and an additive attention unit.
The gating adjustment unit filters information irrelevant to a given candidate news item out of the user's historical click news features, so that the candidate news can be matched against the user's specific interests.
(1) Compute the news-internal information and the channel adjustment gate information from the enhanced historical click news features, the query information, and the candidate news features.
The gating adjustment unit has two input information flows: one, v, formed from the multi-head-attention-reconstructed features Z_news and the query information H_query; the other, the channel adjustment gate information g, formed from the candidate news feature h_c and the query information H_query. To keep the feature counts equal, the candidate news feature vector is first repeated N times and linearly transformed, where N is the number of the user's historically clicked news:
h_c = W_transform · repeat(h_c)  (8)
(2) Perform an element-wise product of the news-internal information and the channel adjustment gate information to obtain the reconstructed news features.
To filter information irrelevant to the candidate news out of the user's historical click news, an element-wise product is taken between the internal information v of the browsed news and the channel adjustment gate information g; this prevents the self-attention stage from retaining too much candidate-irrelevant information that would hinder matching the candidate news against the user's specific interests. The gating filter is
Z̃ = v ⊙ g  (9)
where ⊙ denotes the element-wise (Hadamard) product of matrices.
(3) Perform attention-weighted aggregation of the reconstructed news features to generate the user feature.
Because the filtered, reconstructed historical click news features may differ in their relevance to the candidate news, the candidate news is used to attention-weight and aggregate the reconstructed news features into the user feature u, where W and b are trainable weights and φ(·) is a linear network.
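A hedged sketch of the gating adjustment unit and the candidate-aware aggregation. The exact formulas for v and g are not fully reproduced in the text, so this example assumes v = Z_news and takes g as a sigmoid of the transformed, repeated candidate feature; every function and parameter name here is illustrative rather than the patent's own.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_user_encoder(Z_news, h_c, W_transform, W_att, q_att):
    """Filter candidate-irrelevant information out of the reconstructed
    click-news features Z_news (N, dim), then pool them into a user vector u.
    Assumption: the gate g is a sigmoid of the transformed, repeated
    candidate feature (formula (8)); v is taken to be Z_news itself."""
    N = Z_news.shape[0]
    hc_rep = np.repeat(h_c[None, :], N, axis=0) @ W_transform.T  # repeat + transform
    g = sigmoid(hc_rep)                 # channel adjustment gate, values in (0, 1)
    Z_tilde = Z_news * g                # element-wise gating filter, formula (9)
    scores = np.tanh(Z_tilde @ W_att.T) @ q_att
    a = softmax(scores)                 # candidate-aware attention weights
    return a @ Z_tilde                  # user feature u

rng = np.random.default_rng(2)
N, dim, att = 5, 8, 4
Z_news = rng.normal(size=(N, dim))
h_c = rng.normal(size=(dim,))
W_transform = rng.normal(size=(dim, dim))
W_att = rng.normal(size=(att, dim))
q_att = rng.normal(size=(att,))
u = gated_user_encoder(Z_news, h_c, W_transform, W_att, q_att)
print(u.shape)  # (8,)
```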
Step 3: predict the probability that the user browses each candidate news item by combining the historical click news features, the candidate news features, and the user feature, and recommend candidate news to the user based on the predicted probabilities.
The probability ŷ that the user clicks a candidate news item is computed as the inner product of the user feature u and the candidate news feature h_c:
ŷ = u^T h_c
The click probability is predicted and candidate news recommended to the user in the following specific steps:
step 301, constructing a training library consisting of positive samples and negative samples, and training a probability prediction model;
The probability prediction model predicts the click probability of candidate news; its inputs are the candidate news features and the user features, and its output is the click probabilities of the candidate news.
Because the numbers of positive and negative news samples are highly unbalanced, a negative sampling technique is applied during model training, predicting the click probability over k+1 news items: one positive sample of a user and k randomly selected negative samples of that user. The positive sample is a news item the user clicked within an exposure sequence; the negative samples are k randomly drawn news items from the same exposure sequence that the user did not click. The click probabilities of the positive news and the k negative news are predicted jointly, turning the news click-prediction problem into a pseudo (k+1)-way classification task. These probabilities are normalized with softmax to compute the click probability of the positive sample, and the loss function for model training is the negative log-likelihood over all positive samples:
L = − Σ_{i∈S} log( exp(ŷ_i^+) / ( exp(ŷ_i^+) + Σ_{j=1}^{k} exp(ŷ_{i,j}^−) ) )
where ŷ_i^+ denotes the click probability of the i-th positive news, ŷ_{i,j}^− denotes the click probability of the j-th negative news in the same exposure sequence as the i-th positive news, and S is the set of positive training samples.
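The pseudo (k+1)-way negative-sampling loss described above can be sketched as follows; the array shapes and example scores are illustrative only.

```python
import numpy as np

def nce_loss(pos_scores, neg_scores):
    """Negative-sampling loss: for each positive sample, softmax its score
    against its k negatives and take the negative log-likelihood, averaged
    over the positives.  pos_scores: (S,), neg_scores: (S, k)."""
    all_scores = np.concatenate([pos_scores[:, None], neg_scores], axis=1)
    all_scores = all_scores - all_scores.max(axis=1, keepdims=True)  # stability
    log_probs = all_scores[:, 0] - np.log(np.exp(all_scores).sum(axis=1))
    return -log_probs.mean()

# two positive samples, each with k = 3 negatives
pos = np.array([2.0, 1.5])
neg = np.array([[0.1, -0.3, 0.0],
                [0.2,  0.4, -0.1]])
loss = nce_loss(pos, neg)
print(round(float(loss), 4))
```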
Step 302: input the candidate news features to be predicted into the trained model to obtain the click probabilities of the candidate news.
Step 303: rank the click probabilities of the group of candidate news and recommend the top-ranked candidate news to the user.
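Steps 302-303 (inner-product scoring as in step 3, then ranking) can be sketched as follows; the user and candidate vectors are illustrative stand-ins for the learned features.

```python
import numpy as np

def recommend(u, candidate_feats, top_k=3):
    """Score each candidate news item by its inner product with the user
    feature u, then return candidate indices ranked by predicted click
    probability, highest first."""
    scores = candidate_feats @ u        # one inner product per candidate
    order = np.argsort(-scores)         # descending by score
    return order[:top_k], scores

u = np.array([1.0, 0.0, 2.0])           # illustrative user feature
cands = np.array([
    [0.5, 1.0, 0.1],                    # score 0.7
    [1.0, 0.0, 1.0],                    # score 3.0
    [0.0, 2.0, 0.5],                    # score 1.0
    [2.0, 0.0, 0.0],                    # score 2.0
])
top, scores = recommend(u, cands, top_k=2)
print(top.tolist())  # [1, 3]
```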
Example two
The embodiment discloses a news recommendation system based on gated multi-head self-attention.
As shown in fig. 3, a news recommendation system based on gated multi-head self-attention includes a news encoding module, a user encoding module, and a probability prediction module:
the news encoding module is configured to: acquire historical click news and candidate news, and encode each with the pre-trained model BERT to obtain historical click news features and candidate news features;
the user encoding module is configured to: capture the relevance among the historical click news features with multi-head self-attention, and perform feature filtering with the candidate news features to obtain user features;
the probability prediction module is configured to: predict the probability that the user browses each candidate news item by combining the historical click news features, the candidate news features, and the user features, and recommend candidate news to the user based on the predicted probabilities.
EXAMPLE III
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps in a gated multi-head self-attention based news recommendation method according to embodiment 1 of the present disclosure.
Example four
An object of the present embodiment is to provide an electronic device.
An electronic device includes a memory, a processor, and a program stored in the memory and executable on the processor, where the processor executes the program to implement the steps in the method for recommending news based on gated multi-head self-attention according to embodiment 1 of the present disclosure.
Those skilled in the art will appreciate that embodiments of the present disclosure can be provided as a method, system, or computer program product, and thus, the present disclosure can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects, and a computer program product embodied on one or more computer-usable storage media including, but not limited to, disk storage, CD-ROM, optical storage, and the like, having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present disclosure and is not intended to limit it; those skilled in the art may make various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure shall be included in its protection scope.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.
Claims (10)
1. A news recommendation method based on gated multi-head self-attention, characterized by comprising the following steps:
acquiring historically clicked news and candidate news, and encoding each with the pre-trained model BERT to obtain clicked-news features and candidate news features;
capturing the relevance among the clicked-news features based on multi-head self-attention, and performing feature filtering with the candidate news features to obtain a user feature;
predicting the probability that the user browses each candidate news by combining the clicked-news features, the candidate news features and the user feature, and recommending candidate news to the user based on the predicted probabilities.
2. The gated multi-head self-attention based news recommendation method according to claim 1, wherein the news encoding comprises the following specific steps:
extracting a text representation of the news with the pre-trained model BERT;
capturing bidirectional semantic dependencies of the text representation with a Bi-LSTM;
based on the bidirectional semantic dependencies, aggregating the Bi-LSTM outputs with an attention network to obtain a news feature containing rich contextual semantic information.
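The attention aggregation step of this news encoder can be sketched as follows. The hidden states `H` stand in for Bi-LSTM outputs over BERT token embeddings, and the projection matrix `W` and query vector `q` are hypothetical learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_aggregate(hidden_states, W, q):
    # Attention network over Bi-LSTM outputs: score each token state,
    # then sum the states into a single news feature vector.
    proj = np.tanh(hidden_states @ W)    # (T, a) projected token states
    scores = proj @ q                    # (T,) attention logits
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax over tokens
    return weights @ hidden_states       # (d,) news feature

T, d, a = 5, 8, 4                 # tokens, hidden size, attention size (hypothetical)
H = rng.standard_normal((T, d))   # stand-in for Bi-LSTM hidden states
W = rng.standard_normal((d, a))
q = rng.standard_normal(a)
news_feature = attention_aggregate(H, W, q)
```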
3. The gated multi-head self-attention based news recommendation method according to claim 1, wherein capturing the relevance among the clicked-news features comprises the following specific steps:
computing query, key and value information for each of the clicked-news features;
performing scaled dot-product attention over the query, key and value information in multiple heads, and storing the result of each head;
concatenating all the results and applying one linear transformation to obtain enhanced clicked-news features.
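The steps above can be sketched as a standard multi-head self-attention pass over the clicked-news features. This is a minimal numpy illustration; the random weight matrices stand in for learned projections:

```python
import numpy as np

def multi_head_self_attention(X, heads=2):
    # Per head: compute Q/K/V, run scaled dot-product attention; then
    # concatenate all head outputs and apply one linear transformation.
    rng = np.random.default_rng(0)
    n, d = X.shape
    dk = d // heads
    outputs = []
    for _ in range(heads):
        Wq, Wk, Wv = (rng.standard_normal((d, dk)) for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(dk)                  # scaled dot product
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
        outputs.append(weights @ V)                     # this head's result
    Wo = rng.standard_normal((heads * dk, d))
    return np.concatenate(outputs, axis=1) @ Wo         # concat + linear map

clicks = np.random.default_rng(1).standard_normal((6, 8))  # 6 clicked news, dim 8
enhanced = multi_head_self_attention(clicks)
```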
4. The gated multi-head self-attention based news recommendation method according to claim 1, wherein the feature filtering with the candidate news features comprises the following specific steps:
computing news-internal information and channel gating information based on the enhanced clicked-news features, the query information and the candidate news features;
performing an element-wise product of the news-internal information and the channel gating information to obtain reconstructed news features;
performing attention-weighted aggregation of the reconstructed news features to generate the user feature.
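One plausible reading of this candidate-aware gating can be sketched as below. The tanh/sigmoid choices and weight matrices are assumptions for illustration, not the patent's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_user_encoder(enhanced_clicks, candidate_feat, W_in, W_gate):
    # An internal representation of each clicked news is rescaled channel-wise
    # by a gate conditioned on the candidate news; the reconstructed features
    # are then attention-pooled into a single user feature.
    internal = np.tanh(enhanced_clicks @ W_in)                   # news-internal info
    gate = sigmoid((enhanced_clicks + candidate_feat) @ W_gate)  # channel gate
    reconstructed = internal * gate                              # element-wise product
    scores = reconstructed @ candidate_feat                      # candidate-aware attention
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ reconstructed                               # user feature

d = 8
clicks = rng.standard_normal((6, d))        # enhanced clicked-news features
cand = rng.standard_normal(d)               # one candidate news feature
user = gated_user_encoder(clicks, cand,
                          rng.standard_normal((d, d)),
                          rng.standard_normal((d, d)))
```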
5. The gated multi-head self-attention based news recommendation method according to claim 1, wherein predicting the probability of browsing each candidate news by the user and recommending candidate news to the user based on the predicted probability comprises the following specific steps:
constructing a training library of positive and negative samples, and training a probability prediction model;
inputting the candidate news features to be predicted into the trained model to obtain the click probability of each candidate news;
ranking a group of candidate news by click probability, and recommending the top-ranked candidates to the user.
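The scoring-and-ranking step, together with a negative-sampling training loss over one positive and several negative samples, can be sketched as follows. The loss form is an assumption (a standard choice for this setup), not taken from the patent text:

```python
import numpy as np

rng = np.random.default_rng(0)

def rank_candidates(user_feat, candidate_feats, top_k=3):
    # Score each candidate by its inner product with the user feature,
    # sort descending, and recommend the top-k candidates.
    scores = candidate_feats @ user_feat
    order = np.argsort(scores)[::-1]   # highest click score first
    return order[:top_k], scores

def negative_sampling_loss(pos_score, neg_scores):
    # Softmax loss over one clicked (positive) news and K non-clicked
    # (negative) news, as in the training-library step.
    logits = np.concatenate(([pos_score], neg_scores))
    m = logits.max()
    return -pos_score + m + np.log(np.exp(logits - m).sum())

user = rng.standard_normal(8)
cands = rng.standard_normal((5, 8))
top, scores = rank_candidates(user, cands)
loss = negative_sampling_loss(scores[0], scores[1:])
```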
6. The gated multi-head self-attention based news recommendation method according to claim 1, wherein the probability prediction model takes the candidate news features and the user feature as input, and outputs the click probability of each candidate news.
7. The gated multi-head self-attention based news recommendation method according to claim 1, wherein the click probability is the inner product of the user feature and the candidate news feature.
8. A news recommendation system based on gated multi-head self-attention, characterized by comprising a news encoding module, a user encoding module and a probability prediction module:
the news encoding module configured to: acquire historically clicked news and candidate news, and encode each with the pre-trained model BERT to obtain clicked-news features and candidate news features;
the user encoding module configured to: capture the relevance among the clicked-news features based on multi-head self-attention, and perform feature filtering with the candidate news features to obtain a user feature;
the probability prediction module configured to: predict the probability that the user browses each candidate news by combining the clicked-news features, the candidate news features and the user feature, and recommend candidate news to the user based on the predicted probabilities.
9. A computer-readable storage medium on which a program is stored, wherein the program, when executed by a processor, carries out the steps of the gated multi-head self-attention based news recommendation method according to any one of claims 1-7.
10. An electronic device comprising a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the gated multi-head self-attention based news recommendation method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210867135.2A CN115098786A (en) | 2022-07-22 | 2022-07-22 | News recommendation method and system based on gating multi-head self-attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115098786A true CN115098786A (en) | 2022-09-23 |
Family
ID=83297958
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210867135.2A Pending CN115098786A (en) | 2022-07-22 | 2022-07-22 | News recommendation method and system based on gating multi-head self-attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115098786A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116911304A (en) * | 2023-09-12 | 2023-10-20 | 深圳须弥云图空间科技有限公司 | Text recommendation method and device |
CN116911304B (en) * | 2023-09-12 | 2024-02-20 | 深圳须弥云图空间科技有限公司 | Text recommendation method and device |
CN117390290A (en) * | 2023-12-08 | 2024-01-12 | 安徽省立医院(中国科学技术大学附属第一医院) | Method for learning dynamic user interests based on language model of content enhancement |
CN117390290B (en) * | 2023-12-08 | 2024-03-15 | 安徽省立医院(中国科学技术大学附属第一医院) | Method for learning dynamic user interests based on language model of content enhancement |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111324769B (en) | Training method of video information processing model, video information processing method and device | |
CN115098786A (en) | News recommendation method and system based on gating multi-head self-attention | |
CN112182154B (en) | Personalized search model for eliminating keyword ambiguity by using personal word vector | |
CN113590965B (en) | Video recommendation method integrating knowledge graph and emotion analysis | |
CN116994709A (en) | Personalized diet and exercise recommendation method and system and electronic equipment | |
Amir et al. | On the current state of deep learning for news recommendation | |
Hou et al. | Leveraging search history for improving person-job fit | |
Fazelnia et al. | Variational user modeling with slow and fast features | |
Zhang et al. | Rethinking adjacent dependency in session-based recommendations | |
CN116956183A (en) | Multimedia resource recommendation method, model training method, device and storage medium | |
Rauf et al. | BCE4ZSR: Bi-encoder empowered by teacher cross-encoder for zero-shot cold-start news recommendation | |
CN114443916B (en) | Supply and demand matching method and system for test data | |
CN115168724A (en) | News recommendation method and system fusing multi-granularity information | |
Yassin et al. | Travel user interest discovery from visual shared data in social networks | |
Moon et al. | BERT-Based Personalized Course Recommendation System from Online Learning Platform | |
CN118228718B (en) | Encoder processing method, text processing method and related equipment | |
Bougteb et al. | Tag2Seq: Enhancing Session-Based Recommender Systems with Tag-Based LSTM | |
Mi et al. | VVA: Video Values Analysis | |
Nguyen et al. | H-BERT4Rec: Enhancing Sequential Recommendation System on MOOCs based on Heterogeneous Information Networks | |
CN116521972B (en) | Information prediction method, device, electronic equipment and storage medium | |
Xu et al. | Exploiting Category Information in Sequential Recommendation | |
Gonsior et al. | Imital: learned active learning strategy on synthetic data | |
WANG | Sequential recommendation: From representation learning to reasoning | |
Zhu et al. | Next Job Application Prediction by Leveraging Textual Information, Metadata, and Personalized-Attention Mechanism | |
Li et al. | Modeling Learner Memory Based on LSTM Autoencoder and Collaborative Filtering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||

Country or region after: China
Address after: No. 3501 University Road, Changqing District, Jinan, Shandong Province, 250353
Applicant after: Qilu University of Technology (Shandong Academy of Sciences)
Address before: No. 3501 University Road, Changqing District, Jinan, Shandong Province, 250353
Applicant before: Qilu University of Technology
Country or region before: China