CN115098786A - News recommendation method and system based on gating multi-head self-attention - Google Patents
News recommendation method and system based on gating multi-head self-attention
- Publication number
- CN115098786A (application number CN202210867135.2A)
- Authority
- CN
- China
- Prior art keywords
- news
- candidate
- user
- attention
- characteristic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a news recommendation method and system based on gated multi-head self-attention, belonging to the field of personalized news recommendation. The method comprises: obtaining historical click news and candidate news, and encoding each with the pre-trained model BERT to obtain historical click news features and candidate news features; capturing the relevance among the historical click news features with multi-head self-attention, and filtering the features using the candidate news features to obtain user features; and predicting the probability that the user browses each candidate news item by combining the historical click news features, the candidate news features, and the user features, then recommending candidate news to the user based on the predicted probabilities. The invention adjusts the user's interests with a gated multi-head self-attention mechanism so that candidate news can be matched more accurately against the user's specific interests, and applies the pre-trained model BERT, with its rich language knowledge, to news recommendation to enhance the text representation of news and improve recommendation accuracy.
Description
Technical Field
The invention belongs to the field of personalized news recommendation, and particularly relates to a news recommendation method and system based on gated multi-head self-attention.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
News recommendation is an important branch of recommender-system research; it aims to help users find news matching their interest preferences, as far as possible, from news content and user information. Different words and different news articles carry different amounts of information when representing news and users. Attention mechanisms can assign different weights to different words and news items so as to capture key news semantics and important clues about user interest. For example, An et al. propose personalized word-level and news-level attention to model the influence of different words and news on users; Wu et al. propose an attentive multi-view learning mechanism to learn news representations from the various components of a news item (title, category, body); Qi et al. further apply multi-head self-attention to capture long-distance correlations between words and between news items. In these methods, attention mechanisms effectively improve the performance of news recommendation.
However, modeling user interest only from the relations among the news a user has browsed may not be optimal, because user interests are broad. For example, if candidate news is not considered while learning the user interest, the learned interest representation may contain much information irrelevant to the candidate news, making it difficult to accurately match candidate news against the user's specific interests.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a news recommendation method and system based on gated multi-head self-attention, which adjust a user's interests with a gated multi-head self-attention mechanism so that candidate news can be matched more accurately against the user's specific interests, and which apply the pre-trained model BERT, with its rich language knowledge, to news recommendation to enhance news text representations and improve recommendation accuracy.
To achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
The first aspect of the invention provides a news recommendation method based on gated multi-head self-attention.
A news recommendation method based on gated multi-head self-attention comprises the following steps:
acquiring historical click news and candidate news, and encoding each with the pre-trained model BERT to obtain historical click news features and candidate news features;
capturing the relevance among the historical click news features with multi-head self-attention, and performing feature filtering with the candidate news features to obtain user features;
and predicting the probability that the user browses each candidate news item by combining the historical click news features, the candidate news features, and the user features, and recommending candidate news to the user based on the predicted probabilities.
Further, the news encoding comprises the following specific steps:
extracting a text representation of the news with the pre-trained model BERT;
capturing bidirectional semantic dependencies of the text representation with a Bi-LSTM;
and, based on the bidirectional semantic dependencies, aggregating the output of the Bi-LSTM with an attention network to obtain news features containing rich contextual semantic information.
Further, capturing the relevance among the historical click news features comprises the following specific steps:
computing the query, key, and value information of each feature in the historical click news features;
performing multiple rounds of scaled dot-product attention over the query, key, and value information, and storing the result of each round;
and concatenating all the results and applying one linear transformation to obtain the enhanced historical click news features.
Further, performing feature filtering with the candidate news features comprises the following specific steps:
computing the news-internal information and the channel adjustment gate information from the enhanced historical click news features, the query information, and the candidate news features;
performing an element-wise product of the news-internal information and the channel adjustment gate information to obtain reconstructed news features;
and performing attention-weighted aggregation of the reconstructed news features to generate the user features.
Further, predicting the probability that the user browses each candidate news item and recommending candidate news based on the predicted probabilities comprises the following specific steps:
constructing a training library of positive and negative samples, and training a probability prediction model;
inputting the candidate news features to be predicted into the trained model to obtain the click probabilities of the candidate news;
and ranking the click probabilities of a group of candidate news and recommending the top-ranked candidates to the user.
Further, the inputs of the probability prediction model are the candidate news features and the user features, and its output is the click probability of the candidate news.
Further, the click probability is the inner product of the user feature and the candidate news feature.
The invention provides a news recommending system based on gating multi-head self-attention in a second aspect.
A news recommendation system based on gated multi-head self-attention comprises a news encoding module, a user encoding module, and a probability prediction module:
the news encoding module is configured to: acquire historical click news and candidate news, and encode each with the pre-trained model BERT to obtain historical click news features and candidate news features;
the user encoding module is configured to: capture the relevance among the historical click news features with multi-head self-attention, and perform feature filtering with the candidate news features to obtain user features;
the probability prediction module is configured to: predict the probability that the user browses each candidate news item by combining the historical click news features, the candidate news features, and the user features, and recommend candidate news to the user based on the predicted probabilities.
A third aspect of the present invention provides a computer-readable storage medium, on which a program is stored, which, when being executed by a processor, carries out the steps of a method for recommending news based on gated multi-head self-attention according to the first aspect of the present invention.
A fourth aspect of the present invention provides an electronic device, including a memory, a processor, and a program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps in the news recommendation method based on gated multi-head self-attention according to the first aspect of the present invention.
The above one or more technical solutions have the following beneficial effects:
the invention provides a news recommending method based on gating multi-head self-attention, which filters out information irrelevant to candidate news in history browsing news by using the candidate news through a gating mechanism so as to realize accurate matching of the candidate news and specific interests of a user.
The invention provides a multi-head self-attention gated news recommendation method, which adopts a pre-training model to mine deep semantics of a text, further strengthens semantic representation of news and greatly improves expressive force of the model.
The invention provides a news recommending method based on gating multi-head self-attention, which utilizes the correlation between historical click news of a user and candidate news to capture more relevant user interests and further strengthen the interest expression of the user, thereby providing an improved compromise between accuracy and speed.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and do not limit it.
Fig. 1 is an example of a user's news reading behavior.
Fig. 2 is a flow chart of the method of the first embodiment.
Fig. 3 is a system configuration diagram of the second embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for describing particular embodiments only and is not intended to limit example embodiments according to the present disclosure. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and "comprising", and any variations thereof, cover a non-exclusive inclusion: a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to it.
The embodiments and features of the embodiments of the invention may be combined with each other without conflict.
In fact, a user's interests are diverse, and when a candidate news item matches the user, it typically matches only a small fraction of those interests. As shown in fig. 1, the grey segments mark the important text spans; from the news the user browsed it can be inferred that the user is interested in news in the animal, health, sports, and politics domains. The fourth candidate news item concerns a political topic: it matches only the political news the user browsed and is less relevant to the news of other topics (animals and sports) the user browsed. When learning the user's interest, self-attention emphasizes building relations between news of related topics; but because self-attention is learned without reference to the candidate news, it over-retains information irrelevant to the candidate news, and this acts as noise when matching candidate news against the user's specific interests.
If candidate news information is not considered during user modeling, candidate news can be difficult to match accurately. The invention therefore provides a news recommendation method based on gated multi-head self-attention: a neural news recommendation framework, PGRec, enhanced with a pre-trained language model and equipped with gated multi-head self-attention. To understand news semantics more effectively, pre-trained BERT is introduced to enhance news representations; clicked news is also weighted by its relevance to the candidate news, to capture the user interests most relevant for matching. During model training, a negative sampling technique is applied and the click score is predicted over K+1 news items: the candidates consist of one positive sample of the user and K randomly selected negative samples of that user, and the click scores of the positive news and the K negative news are predicted jointly.
Example one
The embodiment discloses a news recommendation method based on gating multi-head self-attention;
As shown in fig. 2, a news recommendation method based on gated multi-head self-attention mainly comprises three steps: news encoding, candidate-aware user encoding, and click prediction.
Step 1: obtain historical click news and candidate news, and encode each with the pre-trained model BERT to obtain historical click news features and candidate news features.
step 101, obtaining historical click news and candidate news
A group of historical click news browsed by a given user is obtained and recorded as D_u = [D_1, D_2, …, D_N], where N is the number of news items the user has historically clicked. The goal is to compute the probability ŷ that the given user clicks each item of a set of candidate news D_C = [D_c1, D_c2, …, D_cM], where M is the number of candidate news, and then to rank the candidate news by these probabilities and recommend the best news.
Step 102: news encoding, with the following specific steps:
(1) extracting text representation of news by using a pre-training model BERT;
News encoding aims at learning deep semantic representations of news. Previously, pre-trained word embeddings such as Word2vec and GloVe were typically used to initialize the model's embedding layer, but these embeddings are mostly context-free, which leaves the semantic information captured by the model insufficient.
Since the pre-trained model BERT (Devlin et al., 2019) has 12 Transformer layers and a large number of parameters that enable it to model complex contextual information in text, the invention adopts BERT as the news encoder.
Denote the input news as a sequence of T tokens [w_1, w_2, …, w_T]. The news text is input into the BERT model, and after several Transformer layers the hidden-layer token representations obtained are denoted [e_1, e_2, …, e_T].
(2) Capture bidirectional semantic dependencies of the text representation with a Bi-LSTM.
To further strengthen the semantic relations, the hidden-layer output of the BERT model is fed into a Bi-LSTM, which further extracts the bidirectional semantic dependencies of the text representation.
(3) Based on the bidirectional semantic dependencies, aggregate the output of the Bi-LSTM with an attention network to obtain news features containing rich contextual semantic information.
The output of the Bi-LSTM is connected to an attention network, which aggregates the text representations with their bidirectional semantic dependencies into a news feature h containing rich contextual semantic information. The user's historical click news features and the candidate news features are denoted [h_1, h_2, …, h_N] and h_c, respectively.
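The aggregation step above can be sketched as a small additive-attention pooling layer. This is a minimal NumPy illustration, not the patent's implementation: the projection `W`, the query vector `q`, and the random stand-ins for Bi-LSTM outputs are assumptions made only to give a runnable example.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(token_reps, W, q):
    """Aggregate T token representations (e.g. Bi-LSTM outputs) into one
    news vector h via additive attention: a_t = softmax(q . tanh(W e_t))."""
    scores = np.tanh(token_reps @ W.T) @ q      # (T,) unnormalized scores
    weights = softmax(scores)                   # attention weights over tokens
    return weights @ token_reps                 # (dim,) weighted sum = news feature h

rng = np.random.default_rng(0)
T, dim, att = 6, 8, 4
tokens = rng.normal(size=(T, dim))   # stand-in for Bi-LSTM hidden states
W = rng.normal(size=(att, dim))      # illustrative attention projection
q = rng.normal(size=(att,))          # illustrative attention query
h = attention_pool(tokens, W, q)
print(h.shape)  # (8,)
```

The same pooling is applied once per news item, producing [h_1, …, h_N] for the clicked news and h_c for each candidate.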
Step 2: capture the relevance among the historical click news features with multi-head self-attention, and perform feature filtering with the candidate news features to obtain user features.
The user encoder learns a representation of the user from the historical click news the user browsed, in two steps:
Step 201: capture the relevance among the historical click news features with a news-level multi-head self-attention network to obtain a deeper representation of the user, with the following specific steps:
(1) calculating query information, key information and value information of each feature in historical click news features;
The historical click news feature matrix is H = [h_1, h_2, …, h_N], where h_i is the feature vector of the i-th news item the user historically clicked. These features are converted into query, key, and value information as follows:
H_query = W_query · H  (1)
H_key = W_key · H  (2)
H_value = W_value · H  (3)
where H_query, H_key, H_value ∈ R^{N×dim} respectively denote the query, key, and value information of the historical click news features; N is the number of historical click news items; dim is the unified dimension of the historical click news features; and W_query, W_key, W_value are trainable weight matrices.
(2) Perform multiple rounds of scaled dot-product attention over the query, key, and value information, storing the result of each round.
Scaled dot-product attention is computed as
Attention(Q, K, V) = softmax(QK^T / √d_k) V  (4)
where d_k is the number of hidden units of the neural network.
After a linear transformation of H_query, H_key, H_value, scaled dot-product attention yields the result head_i:
head_i = Attention(H_query W_i^Q, H_key W_i^K, H_value W_i^V)  (5)
Here h rounds of scaled dot-product attention are performed, and the linear-transformation parameters W applied to H_query, H_key, H_value differ in each round.
(3) Concatenate all the results and apply one linear transformation to obtain the enhanced historical click news features.
The results of the h rounds are concatenated and passed through one linear transformation; the resulting value is the multi-head-attention-reconstructed feature:
Z_news = MultiHead(H_query, H_key, H_value) = Concat(head_1, head_2, …, head_h) W^o  (6)
where W^o is a trainable parameter matrix.
The result is recorded as
Z_news = [Z_1, Z_2, …, Z_N]  (7)
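Formulas (1)-(6) can be sketched end to end in NumPy. This is a simplified illustration, not the patent's code: for compactness each head reads a slice of shared projection matrices rather than carrying its own per-round W matrices, which the text says differ per round.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(H, Wq, Wk, Wv, Wo, n_heads):
    """H: (N, dim) historical click-news features.
    Projects to queries/keys/values (formulas (1)-(3)), runs scaled
    dot-product attention per head (formulas (4)-(5)), then concatenates
    the heads and applies the output projection W_o (formula (6))."""
    N, dim = H.shape
    d_k = dim // n_heads
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    heads = []
    for i in range(n_heads):
        s = slice(i * d_k, (i + 1) * d_k)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_k)   # scaled dot product
        heads.append(softmax(scores) @ V[:, s])       # head_i
    return np.concatenate(heads, axis=1) @ Wo          # Z_news

rng = np.random.default_rng(1)
N, dim, n_heads = 5, 8, 2
H = rng.normal(size=(N, dim))
Wq, Wk, Wv, Wo = (rng.normal(size=(dim, dim)) for _ in range(4))
Z_news = multi_head_self_attention(H, Wq, Wk, Wv, Wo, n_heads)
print(Z_news.shape)  # (5, 8)
```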
Step 202: perform feature filtering with the candidate news features, based on a gating adjustment unit and an additive attention unit.
The gating adjustment unit filters information irrelevant to a given candidate news item out of the user's historical click news features, so that the candidate news can be matched against the user's specific interests.
(1) Compute the news-internal information and the channel adjustment gate information from the enhanced historical click news features, the query information, and the candidate news features.
The gating adjustment unit has two input information flows: one, v, formed from the multi-head-attention-reconstructed features Z_news and the query information H_query; the other, the channel adjustment gate information g, formed from the candidate news feature h_c and the query information H_query. To keep the feature counts equal, the candidate news feature vector is first repeated N times and linearly transformed, where N is the number of the user's historically clicked news:
h_c = W_transform · repeat(h_c)  (8)
(2) Perform an element-wise product of the news-internal information and the channel adjustment gate information to obtain the reconstructed news features.
To filter information irrelevant to the candidate news out of the user's historical click news, an element-wise product is taken between the internal information v of the browsed news and the channel adjustment gate information g; this prevents the self-attention stage from retaining too much candidate-irrelevant information that would hinder matching the candidate news against the user's specific interests. The gating filter is
Z̃ = v ⊙ g  (9)
where ⊙ denotes the element-wise (Hadamard) product of matrices.
(3) Perform attention-weighted aggregation of the reconstructed news features to generate the user feature.
Because the filtered, reconstructed historical click news features may differ in their relevance to the candidate news, the candidate news is used to attention-weight and aggregate the reconstructed news features into the user feature u, where W and b are trainable weights and φ(·) is a linear network.
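A hedged sketch of the gating adjustment unit and the candidate-aware aggregation. The exact formulas for v and g are not fully reproduced in the text, so this example assumes v = Z_news and takes g as a sigmoid of the transformed, repeated candidate feature; every function and parameter name here is illustrative rather than the patent's own.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_user_encoder(Z_news, h_c, W_transform, W_att, q_att):
    """Filter candidate-irrelevant information out of the reconstructed
    click-news features Z_news (N, dim), then pool them into a user vector u.
    Assumption: the gate g is a sigmoid of the transformed, repeated
    candidate feature (formula (8)); v is taken to be Z_news itself."""
    N = Z_news.shape[0]
    hc_rep = np.repeat(h_c[None, :], N, axis=0) @ W_transform.T  # repeat + transform
    g = sigmoid(hc_rep)                 # channel adjustment gate, values in (0, 1)
    Z_tilde = Z_news * g                # element-wise gating filter, formula (9)
    scores = np.tanh(Z_tilde @ W_att.T) @ q_att
    a = softmax(scores)                 # candidate-aware attention weights
    return a @ Z_tilde                  # user feature u

rng = np.random.default_rng(2)
N, dim, att = 5, 8, 4
Z_news = rng.normal(size=(N, dim))
h_c = rng.normal(size=(dim,))
W_transform = rng.normal(size=(dim, dim))
W_att = rng.normal(size=(att, dim))
q_att = rng.normal(size=(att,))
u = gated_user_encoder(Z_news, h_c, W_transform, W_att, q_att)
print(u.shape)  # (8,)
```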
Step 3: predict the probability that the user browses each candidate news item by combining the historical click news features, the candidate news features, and the user feature, and recommend candidate news to the user based on the predicted probabilities.
The probability ŷ that the user clicks a candidate news item is computed as the inner product of the user feature u and the candidate news feature h_c:
ŷ = u^T h_c
The click probability is predicted and candidate news recommended to the user in the following specific steps:
step 301, constructing a training library consisting of positive samples and negative samples, and training a probability prediction model;
The probability prediction model predicts the click probability of candidate news; its inputs are the candidate news features and the user features, and its output is the click probabilities of the candidate news.
Because the numbers of positive and negative news samples are highly unbalanced, a negative sampling technique is applied during model training, predicting the click probability over k+1 news items: one positive sample of a user and k randomly selected negative samples of that user. The positive sample is a news item the user clicked within an exposure sequence; the negative samples are k randomly drawn news items from the same exposure sequence that the user did not click. The click probabilities of the positive news and the k negative news are predicted jointly, turning the news click-prediction problem into a pseudo (k+1)-way classification task. These probabilities are normalized with softmax to compute the click probability of the positive sample, and the loss function for model training is the negative log-likelihood over all positive samples:
L = − Σ_{i∈S} log( exp(ŷ_i^+) / ( exp(ŷ_i^+) + Σ_{j=1}^{k} exp(ŷ_{i,j}^−) ) )
where ŷ_i^+ denotes the click probability of the i-th positive news, ŷ_{i,j}^− denotes the click probability of the j-th negative news in the same exposure sequence as the i-th positive news, and S is the set of positive training samples.
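The pseudo (k+1)-way negative-sampling loss described above can be sketched as follows; the array shapes and example scores are illustrative only.

```python
import numpy as np

def nce_loss(pos_scores, neg_scores):
    """Negative-sampling loss: for each positive sample, softmax its score
    against its k negatives and take the negative log-likelihood, averaged
    over the positives.  pos_scores: (S,), neg_scores: (S, k)."""
    all_scores = np.concatenate([pos_scores[:, None], neg_scores], axis=1)
    all_scores = all_scores - all_scores.max(axis=1, keepdims=True)  # stability
    log_probs = all_scores[:, 0] - np.log(np.exp(all_scores).sum(axis=1))
    return -log_probs.mean()

# two positive samples, each with k = 3 negatives
pos = np.array([2.0, 1.5])
neg = np.array([[0.1, -0.3, 0.0],
                [0.2,  0.4, -0.1]])
loss = nce_loss(pos, neg)
print(round(float(loss), 4))
```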
Step 302: input the candidate news features to be predicted into the trained model to obtain the click probabilities of the candidate news.
Step 303: rank the click probabilities of the group of candidate news and recommend the top-ranked candidate news to the user.
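Steps 302-303 (inner-product scoring as in step 3, then ranking) can be sketched as follows; the user and candidate vectors are illustrative stand-ins for the learned features.

```python
import numpy as np

def recommend(u, candidate_feats, top_k=3):
    """Score each candidate news item by its inner product with the user
    feature u, then return candidate indices ranked by predicted click
    probability, highest first."""
    scores = candidate_feats @ u        # one inner product per candidate
    order = np.argsort(-scores)         # descending by score
    return order[:top_k], scores

u = np.array([1.0, 0.0, 2.0])           # illustrative user feature
cands = np.array([
    [0.5, 1.0, 0.1],                    # score 0.7
    [1.0, 0.0, 1.0],                    # score 3.0
    [0.0, 2.0, 0.5],                    # score 1.0
    [2.0, 0.0, 0.0],                    # score 2.0
])
top, scores = recommend(u, cands, top_k=2)
print(top.tolist())  # [1, 3]
```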
Example two
The embodiment discloses a news recommendation system based on gated multi-head self-attention.
As shown in fig. 3, a news recommendation system based on gated multi-head self-attention includes a news encoding module, a user encoding module, and a probability prediction module:
the news encoding module is configured to: acquire historical click news and candidate news, and encode each with the pre-trained model BERT to obtain historical click news features and candidate news features;
the user encoding module is configured to: capture the relevance among the historical click news features with multi-head self-attention, and perform feature filtering with the candidate news features to obtain user features;
the probability prediction module is configured to: predict the probability that the user browses each candidate news item by combining the historical click news features, the candidate news features, and the user features, and recommend candidate news to the user based on the predicted probabilities.
EXAMPLE III
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps in a gated multi-head self-attention based news recommendation method according to embodiment 1 of the present disclosure.
Example four
An object of the present embodiment is to provide an electronic device.
An electronic device includes a memory, a processor, and a program stored in the memory and executable on the processor, where the processor executes the program to implement the steps in the method for recommending news based on gated multi-head self-attention according to embodiment 1 of the present disclosure.
Those skilled in the art will appreciate that embodiments of the present disclosure can be provided as a method, system, or computer program product, and thus, the present disclosure can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects, and a computer program product embodied on one or more computer-usable storage media including, but not limited to, disk storage, CD-ROM, optical storage, and the like, having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present disclosure and is not intended to limit it; those skilled in the art may make various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure shall be included in its protection scope.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.
Claims (10)
1. A news recommendation method based on gated multi-head self-attention, characterized by comprising the following steps:
acquiring historically clicked news and candidate news, and encoding each with the pre-trained model BERT to obtain clicked-news features and candidate news features;
capturing the relevance among the clicked-news features based on multi-head self-attention, and performing feature filtering with the candidate news features to obtain a user feature;
predicting the probability that the user browses each candidate news by combining the clicked-news features, the candidate news features and the user feature, and recommending candidate news to the user based on the predicted probabilities.
2. The gated multi-head self-attention based news recommendation method according to claim 1, wherein the news encoding comprises the following specific steps:
extracting a text representation of the news with the pre-trained model BERT;
capturing bidirectional semantic dependencies of the text representation with a Bi-LSTM;
based on the bidirectional semantic dependencies, aggregating the Bi-LSTM outputs with an attention network to obtain a news feature containing rich contextual semantic information.
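The attention aggregation step of this news encoder can be sketched as follows. The hidden states `H` stand in for Bi-LSTM outputs over BERT token embeddings, and the projection matrix `W` and query vector `q` are hypothetical learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_aggregate(hidden_states, W, q):
    # Attention network over Bi-LSTM outputs: score each token state,
    # then sum the states into a single news feature vector.
    proj = np.tanh(hidden_states @ W)    # (T, a) projected token states
    scores = proj @ q                    # (T,) attention logits
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax over tokens
    return weights @ hidden_states       # (d,) news feature

T, d, a = 5, 8, 4                 # tokens, hidden size, attention size (hypothetical)
H = rng.standard_normal((T, d))   # stand-in for Bi-LSTM hidden states
W = rng.standard_normal((d, a))
q = rng.standard_normal(a)
news_feature = attention_aggregate(H, W, q)
```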
3. The gated multi-head self-attention based news recommendation method according to claim 1, wherein capturing the relevance among the clicked-news features comprises the following specific steps:
computing query, key and value information for each of the clicked-news features;
performing scaled dot-product attention over the query, key and value information in multiple heads, and storing the result of each head;
concatenating all the results and applying one linear transformation to obtain enhanced clicked-news features.
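The steps above can be sketched as a standard multi-head self-attention pass over the clicked-news features. This is a minimal numpy illustration; the random weight matrices stand in for learned projections:

```python
import numpy as np

def multi_head_self_attention(X, heads=2):
    # Per head: compute Q/K/V, run scaled dot-product attention; then
    # concatenate all head outputs and apply one linear transformation.
    rng = np.random.default_rng(0)
    n, d = X.shape
    dk = d // heads
    outputs = []
    for _ in range(heads):
        Wq, Wk, Wv = (rng.standard_normal((d, dk)) for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(dk)                  # scaled dot product
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
        outputs.append(weights @ V)                     # this head's result
    Wo = rng.standard_normal((heads * dk, d))
    return np.concatenate(outputs, axis=1) @ Wo         # concat + linear map

clicks = np.random.default_rng(1).standard_normal((6, 8))  # 6 clicked news, dim 8
enhanced = multi_head_self_attention(clicks)
```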
4. The gated multi-head self-attention based news recommendation method according to claim 1, wherein the feature filtering with the candidate news features comprises the following specific steps:
computing news-internal information and channel gating information based on the enhanced clicked-news features, the query information and the candidate news features;
performing an element-wise product of the news-internal information and the channel gating information to obtain reconstructed news features;
performing attention-weighted aggregation of the reconstructed news features to generate the user feature.
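One plausible reading of this candidate-aware gating can be sketched as below. The tanh/sigmoid choices and weight matrices are assumptions for illustration, not the patent's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_user_encoder(enhanced_clicks, candidate_feat, W_in, W_gate):
    # An internal representation of each clicked news is rescaled channel-wise
    # by a gate conditioned on the candidate news; the reconstructed features
    # are then attention-pooled into a single user feature.
    internal = np.tanh(enhanced_clicks @ W_in)                   # news-internal info
    gate = sigmoid((enhanced_clicks + candidate_feat) @ W_gate)  # channel gate
    reconstructed = internal * gate                              # element-wise product
    scores = reconstructed @ candidate_feat                      # candidate-aware attention
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ reconstructed                               # user feature

d = 8
clicks = rng.standard_normal((6, d))        # enhanced clicked-news features
cand = rng.standard_normal(d)               # one candidate news feature
user = gated_user_encoder(clicks, cand,
                          rng.standard_normal((d, d)),
                          rng.standard_normal((d, d)))
```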
5. The gated multi-head self-attention based news recommendation method according to claim 1, wherein predicting the probability of browsing each candidate news by the user and recommending candidate news to the user based on the predicted probability comprises the following specific steps:
constructing a training library of positive and negative samples, and training a probability prediction model;
inputting the candidate news features to be predicted into the trained model to obtain the click probability of each candidate news;
ranking a group of candidate news by click probability, and recommending the top-ranked candidates to the user.
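The scoring-and-ranking step, together with a negative-sampling training loss over one positive and several negative samples, can be sketched as follows. The loss form is an assumption (a standard choice for this setup), not taken from the patent text:

```python
import numpy as np

rng = np.random.default_rng(0)

def rank_candidates(user_feat, candidate_feats, top_k=3):
    # Score each candidate by its inner product with the user feature,
    # sort descending, and recommend the top-k candidates.
    scores = candidate_feats @ user_feat
    order = np.argsort(scores)[::-1]   # highest click score first
    return order[:top_k], scores

def negative_sampling_loss(pos_score, neg_scores):
    # Softmax loss over one clicked (positive) news and K non-clicked
    # (negative) news, as in the training-library step.
    logits = np.concatenate(([pos_score], neg_scores))
    m = logits.max()
    return -pos_score + m + np.log(np.exp(logits - m).sum())

user = rng.standard_normal(8)
cands = rng.standard_normal((5, 8))
top, scores = rank_candidates(user, cands)
loss = negative_sampling_loss(scores[0], scores[1:])
```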
6. The gated multi-head self-attention based news recommendation method according to claim 1, wherein the probability prediction model takes the candidate news features and the user feature as input, and outputs the click probability of each candidate news.
7. The gated multi-head self-attention based news recommendation method according to claim 1, wherein the click probability is the inner product of the user feature and the candidate news feature.
8. A news recommendation system based on gated multi-head self-attention, characterized by comprising a news encoding module, a user encoding module and a probability prediction module:
the news encoding module configured to: acquire historically clicked news and candidate news, and encode each with the pre-trained model BERT to obtain clicked-news features and candidate news features;
the user encoding module configured to: capture the relevance among the clicked-news features based on multi-head self-attention, and perform feature filtering with the candidate news features to obtain a user feature;
the probability prediction module configured to: predict the probability that the user browses each candidate news by combining the clicked-news features, the candidate news features and the user feature, and recommend candidate news to the user based on the predicted probabilities.
9. A computer-readable storage medium on which a program is stored, wherein the program, when executed by a processor, carries out the steps of the gated multi-head self-attention based news recommendation method according to any one of claims 1-7.
10. An electronic device comprising a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the gated multi-head self-attention based news recommendation method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210867135.2A CN115098786A (en) | 2022-07-22 | 2022-07-22 | News recommendation method and system based on gating multi-head self-attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115098786A true CN115098786A (en) | 2022-09-23 |
Family
ID=83297958
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210867135.2A Pending CN115098786A (en) | 2022-07-22 | 2022-07-22 | News recommendation method and system based on gating multi-head self-attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115098786A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116911304A (en) * | 2023-09-12 | 2023-10-20 | 深圳须弥云图空间科技有限公司 | Text recommendation method and device |
CN116911304B (en) * | 2023-09-12 | 2024-02-20 | 深圳须弥云图空间科技有限公司 | Text recommendation method and device |
CN117390290A (en) * | 2023-12-08 | 2024-01-12 | 安徽省立医院(中国科学技术大学附属第一医院) | Method for learning dynamic user interests based on language model of content enhancement |
CN117390290B (en) * | 2023-12-08 | 2024-03-15 | 安徽省立医院(中国科学技术大学附属第一医院) | Method for learning dynamic user interests based on language model of content enhancement |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111324769B (en) | Training method of video information processing model, video information processing method and device | |
CN115098786A (en) | News recommendation method and system based on gating multi-head self-attention | |
CN112182154B (en) | Personalized search model for eliminating keyword ambiguity by using personal word vector | |
CN113590965B (en) | Video recommendation method integrating knowledge graph and emotion analysis | |
CN116994709A (en) | Personalized diet and exercise recommendation method and system and electronic equipment | |
Amir et al. | On the current state of deep learning for news recommendation | |
Hou et al. | Leveraging search history for improving person-job fit | |
Fazelnia et al. | Variational user modeling with slow and fast features | |
Zhang et al. | Rethinking adjacent dependency in session-based recommendations | |
CN116956183A (en) | Multimedia resource recommendation method, model training method, device and storage medium | |
Rauf et al. | BCE4ZSR: Bi-encoder empowered by teacher cross-encoder for zero-shot cold-start news recommendation | |
CN114443916B (en) | Supply and demand matching method and system for test data | |
CN115168724A (en) | News recommendation method and system fusing multi-granularity information | |
Yassin et al. | Travel user interest discovery from visual shared data in social networks | |
Moon et al. | BERT-Based Personalized Course Recommendation System from Online Learning Platform | |
CN118228718B (en) | Encoder processing method, text processing method and related equipment | |
Bougteb et al. | Tag2Seq: Enhancing Session-Based Recommender Systems with Tag-Based LSTM | |
Mi et al. | VVA: Video Values Analysis | |
Nguyen et al. | H-BERT4Rec: Enhancing Sequential Recommendation System on MOOCs based on Heterogeneous Information Networks | |
CN116521972B (en) | Information prediction method, device, electronic equipment and storage medium | |
Xu et al. | Exploiting Category Information in Sequential Recommendation | |
Gonsior et al. | Imital: learned active learning strategy on synthetic data | |
WANG | Sequential recommendation: From representation learning to reasoning | |
Zhu et al. | Next Job Application Prediction by Leveraging Textual Information, Metadata, and Personalized-Attention Mechanism | |
Li et al. | Modeling Learner Memory Based on LSTM Autoencoder and Collaborative Filtering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||

Country or region after: China
Address after: No. 3501 University Road, Changqing District, Jinan, Shandong Province, 250353
Applicant after: Qilu University of Technology (Shandong Academy of Sciences)
Address before: No. 3501 University Road, Changqing District, Jinan, Shandong Province, 250353
Applicant before: Qilu University of Technology
Country or region before: China