CN117150145A - Personalized news recommendation method and system based on large language model - Google Patents


Info

Publication number
CN117150145A
CN117150145A
Authority
CN
China
Prior art keywords
layer
news
user
vector
output
Prior art date
Legal status
Granted
Application number
CN202311423931.8A
Other languages
Chinese (zh)
Other versions
CN117150145B (en)
Inventor
曹啸岭
Current Assignee
Chengdu Enterprise Soft Digital Technology Co ltd
Original Assignee
Chengdu Enterprise Soft Digital Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Enterprise Soft Digital Technology Co ltd filed Critical Chengdu Enterprise Soft Digital Technology Co ltd
Priority to CN202311423931.8A priority Critical patent/CN117150145B/en
Publication of CN117150145A publication Critical patent/CN117150145A/en
Application granted granted Critical
Publication of CN117150145B publication Critical patent/CN117150145B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06F16/9535: Search customisation based on user profiles and personalisation
    • G06F21/602: Providing cryptographic facilities or services
    • G06F40/126: Character encoding
    • G06N3/0455: Auto-encoder networks; Encoder-decoder networks
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/0499: Feedforward networks
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Hardware Design (AREA)
  • Bioethics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a personalized news recommendation method and system based on a large language model, in the technical field of news recommendation. The method comprises: acquiring a user's news reading history data and encrypted account number sequence information data; performing feature extraction on the acquired data to obtain news encoding features and user encoding features; constructing a masked large language model based on the Transformer bidirectional encoding representation paradigm, and training the model with the news encoding features and user encoding features as inputs; and performing personalized news recommendation from the user's news reading history data and encrypted account number sequence information data using the trained model. According to the invention, a masked-attention news recommendation large language model built on the user's news reading history, user serial number and the like can more completely characterize and understand the user's reading habits and interests, and can predict more accurate recommendations among the candidate news.

Description

Personalized news recommendation method and system based on large language model
Technical Field
The invention relates to the technical field of news recommendation, in particular to a personalized news recommendation method and system based on a large language model.
Background
With the rapid development of high and new technologies such as artificial intelligence, cloud computing and big data, society has gradually entered the information age. Faced with the explosion of information on smart devices and increasingly complex platforms for information interaction, people's desire to actively retrieve information has declined, and they rely more and more on passively receiving fragmented information, especially text-heavy information such as news. Recommendation systems were developed to recommend personalized content to users. Personalized news recommendation can help users alleviate information overload and improve the news reading experience. The news reading software on the market can be regarded as complex and huge news recommendation systems whose core is a personalized news recommendation strategy. The algorithms in such a strategy capture and understand the user's news reading preferences, then select a small amount of news from a vast pool of material to recommend to the user, improving the match between the recommended content and the user's interests and reducing the cost of active retrieval within limited time and space. Accordingly, improving the accuracy of news recommendation algorithms has become the primary requirement and long-term development direction of every media website or app. Generally speaking, a recommendation algorithm draws on the user's self-defined preferences at website login, the user's reading history, the news items' own characteristics (title, topic, body, editor) and other timeliness-related content.
Conventional recommendation algorithms mainly include user-based collaborative filtering (UserCF), item-based collaborative filtering (ItemCF) and content-based recommendation (CB). UserCF recommends new item groups to a target user by finding sets of users with similar interests; its drawbacks are that the cost of maintaining the user-interest similarity matrix grows nonlinearly with space complexity, similarity between users is unstable, and the recommendations are hard to interpret. ItemCF is mostly used in scenarios where user interests are fixed and durable: the system needs no item-popularity signal to judge recommendation quality, items change slowly, and the similarity matrix is cheap to maintain. But popularity and timeliness are central to personalized news recommendation, and both conflict with users' long-term interests, so ItemCF does not meet the requirements of news recommendation. CB typically first represents items with a vector space model (e.g. TF-IDF), then learns user preferences and generates a recommendation list with a nearest-neighbour model (KNN); its drawbacks are difficult feature extraction, potential interest traps, and a cold-start problem that is hard to eliminate.
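As an illustration of the collaborative-filtering baselines discussed above, the following minimal numpy sketch (not part of the patent; the click matrix and function names are invented for illustration) scores unseen news items for a user by cosine similarity between item click vectors, in the spirit of ItemCF:

```python
import numpy as np

# Hypothetical 0/1 user-item click matrix: rows = users, columns = news items.
clicks = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
], dtype=float)

def item_similarity(m):
    """Cosine similarity between item (column) click vectors."""
    norms = np.linalg.norm(m, axis=0)
    sim = (m.T @ m) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)          # an item is not its own neighbour
    return sim

def recommend(m, user, k=2):
    """Score unseen items by similarity to the user's clicked items."""
    sim = item_similarity(m)
    scores = sim @ m[user]
    scores[m[user] > 0] = -np.inf       # mask already-clicked items
    return np.argsort(scores)[::-1][:k]

print(recommend(clicks, user=0))        # top-k unseen items for user 0
```

UserCF would transpose the idea, comparing user rows instead of item columns; either way the similarity matrix must be recomputed as clicks accumulate, which is the maintenance cost the text criticizes.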
With the vigorous development of natural language processing (NLP), recommendation systems have in recent years become one of the application fields of deep learning, and academia and industry have carried out targeted research on them. Google's Wide & Deep model and the YouTube deep learning recommendation model pioneered a network structure based on embedding plus multi-layer perceptron (MLP), combining discrete-feature processing capability with nonlinear fitting capability; its effect far surpasses the earlier stage of recall and ranking via collaborative filtering and logistic regression, which emphasized manual feature mining. Subsequently, feature-combination network structures such as DeepFM and Deep & Cross Network reduced the tedious manual work of the feature-engineering stage and, after introducing low-dimensional dense vectors, achieved a rapid performance improvement. However, these deep learning models and their derivatives make poor use of user-side features and scene features, and remain constrained by the pronounced sequential characteristics of NLP.
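The embedding-plus-MLP structure credited above to Wide & Deep and the YouTube recommender can be sketched in a few lines of numpy; this is a hypothetical toy forward pass (all sizes and names invented for illustration), not the patent's model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary of 1000 discrete feature ids, embedded in 16 dims.
emb_table = rng.normal(0, 0.1, size=(1000, 16))

def relu(x):
    return np.maximum(x, 0.0)

def embedding_mlp(feature_ids, w1, b1, w2, b2):
    """Embedding + MLP forward pass: look up, concatenate, two dense layers."""
    x = emb_table[feature_ids].reshape(-1)   # concatenated embeddings
    h = relu(x @ w1 + b1)                    # hidden layer (nonlinear fit)
    logit = h @ w2 + b2                      # scalar click logit
    return 1.0 / (1.0 + np.exp(-logit))      # sigmoid click probability

w1 = rng.normal(0, 0.1, size=(3 * 16, 32)); b1 = np.zeros(32)
w2 = rng.normal(0, 0.1, size=32);           b2 = 0.0
p = embedding_mlp([12, 7, 431], w1, b1, w2, b2)
print(p)  # a probability in (0, 1)
```

The embedding handles the discrete ids; the MLP supplies the nonlinear fitting capability the text attributes to these models.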
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides a personalized news recommendation method and system based on a large language model.
In order to achieve the above object, the invention adopts the following technical scheme:
In a first aspect, the invention provides a personalized news recommendation method based on a large language model, comprising the following steps:
acquiring a user's news reading history data and encrypted account number sequence information data;
performing feature extraction on the user's news reading history data and the encrypted account number sequence information data to obtain news encoding features and user encoding features;
constructing a masked large language model based on the Transformer bidirectional encoding representation paradigm, and training the model with the news encoding features and the user encoding features as inputs; the model comprises a news representation encoding module, a user representation encoding module and a click prediction module;
and performing personalized news recommendation from the user's news reading history data and the encrypted account number sequence information data using the trained masked large language model based on the Transformer bidirectional encoding representation paradigm.
In a second aspect, the invention provides a personalized news recommendation system based on a large language model, applying the above method, the system comprising:
a data acquisition module for acquiring a user's news reading history data and encrypted account number sequence information data;
a feature extraction module for performing feature extraction on the user's news reading history data and the encrypted account number sequence information data to obtain news encoding features and user encoding features;
a model training module for constructing a masked large language model based on the Transformer bidirectional encoding representation paradigm and training the model with the news encoding features and the user encoding features as inputs; the model comprises a news representation encoding module, a user representation encoding module and a click prediction module;
and a news recommendation module for performing personalized news recommendation from the user's news reading history data and the encrypted account number sequence information data using the trained masked large language model based on the Transformer bidirectional encoding representation paradigm.
The invention has the following beneficial effects:
1. The masked large language model based on the Transformer bidirectional encoding representation paradigm constructed by the invention uses bag-of-words plus position encoding in the word embedding layer; on the basis of capturing the semantic and grammatical relations between words and considering the accumulation effect, it adds consideration of the order of the context words, thereby improving the accuracy of the word vectors generated for low-frequency or uncommon words and reducing training time and memory footprint.
2. The masked large language model based on the Transformer bidirectional encoding representation paradigm constructed by the invention uses a three-dimensional convolutional neural network in place of the classical two-dimensional one, adding a depth channel to the original convolution kernel and thus one extra dimension of inter-frame motion information; borrowing from video processing in computer vision, it feeds the multiple dimensions of a news item through a three-dimensional filter, improving the flexibility of short-term temporal modeling.
3. The masked large language model based on the Transformer bidirectional encoding representation paradigm constructed by the invention uses a masked attention representation layer in place of the classical self-attention representation layer; on the basis that self-attention reduces dependence on external information and is good at capturing the internal correlations of the data features, the padding mask and sequence mask help the decoder focus attention on the most appropriate positions of the input sequence.
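A minimal numpy sketch of the padding-mask mechanism described in point 3, where masked positions receive a large negative bias before the softmax (shapes and data are illustrative assumptions, not from the patent):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(q, k, v, pad_mask):
    """Scaled dot-product attention with a padding mask.
    pad_mask[j] is True for real tokens, False for padding; padded
    positions get a large negative bias so softmax drives them to ~0."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                     # (Lq, Lk) scores
    scores = np.where(pad_mask[None, :], scores, -1e9)
    return softmax(scores) @ v

rng = np.random.default_rng(1)
q = rng.normal(size=(4, 8)); k = rng.normal(size=(4, 8)); v = rng.normal(size=(4, 8))
mask = np.array([True, True, True, False])            # last position is padding
out = masked_attention(q, k, v, mask)
```

After the softmax, the padded column's attention weights are effectively zero, so padding never contributes to the output, which is what lets the model focus attention on valid positions of the input sequence.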
Drawings
FIG. 1 is a flow diagram of a personalized news recommendation method based on a large language model in an embodiment of the invention;
FIG. 2 is a schematic diagram of the structure of the masked large language model based on the Transformer bidirectional encoding representation paradigm in an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a personalized news recommendation system based on a large language model in an embodiment of the invention.
Detailed Description
The following description of embodiments of the invention is provided to help those skilled in the art understand the invention, but it should be understood that the invention is not limited to the scope of these embodiments; to those of ordinary skill in the art, all inventions that make use of the inventive concept fall within the protection of the invention as defined by the appended claims.
Example 1
As shown in fig. 1, the embodiment of the invention provides a personalized news recommendation method based on a large language model, which comprises the following steps S1 to S4:
s1, acquiring news reading historical data of a user and encrypted account number sequence information data;
In an optional embodiment of the invention, the Microsoft news dataset (MIND), a Google news dataset containing user query content and web search titles, and an MSN news dataset are acquired as raw datasets; data preprocessing operations are performed on the raw datasets, including but not limited to sparsification, dummy-variable filling, integration and reduction, followed by data enhancement, so as to remove noise, keep information loss minimal, reduce cold-start harm and expand the data volume.
According to the method, the user's news reading history data and the encrypted account number sequence information data in the raw datasets are acquired, the news reading history data including clicked news, web page records, search requests and the like, and a news recommendation large language model is built so as to predict more accurate recommendations among the candidate news.
S2, performing feature extraction on the user's news reading history data and the encrypted account number sequence information data to obtain news encoding features and user encoding features;
In an optional embodiment of the invention, news encoding features and user encoding features are extracted from the user's news reading history data and the encrypted account number sequence information data; the news encoding features specifically include news category, news title and news body, and the user encoding features specifically include web page records, clicked news, news-related search request terms and the user system serial number.
Specifically, this embodiment extracts news categories, news titles and news bodies from the preprocessed candidate news and its web page tag dataset, and uses the extracted feature information as the input features of the news representation encoding module.
This embodiment extracts web page records, clicked news, news-related search request terms and the user system serial number from the preprocessed candidate news and its web page tag dataset, and uses the extracted feature information as the input features of the user representation encoding module.
By introducing key features such as news clicks, search requests and web page records, this embodiment overcomes the insufficient text feature extraction of conventional deep learning methods.
S3, constructing a masked large language model based on the Transformer bidirectional encoding representation paradigm, and training the model with the news encoding features and the user encoding features as inputs; the model comprises a news representation encoding module, a user representation encoding module and a click prediction module;
In an optional embodiment of the invention, a masked large language model based on the Transformer bidirectional encoding representation paradigm is constructed; in information streams such as news, its ability to extract user preferences is strong, and it breaks through the sequential barriers of RNNs, GRUs and the like. Using the advantage of an improved multi-head attention mechanism (masked attention), a model built by combining pre-training, fine-tuning, prompting and the like gradually improves its ability to adapt to the scene, so that downstream tasks obtain more accurate recommendation information.
The masked large language model based on the Transformer bidirectional encoding representation paradigm constructed in this embodiment comprises a news representation encoding module, a user representation encoding module and a click prediction module; the input news data is classified and then, after a position-encoded word embedding layer, a 3D convolutional neural network layer and a masked attention layer, undergoes concatenation, dot-product and normalization operations, which addresses the problems of incomplete extraction of news text features and model dependence on external features.
The news representation coding module in this embodiment specifically includes:
a first word embedding layer, a first fully connected layer, a first 3D convolutional neural network, and a first masked attention layer;
the first word embedding layer is used for performing a dimension-reduction operation on the input news encoding features, encoding them into dense vectors of fixed dimension, mapping them into a tokenized numerical vector space, and outputting a first low-dimensional dense feature vector;
the first fully connected layer is used for passing the first low-dimensional dense feature vector output by the first word embedding layer through the superposition of one linear transformation, whose weights and bias vector are updated automatically, and one nonlinear activation-function transformation, to obtain the output feature vector of the first fully connected layer;
the first 3D convolutional neural network is used for coarsely classifying the first low-dimensional dense feature vector output by the first word embedding layer to obtain the output feature vector of the first 3D convolutional neural network;
the first masked attention layer is used for passing the output feature vectors of the first fully connected layer and the first 3D convolutional neural network through multi-layer self-attention and add-and-normalize encoding-decoding layers while supplying a large negative bias, to obtain feature-screened news representation encoding features.
Specifically, in this embodiment, the text corpora of the category, title and body contained in the news encoding features extracted in step S2 are input to the first word embedding layer (word embedding) for dimension reduction, encoded into dense vectors of fixed dimension, mapped into a tokenized numerical vector space, and output as 3 groups of feature vectors (vc1-vcm\vt1-vtm\vb1-vbm) of vector dimension 256;
then the low-dimensional dense feature vector (vc1-vcm) formed from the news category after the first word embedding is input to the first fully connected layer (dense), and through the superposition of one linear transformation with automatically updated weights and bias vector and one nonlinear activation-function transformation, the high-rank matrix is output as a flattened (flat) 128-dimensional vector (rn-d);
simultaneously, the low-dimensional dense feature vectors (vt1-vtm\vb1-vbm) formed from the news title and news body after the first word embedding are input to the first 3D convolutional neural network (3dcnn); the 256-dimensional feature vectors are coarsely classified using several consecutive hardwired layers, convolution layers and downsampling layers, and feature vectors (rn-t\rn-b) of 128 nodes are output;
finally, the output feature vectors (rn-d\rn-t\rn-b) of the first fully connected layer and the first 3D convolutional neural network are input to the first masked attention layer (masked attention) and pass through internal multi-layer self-attention (self-attention) and add-and-normalize (norm) encoding-decoding layers, where the mask matrix of each step is summed and shaped (size); a large negative bias is supplied at the same time to keep invalid regions out of the computation, and finely screened feature vectors retaining most of the features are output.
In this embodiment, the user representation encoding module specifically includes:
an embedding layer, a second word embedding layer, a second fully connected layer, a second 3D convolutional neural network, and a second masked attention layer;
the embedding layer is used for one-hot expanding the user system serial number in the input user encoding features and outputting a one-hot user feature vector;
the second word embedding layer is used for performing a dimension-reduction operation on the web page records, clicked news and news-related search request terms in the input user encoding features, encoding them into dense vectors of fixed dimension, mapping them into a tokenized numerical vector space, and outputting a second low-dimensional dense feature vector;
the second fully connected layer is used for passing the one-hot user feature vector output by the embedding layer through the superposition of one linear transformation, whose weights and bias vector are updated automatically, and one nonlinear activation-function transformation, to obtain the output feature vector of the second fully connected layer;
the second 3D convolutional neural network is used for coarsely classifying the second low-dimensional dense feature vector output by the second word embedding layer to obtain the output feature vector of the second 3D convolutional neural network;
the second masked attention layer is used for passing the output feature vectors of the second fully connected layer and the second 3D convolutional neural network through multi-layer self-attention and add-and-normalize layers while supplying a large negative bias, to obtain feature-screened user representation encoding features.
Specifically, in this embodiment, the user system serial number (user id) contained in the user encoding features extracted in step S2 is input to the corresponding embedding layer (embedding) for one-hot expansion, and a unique user feature vector that can later be crossed with the other features is output;
the text corpora of the web page records, clicked news and search requests contained in the user encoding features extracted in step S2 are input to the second word embedding layer (word embedding) for dimension reduction, encoded into dense vectors of fixed dimension, mapped into a tokenized numerical vector space, and output as 3 groups of feature vectors (WR1-i\LN1-i\SQ1-i) of vector dimension 256;
then the low-dimensional dense feature vector formed by one-hot expansion of the user system serial number through the embedding layer is input to the second fully connected layer (dense), and through the superposition of one linear transformation with automatically updated weights and bias vector and one nonlinear activation-function transformation, the high-rank matrix is output as a flattened 128-dimensional vector;
the low-dimensional dense feature vectors (WR1-i\LN1-i\SQ1-i) formed from the web page records, clicked news and search requests after the second word embedding layer are input to the second 3D convolutional neural network (3dcnn); the 256-dimensional feature vectors are coarsely classified using several consecutive hardwired layers, convolution layers and downsampling layers, and feature vectors (vw1-m\vl1-m\vs1-m) of 128 nodes are output;
finally, the output feature vectors (vw1-m\vl1-m\vs1-m) of the second 3D convolutional neural network and the user-id vector after the second fully connected layer are input to the second masked attention layer (masked attention) and pass through internal multi-layer self-attention (self-attention) and add-and-normalize (norm) encoding-decoding layers, where the mask matrix of each step is summed and shaped (size); a large negative bias is supplied to keep invalid regions out of the computation, and finely screened feature vectors retaining most of the features are output.
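The one-hot expansion and embedding lookup of the user system serial number described above can be sketched as follows (the sizes are illustrative assumptions; a real embedding table would be learned, not random):

```python
import numpy as np

NUM_USERS, EMB_DIM = 6, 4                  # illustrative sizes, not the patent's
rng = np.random.default_rng(2)
user_emb = rng.normal(0, 0.1, size=(NUM_USERS, EMB_DIM))  # stand-in embedding table

def one_hot(idx, n):
    """One-hot expansion of a user system serial number."""
    v = np.zeros(n)
    v[idx] = 1.0
    return v

user_id = 3
u = one_hot(user_id, NUM_USERS)            # one-hot user vector
user_vec = u @ user_emb                    # dense user feature vector
```

Multiplying the one-hot vector by the embedding table is mathematically identical to indexing a row of the table, which is why embedding layers are usually implemented as lookups; the resulting dense vector is what can be crossed with the other user-side features.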
The first word embedding layer in the news representation encoding module and the second word embedding layer in the user representation encoding module each comprise a first input layer, a hidden layer, an output layer and a position encoding layer; the input layer consists of a one-hot encoded context sequence {x1-xp} of overall dimension p, the vocabulary length is L, the hidden-layer vector dimension is N, and the output-layer word is y.
The first word embedding layer and the second word embedding layer are processed as follows:
the feature vectors are input to the first input layer and the output vector of the hidden layer is calculated, expressed as:
$h = \frac{1}{p} W^{\top} \sum_{i=1}^{p} x_i$
where $h$ denotes the output vector of the hidden layer, $W$ the weight matrix of the feature vectors, $p$ the dimension of the context sequence of feature vectors, and $x_i$ the $i$-th feature vector;
the input vector of the output layer is calculated from the output vector of the hidden layer, expressed as:
$z_j = {w'_j}^{\top} h$
where $z_j$ denotes the $j$-th component of the input vector of the output layer and $w'_j$ the $j$-th column of the output weight matrix $W'$;
the output vector of the output layer is calculated from its input vector, expressed as:
$y_j = p(w_j \mid w_1, \dots, w_p) = \dfrac{\exp(z_j)}{\sum_{l=1}^{L} \exp(z_l)}$
where $y_j$ denotes the $j$-th component of the output vector of the output layer, $p(w_j \mid w_1, \dots, w_p)$ the posterior distribution probability of the vocabulary word $w_j$ given the context words $w_1$ to $w_p$, $\exp$ the natural exponential function, and $L$ the total number of words in the output vector of the output layer. The weight matrix $W$ is updated by back-propagating the logarithm of the contextual vocabulary conditional probability, with the selected loss function $S = -\log p(w_j \mid w_1, \dots, w_p)$ and the update rule
$w^{new} = w^{old} - \eta \dfrac{\partial S}{\partial w}$
where $w^{new}$ denotes the updated weight value, $w^{old}$ the weight value before the update, $\eta$ the learning rate, and the gradient is accumulated over the current hidden-layer value $h$ and the input-layer window of size $p$;
adding the output vector of the output layer into the position coding of the word vector in the context at the position coding layer to form a final output vector, which is expressed as:
y = y_out + PE(loc, k), with
PE(loc, 2k) = sin(loc / 10000^(2k/dim_in)), PE(loc, 2k+1) = cos(loc / 10000^(2k/dim_in))

where y represents the final output vector, y_out is the output vector of the output layer from the previous step, PE represents the position coding representation vector, loc represents the position of the word in the sentence, k represents the position of the word vector in the vocabulary, and dim_in represents the dimension of the word vector; location information is thereby added to the bag-of-words based word embedding layer.
The first 3D convolutional neural network in the news representation coding module and the second 3D convolutional neural network in the user representation coding module in this embodiment each include a second input layer, a hard-line layer, a convolutional layer and a downsampling layer;
the calculation method of the convolution kernel of the convolution layer comprises the following steps:
v_uv^(lmn) = φ( b_uv + Σ_z Σ_{e=0}^{E_u−1} Σ_{f=0}^{F_u−1} Σ_{g=0}^{G_u−1} w_uvz^(efg) · v_(u−1)z^((l+e)(m+f)(n+g)) )

where v_uv^(lmn) represents the unit value at position (l, m, n) of the convolution kernel on the v-th feature map of the u-th layer, φ represents the activation function, z indexes the set of feature maps of layer u−1 connected to the current feature map, E_u represents the height of the convolution kernel, F_u represents the width of the convolution kernel, G_u represents the channel number of the convolution kernel, w_uvz^(efg) represents the weight of channel (e, f, g) in the convolution kernel connected to the z-th feature map, v_(u−1)z^((l+e)(m+f)(n+g)) represents the unit value, at kernel position (e, f, g), after the u−1-th layer is connected to the z-th feature map, and b_uv represents the bias of the v-th feature map of the u-th layer.
The calculation method of the feature map size obtained after the convolution operation of the convolution layer comprises the following steps:
H_out = (H_in + 2·P_h − E_u) / S_h + 1
W_out = (W_in + 2·P_w − F_u) / S_w + 1

where H_out represents the height of the output feature map, H_in the height of the input feature map, P_h the padding amount of the height, E_u the height of the convolution kernel, S_h the stride (step extension) of the height, W_out the width of the output feature map, W_in the width of the input feature map, P_w the padding amount of the width, F_u the width of the convolution kernel, and S_w the stride (step extension) of the width.
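The output-size formula above can be checked with a small helper; the function name and the example numbers are illustrative, not values from the embodiment:

```python
def conv3d_output_size(in_size, kernel, padding, stride):
    """Spatial size of a feature map after convolution, per dimension:
    out = floor((in + 2*padding - kernel) / stride) + 1."""
    return (in_size + 2 * padding - kernel) // stride + 1

# A 3-wide kernel with padding 1 and stride 1 preserves the input size.
h_out = conv3d_output_size(32, 3, 1, 1)   # height: (32 + 2 - 3)//1 + 1 = 32
w_out = conv3d_output_size(64, 5, 0, 2)   # width:  (64 + 0 - 5)//2 + 1 = 30
```

The same formula applies independently to height, width, and the depth (channel/time) dimension of a 3D convolution.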
The first mask attention layer in the news representation coding module and the second mask attention layer in the user representation coding module of the embodiment each comprise an overlapping layer, a normalization layer, a feedforward neural network, a self-attention layer and a mask layer;
The superimposed layer serves to prevent network degradation, as in a residual network, and the normalization layer normalizes the input vector, improving convergence during computation. The self-attention layer and the mask layer perform the core step: in essence, they focus on and screen out the small amount of key information in the input and reflect it in the adjustment of the weight coefficients. The output feature vector of the self-attention layer is calculated as:
A = softmax( Q·K^T / sqrt(dim) )·V

where A represents the output feature vector of the self-attention layer, V represents the content matrix, softmax is the normalization based on the natural exponential function exp, K represents the index matrix, T represents the transpose operation, Q represents the query matrix, sqrt represents the arithmetic square root operation, dim_n represents the feature dimension of the n-th input feature vector, and dim_i represents the feature dimension of the i-th input feature vector;
the mask layer covers local information, reducing the model's tendency to over-interpret the information and thereby avoiding overfitting, and introduces a multi-head mechanism. The output feature vector of the mask layer is calculated as:
M = Norm( Concat(head_1, …, head_i, …, head_j, …) ), with head_i = Mask( softmax( Q_i·K_i^T / sqrt(dim) )·V_i )

where M represents the output feature vector of the mask layer, Mask represents the sequence of masking operations, Norm represents the normalized weight matrix, Concat represents the splicing operation performed on the i-th and j-th matrices after they pass through the self-attention layer (the slicing-and-splicing operation that cuts the matrices into heads and concatenates the results), V_i represents the i-th content matrix, Q_i represents the i-th query matrix, and K_i represents the i-th index matrix.
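The masking described above is commonly realized by adding a large negative bias to the attention scores of hidden positions before the softmax, which drives their weights toward zero. A minimal single-head sketch under that assumption follows; the causal (future-hiding) mask and the toy shapes are illustrative, not the embodiment's exact masking scheme:

```python
import numpy as np

def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention with a large negative bias on masked
    positions, so their softmax weight is driven to ~0.

    Q, K, V : (n, d) matrices; mask : (n, n) boolean, True = hide.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = np.where(mask, -1e9, scores)           # large negative bias
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
mask = np.triu(np.ones((4, 4), dtype=bool), k=1)    # hide future positions
out, w = masked_attention(Q, K, V, mask)
```

A multi-head version would slice Q, K, V into heads, apply this function per head, and concatenate the results, matching the Concat operation in the formula above.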
The mask attention mechanism added to the Transformer-based masked large language model of the bidirectional coding representation thought paradigm reduces dependence on external information, captures the internal correlations of the data features, and focuses attention on the most informative positions of the input sequence.
The click prediction module in this embodiment specifically includes:
the system comprises a first splicing layer, a second splicing layer, a fusion layer and an activation layer;
the first splicing layer is used for splicing the news representation coding features output by the news representation coding module, and the output feature vectors of the first splicing layer are obtained by adding channels to perform feature superposition on the transverse space;
the second splicing layer is used for splicing the user representation coding features output by the user representation coding module, and the output feature vectors of the second splicing layer are obtained by adding channels to perform feature superposition on the transverse space;
the fusion layer is used for performing point multiplication operation on the output feature vectors of the first splicing layer and the second splicing layer, and fusing the features in the two spaces of the news code representation and the user code representation to obtain a fusion feature vector output by the fusion layer;
the activation layer is used for activating the fusion feature vector output by the fusion layer to obtain the predicted click rate of the news.
Specifically, in this embodiment, the output feature vectors of the first mask attention layer in the news representation encoding module and the second mask attention layer in the user representation encoding module are used as input features (inputs) of a click prediction module (click predictor);
performing an internal splicing (concat) operation, at the first splicing layer and the second splicing layer respectively, on the output feature vectors of the first mask attention layer in the news representation coding module and of the second mask attention layer in the user representation coding module; by adding channels to realize feature superposition in the transverse space, a vector (r-an) containing all spliced features of the news representation coding module and a vector (r-us) containing all spliced features of the user representation coding module are output;
carrying out a dot product operation (dot), at the fusion layer, on the vector (r-an) containing all spliced features of the news representation coding module and the vector (r-us) containing all spliced features of the user representation coding module, fusing the features in the two spaces of the news and user coding representations, and outputting a fused feature vector;
the fusion feature vector is activated by softmax at the activation layer, and the numerical value of each node is converted into probability values distributed in the (0, 1) interval, namely the predicted click rate of news (click probability).
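The concat, dot fusion, and softmax activation flow above can be sketched as follows. This assumes element-wise multiplication for the "dot" fusion and a softmax over the fused nodes; the function names and toy vectors are illustrative, not the embodiment's actual tensors:

```python
import numpy as np

def softmax(x):
    """Convert node values into probabilities in the (0, 1) interval."""
    x = x - x.max()                      # numerical stability
    return np.exp(x) / np.exp(x).sum()

def predict_click(news_feats, user_feats):
    """Click predictor sketch: concatenate each side's feature vectors
    (channel-wise superposition), fuse the two representations
    element-wise, and softmax the fused nodes into click probabilities."""
    r_an = np.concatenate(news_feats)    # news-side spliced vector
    r_us = np.concatenate(user_feats)    # user-side spliced vector
    fused = r_an * r_us                  # element-wise "dot" fusion
    return softmax(fused)

news = [np.array([0.2, 0.5]), np.array([0.1, 0.3])]   # toy attention outputs
user = [np.array([0.4, 0.1]), np.array([0.6, 0.2])]
p = predict_click(news, user)
```

In a full model the fused vector would typically pass through a scoring head before the softmax; the sketch keeps only the fusion structure described in the text.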
S4, performing personalized news recommendation according to the user news reading historical data and the encrypted account number sequence information data by using the trained mask large language model based on the bidirectional coding representation thought paradigm of the Transformer.
The performance of the personalized news recommendation method based on the large language model provided by the invention is analyzed below in combination with a specific simulation experiment.
The data sets adopted in the simulation experiment are the Microsoft-based MIND database and a database based on MSN+Google news. The MIND database comprises anonymized news click records of users, containing more than 200,000 news items, more than 19 million impression records, and more than 30 million click behaviors, webpage records and search requests from 1.5 million anonymous users; each news item carries rich text information such as title, abstract, body, category and entities.
The schematic design of the model network of the invention is shown in fig. 2, covering the overall network structure and the individual module structures. Tables 1 and 2 show the final recommendation effect comparison of the model; the results are evaluated on multiple recommendation indexes including ACC, AUC, F1-score, HR, MRR and NDCG, and the experiments implement the network on the PyTorch deep learning framework.
Table 1. Comparison of the effects of different recommendation algorithms on the MIND and MSN+Google news data sets
Table 2. Comparison of recommendation effects after various methods are substituted at different steps of the invention
Table 1 shows the comparison of different recommendation algorithms on the MIND and MSN+Google news data sets, mainly reporting the comparison between the news recommendation algorithm based on a large language model (MALM) and other mainstream recommendation algorithms in the industry. The model's result on each index is slightly higher on the MIND data set than on the MSN+Google news data set; analysis suggests this is because the MIND data set is curated by Microsoft News and its data cleanliness is far higher than that of the MSN+Google news data set. The evaluation indexes comprise the six commonly used recommendation evaluation measures ACC, AUC, F1-score, HR, MRR and NDCG. The first class, UserCF, ItemCF, CB and GRU, are popular ID-based recommendation algorithms; the second class belongs to recommendation algorithms from the initial stage of deep learning, namely Wide & Deep, YouTube DNN, DeepFM, DSSM, Deep & CrossNet and LibFM; the third class consists of pre-training-based recommendation algorithms, comprising DFM, NRMS, NPA and LSTUR. Observing the data in Table 1, the indexes of the third class of algorithms are obviously higher than those of the first two classes, which proves that the pre-training and fine-tuning method has a remarkable advantage because it adapts the model to the scene. Within the third class of methods, the MALM of the invention ranks 2nd in the ACC index, 0.23 percentage points lower than the highest, NRMS; ranks 1st in the AUC index, 2.5 percentage points above average; ranks 3rd in the F1 index, 0.3 percentage points above average; ranks 2nd in the HR index, 1.8 percentage points above average; ranks 3rd in the MRR index, 0.5 percentage points above average; and ranks 1st in the NDCG index, 1.5 percentage points above average.
In summary, the overall ranking of the MALM model of the invention is first within the third class of algorithms and uniformly higher than the first two classes, which verifies the feasibility of the overall construction idea of the model.
Table 2 shows the comparison of recommendation results after various methods are substituted at different steps of the invention. For the first step, word embedding, four different methods are used: KNN, TF-IDF, CBOW and CBOW+LE; the results show that the method adopted in the invention, bag-of-words CBOW plus position coding LE, improves the model remarkably at the word embedding step, mainly because consideration of the order of context words is added, which improves the accuracy of word vectors generated for low-frequency or uncommon words. In the second step, 3D CNN replaces CNN, and the result is about 5% higher than the original CNN method on the five indexes ACC, AUC, F1, HR and NDCG, proving that the depth-extended channels of the three-dimensional convolutional neural network improve the flexibility of short-term time-sequence modeling. In the third step, mask attention replaces the self-attention layer, and all six indexes improve by 2% to 5%, proving that the mask layer weakens the overfitting of the model and improves prediction accuracy. Table 2 thus verifies, step by step, the integrity and adaptability of the model's local structural flow.
The loss functions adopted in the experiments include binary cross-entropy loss, BPR loss, cross-entropy loss, BCEWithLogits loss and the like; they converge easily, have high accuracy and a wide application range, and, beyond binary classification, can be extended to multi-class classification. The optimizers include AdaGrad and Adam, whose advantages are that the learning rate of each model parameter is adapted independently by scaling with the square root of the accumulated historical gradient averages, no manual adjustment is needed, and after bias correction each iteration's learning rate stays within a fixed range, so the parameters are stable, sparse gradients are handled well, and they suit non-convex optimization problems, large data sets and high-dimensional spaces.
The evaluation indexes adopted in testing the model are ACC, AUC, F1-score, HR, MRR and NDCG. ACC is the accuracy rate, the number of correctly classified samples divided by the total number of samples. AUC is the area under the ROC curve and reflects the relative ranking ability of the model; the larger the value, the better the classifier. F1-score combines precision and recall and has wide applicability; the larger the value, the better the classifier. HR is the recommendation hit rate, emphasizing whether the items the user wants are contained among the model's recommendations; the larger the value, the better the classifier. MRR is the mean reciprocal rank, emphasizing the position of the user's desired item in the model's recommendation list; the higher it ranks, the better. NDCG is the normalized discounted cumulative gain; because it is normalized, NDCG is a relative value that can be compared even across different users, and it likewise rewards the user's desired item appearing earlier in the recommendation list.
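The ranking metrics HR, MRR and NDCG described above can be sketched for binary relevance as follows; the function names and the top-k truncation convention are illustrative, not the exact evaluation code of the experiments:

```python
import numpy as np

def hit_rate(ranked, relevant, k):
    """HR@k: 1 if any relevant item appears in the top-k recommendations."""
    return int(any(item in relevant for item in ranked[:k]))

def mrr(ranked, relevant):
    """Reciprocal rank of the first relevant item (0 if none appears)."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def ndcg(ranked, relevant, k):
    """NDCG@k with binary relevance: DCG divided by the ideal DCG."""
    gains = [1.0 if item in relevant else 0.0 for item in ranked[:k]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

ranked = ['n3', 'n1', 'n7']    # model's recommendation list (toy example)
relevant = {'n1'}              # item the user actually wanted
```

In a full evaluation these per-user values are averaged over all users (hence "mean" reciprocal rank).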
In summary, the invention is oriented to electronic news publishing and promotion platforms. Based on platform users' existing news reading history (such as clicked news, webpage records and search requests) and user serial numbers, a mask-attention-based large language model for news recommendation is built through methods from the fields of deep learning and natural language processing. The combination of word embedding with position coding, three-dimensional convolutional neural networks and mask attention layers is fully utilized to characterize and understand the reading habits and interests of users more completely, so that recommendation information with higher accuracy can be predicted among candidate news.
Example 2
As shown in fig. 3, an embodiment of the present invention proposes a personalized news recommendation system based on a large language model based on embodiment 1, including:
the data acquisition module is used for acquiring news reading historical data of the user and encrypted account number sequence information data;
the feature extraction module is used for extracting features of the news reading historical data of the user and the encrypted account number sequence information data to obtain news coding features and user coding features;
the model training module is used for constructing a mask large language model which represents the thought paradigm based on bidirectional coding of a Transformer, and carrying out model training by taking news coding features and user coding features as model input; the mask large language model based on the bidirectional coding representation thought paradigm of the Transformer comprises a news representation coding module, a user representation coding module and a click prediction module;
and the news recommending module is used for conducting personalized news recommending according to the news reading historical data of the user and the encrypted account number sequence information data by utilizing the trained mask large language model which is based on the bidirectional coding representation thought paradigm of the Transformer.
The personalized news recommendation system based on the large language model provided in this embodiment has all the technical effects of the personalized news recommendation method based on the large language model described in embodiment 1.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided to facilitate understanding of the method and core ideas of the present invention; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present invention, the present description should not be construed as limiting the present invention in view of the above.
Those of ordinary skill in the art will recognize that the embodiments described herein are for the purpose of aiding the reader in understanding the principles of the present invention and should be understood that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations from the teachings of the present disclosure without departing from the spirit thereof, and such modifications and combinations remain within the scope of the present disclosure.

Claims (10)

1. A personalized news recommending method based on a large language model is characterized by comprising the following steps:
acquiring news reading historical data of a user and encrypted account number sequence information data;
feature extraction is carried out on the news reading historical data of the user and the encrypted account number sequence information data, so that news coding features and user coding features are obtained;
constructing a mask large language model of a bidirectional coding representation thought paradigm based on a Transformer, and carrying out model training by taking news coding features and user coding features as model inputs; the mask large language model based on the bidirectional coding representation thought paradigm of the Transformer comprises a news representation coding module, a user representation coding module and a click prediction module;
and performing personalized news recommendation according to the user news reading historical data and the encrypted account number sequence information data by using a trained mask large language model which is based on the bidirectional coding representation thought paradigm of the Transformer.
2. The personalized news recommending method based on the large language model according to claim 1, wherein the news coding features specifically comprise:
news category, news headline, and news body.
3. The personalized news recommendation method based on a large language model according to claim 2, wherein the user coding features specifically comprise:
web page records, clicked news, news related search request terms, and user system serial numbers.
4. The personalized news recommending method based on the large language model according to claim 3, wherein the news representation coding module specifically comprises:
a first word embedding layer, a first full connection layer, a first 3D convolutional neural network, and a first mask attention layer;
the first word embedding layer is used for performing dimension reduction operation on the input news coding features to obtain dense vectors coded into fixed dimensions, mapping the dense vectors into a marked numerical vector space and outputting to obtain first low-dimension dense feature vectors;
the first full-connection layer is used for obtaining an output feature vector of the first full-connection layer through superposition of linear transformation of automatic updating of a weight and a bias vector and nonlinear transformation of a primary activation function of a first low-dimensional dense feature vector obtained by outputting the first word embedding layer;
the first 3D convolution kernel neural network is used for roughly classifying the first low-dimensional dense feature vector obtained by outputting the first word embedding layer to obtain an output feature vector of the first 3D convolution kernel neural network;
The first mask attention layer is used for enabling the output feature vectors of the first full-connection layer and the first 3D convolution kernel neural network to pass through a multi-layer self-attention and summation and normalization coding and decoding layer, and providing large negative bias at the same time to obtain news representation coding features after feature screening.
5. The personalized news recommending method based on the large language model according to claim 4, wherein the user representation coding module specifically comprises:
an embedding layer, a second word embedding layer, a second full connection layer, a second 3D convolutional neural network, and a second mask attention layer;
the embedded layer is used for uniquely expanding the user system serial numbers in the input user coding features and outputting the unique user characteristic vectors;
the second word embedding layer is used for performing dimension reduction operation on the webpage records, clicked news and news related search request entries in the input user coding features to obtain dense vectors coded into fixed dimensions, mapping the dense vectors into a marked numerical vector space and outputting to obtain second low-dimensional dense feature vectors;
the second full-connection layer is used for superposing the independent heat user characteristic vector output by the embedded layer through linear transformation of which the weight and the bias vector are automatically updated once and nonlinear transformation of a primary activation function to obtain an output characteristic vector of the second full-connection layer;
The second 3D convolution kernel neural network is used for roughly classifying the second low-dimensional dense feature vector obtained by outputting the second word embedding layer to obtain an output feature vector of the second 3D convolution kernel neural network;
the second mask attention layer is used for carrying out multi-layer self-attention and summation and normalization on the output feature vectors of the second full-connection layer and the second 3D convolution kernel neural network, and simultaneously providing large negative bias to obtain the user representation coding features after feature screening.
6. The personalized news recommending method based on the large language model according to claim 5, wherein the click prediction module specifically comprises:
the system comprises a first splicing layer, a second splicing layer, a fusion layer and an activation layer;
the first splicing layer is used for splicing the news representation coding features output by the news representation coding module, and the output feature vectors of the first splicing layer are obtained by adding channels to perform feature superposition on the transverse space;
the second splicing layer is used for splicing the user representation coding features output by the user representation coding module, and the output feature vectors of the second splicing layer are obtained by adding channels to perform feature superposition on the transverse space;
The fusion layer is used for performing point multiplication operation on the output feature vectors of the first splicing layer and the second splicing layer, and fusing the features in the two spaces of the news code representation and the user code representation to obtain a fusion feature vector output by the fusion layer;
the activation layer is used for activating the fusion feature vector output by the fusion layer to obtain the predicted click rate of the news.
7. The personalized news recommending method based on the large language model according to claim 6, wherein the first word embedding layer and the second word embedding layer each comprise a first input layer, a hidden layer, an output layer and a position coding layer, and the processing method comprises the following steps:
inputting the feature vector into a first input layer, and calculating an output vector of a hidden layer, wherein the output vector is expressed as:
h = (1/p) · W^T (x_1 + x_2 + … + x_p)

where h represents the output vector of the hidden layer, W represents the weight matrix of the feature vectors, p represents the dimension of the context sequence of feature vectors, and x_i represents the i-th feature vector;
calculating an input vector of the output layer according to the output vector of the hidden layer, wherein the input vector is expressed as:
z_j = f_{w_j}^T · h

where z_j represents the j-th component of the input vector of the output layer, and f_{w_j} represents the j-th column of the weight matrix W;
calculating an output vector of the output layer from the input vector of the output layer, expressed as:
y_j = P(w_j | w_1, …, w_p) = exp(z_j) / Σ_{k=1}^{L} exp(z_k)

where y_j represents the output vector of the output layer, P(w_j | w_1, …, w_p) represents the posterior distribution probability of the vocabulary, w_j represents the weight containing the j-th word, w_1, …, w_p represent the weights containing the 1st to p-th words, exp represents the natural exponential function, and L represents the total number of words in the output vector of the output layer;
adding the output vector of the output layer into the position coding of the word vector in the context at the position coding layer to form a final output vector, which is expressed as:
y = y_out + PE(loc, k), with
PE(loc, 2k) = sin(loc / 10000^(2k/dim_in)), PE(loc, 2k+1) = cos(loc / 10000^(2k/dim_in))

where y represents the final output vector, y_out is the output vector of the output layer from the previous step, PE represents the position coding representation vector, loc represents the position of the word in the sentence, k represents the position of the word vector in the vocabulary, and dim_in represents the dimension of the word vector.
8. The personalized news recommendation method based on a large language model according to claim 7, wherein the first 3D convolutional neural network and the second 3D convolutional neural network each comprise a second input layer, a hard line layer, a convolutional layer and a downsampling layer;
the calculation method of the convolution kernel of the convolution layer comprises the following steps:
v_uv^(lmn) = φ( b_uv + Σ_z Σ_{e=0}^{E_u−1} Σ_{f=0}^{F_u−1} Σ_{g=0}^{G_u−1} w_uvz^(efg) · v_(u−1)z^((l+e)(m+f)(n+g)) )

where v_uv^(lmn) represents the unit value at position (l, m, n) of the convolution kernel on the v-th feature map of the u-th layer, φ represents the activation function, z indexes the set of feature maps of layer u−1 connected to the current feature map, E_u represents the height of the convolution kernel, F_u represents the width of the convolution kernel, G_u represents the channel number of the convolution kernel, w_uvz^(efg) represents the weight of channel (e, f, g) in the convolution kernel connected to the z-th feature map, v_(u−1)z^((l+e)(m+f)(n+g)) represents the unit value, at kernel position (e, f, g), after the u−1-th layer is connected to the z-th feature map, and b_uv represents the bias of the v-th feature map of the u-th layer.
9. The large language model based personalized news recommendation method according to claim 8, wherein the first masked attention layer and the second masked attention layer each comprise an overlay layer, a normalization layer, a feed forward neural network, a self-attention layer and a masking layer;
the calculation method of the output feature vector of the self-attention layer comprises the following steps:
A = softmax( Q·K^T / sqrt(dim) )·V

where A represents the output feature vector of the self-attention layer, V represents the content matrix, softmax is the normalization based on the natural exponential function exp, K represents the index matrix, T represents the transpose operation, Q represents the query matrix, sqrt represents the arithmetic square root operation, dim_n represents the feature dimension of the n-th input feature vector, and dim_i represents the feature dimension of the i-th input feature vector;
the computing method of the output characteristic vector of the mask layer comprises the following steps:
M = Norm( Concat(head_1, …, head_i, …, head_j, …) ), with head_i = Mask( softmax( Q_i·K_i^T / sqrt(dim) )·V_i )

where M represents the output feature vector of the mask layer, Mask represents the sequence of masking operations, Norm represents the normalized weight matrix, Concat represents the splicing operation performed on the i-th and j-th matrices after they pass through the self-attention layer (the slicing-and-splicing operation that cuts the matrices into heads and concatenates the results), V_i represents the i-th content matrix, Q_i represents the i-th query matrix, and K_i represents the i-th index matrix.
10. A personalized news recommendation system based on a large language model applying the method of any one of claims 1 to 9, comprising:
the data acquisition module is used for acquiring news reading historical data of the user and encrypted account number sequence information data;
the feature extraction module is used for extracting features of the news reading historical data of the user and the encrypted account number sequence information data to obtain news coding features and user coding features;
the model training module is used for constructing a mask large language model which represents the thought paradigm based on bidirectional coding of a Transformer, and carrying out model training by taking news coding features and user coding features as model input; the mask large language model based on the bidirectional coding representation thought paradigm of the Transformer comprises a news representation coding module, a user representation coding module and a click prediction module;
the news recommendation module is used for performing personalized news recommendation according to the news reading historical data of the user and the encrypted account number sequence information data, using the trained mask large language model based on the bidirectional coding representation thought paradigm of the Transformer.
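The four modules of the claimed system can be wired together as in this toy sketch; every internal computation here (the placeholder feature code, the ranking rule) is an illustrative stand-in, not the patented model:

```python
class DataAcquisitionModule:
    """Acquires the user's news reading history and encrypted account
    sequence data (stubbed here with placeholder values)."""
    def acquire(self, user_id):
        return {"history": ["n1", "n2"], "account": "enc:" + user_id}

class FeatureExtractionModule:
    """Encodes news and user data into feature form (a toy encoding,
    not the patented news/user representation encoders)."""
    @staticmethod
    def _feat(n):
        return sum(ord(c) for c in n) % 97  # placeholder numeric code
    def extract(self, raw):
        return {"news_feat": {self._feat(n) for n in raw["history"]},
                "user_feat": len(raw["account"])}

class ClickPredictionModule:
    """Ranks candidate news; candidates sharing a feature code with the
    reading history are ranked first (stand-in for the click predictor)."""
    def score(self, feats, candidates):
        unseen = lambda c: FeatureExtractionModule._feat(c) not in feats["news_feat"]
        return sorted(candidates, key=unseen)  # seen-like items first

def recommend(user_id, candidates):
    raw = DataAcquisitionModule().acquire(user_id)
    feats = FeatureExtractionModule().extract(raw)
    return ClickPredictionModule().score(feats, candidates)
```

The point of the sketch is the data flow (acquisition, feature extraction, prediction, recommendation), which mirrors the module decomposition in the claim.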
CN202311423931.8A 2023-10-31 2023-10-31 Personalized news recommendation method and system based on large language model Active CN117150145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311423931.8A CN117150145B (en) 2023-10-31 2023-10-31 Personalized news recommendation method and system based on large language model

Publications (2)

Publication Number Publication Date
CN117150145A true CN117150145A (en) 2023-12-01
CN117150145B CN117150145B (en) 2024-01-02

Family

ID=88903094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311423931.8A Active CN117150145B (en) 2023-10-31 2023-10-31 Personalized news recommendation method and system based on large language model

Country Status (1)

Country Link
CN (1) CN117150145B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117390295A (en) * 2023-12-13 2024-01-12 深圳须弥云图空间科技有限公司 Method and device for recommending objects based on mask module
CN117973488A (en) * 2024-03-29 2024-05-03 蓝象智联(杭州)科技有限公司 Large language model training and reasoning method and system with privacy protection
CN117973488B (en) * 2024-03-29 2024-06-07 蓝象智联(杭州)科技有限公司 Large language model training and reasoning method and system with privacy protection

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE1158120B (en) * 1962-08-22 1963-11-28 Telefunken Patent Transistor push-pull amplifiers, in particular for use as line amplifiers in carrier frequency communication systems
JP2001136744A (en) * 1999-11-04 2001-05-18 Sony Corp Switching power supply circuit
US8229873B1 (en) * 2009-09-18 2012-07-24 Google Inc. News topic-interest-based recommendations twiddling
CN110502638A (en) * 2019-08-30 2019-11-26 重庆誉存大数据科技有限公司 A kind of Company News classification of risks method based on target entity
CN111831924A (en) * 2020-07-16 2020-10-27 腾讯科技(北京)有限公司 Content recommendation method, device, equipment and readable storage medium
CN113204698A (en) * 2021-05-31 2021-08-03 平安科技(深圳)有限公司 News subject term generation method, device, equipment and medium
CN113343120A (en) * 2021-05-28 2021-09-03 哈尔滨理工大学 Intelligent news recommendation system based on emotion protection
CN113407851A (en) * 2021-07-15 2021-09-17 北京百度网讯科技有限公司 Method, device, equipment and medium for determining recommendation information based on double-tower model
CN113627151A (en) * 2021-10-14 2021-11-09 北京中科闻歌科技股份有限公司 Cross-modal data matching method, device, equipment and medium
CN113656660A (en) * 2021-10-14 2021-11-16 北京中科闻歌科技股份有限公司 Cross-modal data matching method, device, equipment and medium
CN114896510A (en) * 2022-06-01 2022-08-12 齐鲁工业大学 Intelligent news recommendation method and system based on user multi-interest characteristics
CN114943034A (en) * 2022-06-01 2022-08-26 齐鲁工业大学 Intelligent news recommendation method and system based on fine-grained aspect characteristics
CN114996546A (en) * 2022-05-23 2022-09-02 华东师范大学 Chinese writing phrase recommendation method based on Bert language model
CN115269998A (en) * 2022-08-19 2022-11-01 奇安信网神信息技术(北京)股份有限公司 Information recommendation method and device, electronic equipment and storage medium
CN116340550A (en) * 2021-12-10 2023-06-27 腾讯科技(武汉)有限公司 Text label determining method and related device
CN116431919A (en) * 2023-04-13 2023-07-14 齐鲁工业大学(山东省科学院) Intelligent news recommendation method and system based on user intention characteristics

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU Yuanyuan et al., "A recommendation model based on book semantic features", Journal of Wuhan Institute of Technology, pages 319-324 *

Also Published As

Publication number Publication date
CN117150145B (en) 2024-01-02

Similar Documents

Publication Publication Date Title
CN109299396B (en) Convolutional neural network collaborative filtering recommendation method and system fusing attention model
CN111581510A (en) Shared content processing method and device, computer equipment and storage medium
CN110175628A (en) A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation
WO2021139415A1 (en) Data processing method and apparatus, computer readable storage medium, and electronic device
Haque et al. Performance analysis of different neural networks for sentiment analysis on IMDb movie reviews
CN117150145B (en) Personalized news recommendation method and system based on large language model
CN110046250A (en) Three embedded convolutional neural networks model and its more classification methods of text
Yang et al. A decision-making algorithm combining the aspect-based sentiment analysis and intuitionistic fuzzy-VIKOR for online hotel reservation
CN112699310A (en) Cold start cross-domain hybrid recommendation method and system based on deep neural network
Chen et al. Deep neural networks for multi-class sentiment classification
JP7427717B2 (en) Multimodal transformer-based item classification system, data processing system, data processing method, and computer-implemented method
CN114358109A (en) Feature extraction model training method, feature extraction model training device, sample retrieval method, sample retrieval device and computer equipment
CN116975615A (en) Task prediction method and device based on video multi-mode information
CN116610778A (en) Bidirectional image-text matching method based on cross-modal global and local attention mechanism
CN116431919A (en) Intelligent news recommendation method and system based on user intention characteristics
CN117216281A (en) Knowledge graph-based user interest diffusion recommendation method and system
CN113051468A (en) Movie recommendation method and system based on knowledge graph and reinforcement learning
CN117171440A (en) News recommendation method and system based on news event and news style joint modeling
Gan et al. CDMF: a deep learning model based on convolutional and dense-layer matrix factorization for context-aware recommendation
CN113656560B (en) Emotion category prediction method and device, storage medium and electronic equipment
CN114565436A (en) Vehicle model recommendation system, method, device and storage medium based on time sequence modeling
Sangeetha et al. An Enhanced Neural Graph based Collaborative Filtering with Item Knowledge Graph
CN112925983A (en) Recommendation method and system for power grid information
CN114936327B (en) Element recognition model acquisition method and device, computer equipment and storage medium
CN115455306B (en) Push model training method, information push device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant