CN113343120A - Intelligent news recommendation system based on emotion protection - Google Patents


Info

Publication number
CN113343120A
Authority
CN
China
Prior art keywords
news
user
emotion
new
vector
Legal status
Pending
Application number
CN202110606444.XA
Other languages
Chinese (zh)
Inventor
刘嘉辉
杜金
仇化平
Current Assignee
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Application filed by Harbin University of Science and Technology
Priority to CN202110606444.XA
Publication of CN113343120A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G06F16/906 Clustering; Classification
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Economics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an intelligent news recommendation system based on emotion protection. The method comprises the following steps: 1. extracting news features and text feature words with a pre-trained BERT model, and constructing a news feature matrix from the news feature vectors; 2. filtering the emotional content of the text to establish an emotion grading model, and grading user comments, news titles and news contents by emotion to distinguish how negative or positive they are; 3. clustering news labels with a clustering algorithm, assigning weights to browsed news according to the emotion levels of the user's comments and the time of the user's behavior, and constructing a user matrix from user feature information; 4. predicting the user's emotion level in the next time period from the time series of the user's emotions; 5. generating a recommendation list by calculating the similarity between user and news vectors, predicting the user's emotional state, and recommending news in proportions determined by a Bayesian method, thereby realizing dynamic pushing. The invention prevents negative content and negative public opinion from harming users' psychological well-being and the public safety of society.

Description

Intelligent news recommendation system based on emotion protection
Technical Field
The invention belongs to the fields of computer software, artificial intelligence and recommendation systems, and in particular relates to an intelligent news recommendation system based on emotion protection.
Background
In the early days of the internet, there was little information online and limited interaction between people and the internet, so a website could meet users' needs simply by displaying its information in categories. As interaction with the internet grew, the amount of indexed information grew as well, and browsing information sequentially to find what one needed became too slow, so search engines became the common way to retrieve information. In addition, the mobile internet and the internet of things have enriched the content and forms of the internet and driven a continuous growth in data, leading to an era of data explosion.
With improvements in computer processing power and the development of internet communication technology, more and more internet companies use recommendation systems to deliver information to users in the face of information overload. A recommendation system addresses the situation in which a user without a clear need accesses a service whose content is too large to show all at once: the system must rank the content according to some rule, conventionally placing first the content that analysis predicts the user will find most interesting, and show the top-ranked recommendations to the user.
Generally, a recommendation system collects personal information about the user, such as location, gender, age and occupation, to label the user and characterize the user's profile, and collects content attributes such as title, category and time. It then builds a model from the user's operation data and history on the content, learns the user's interest preferences, and provides an accurate personalized recommendation service.
Traditional recommendation methods mainly include collaborative filtering, content-based recommendation, and machine learning or deep learning methods. Collaborative filtering relies mainly on the user's rating data and has difficulty predicting shifts in user interest, while content-based recommendation has low accuracy when data are sparse. Machine learning and deep learning methods, which have emerged in recent years, improve recommendation quality over the first two when the data volume is large, but an effective model cannot be selected when the data volume is small, and the user cold-start problem remains.
Current news platforms use recommendation systems to push news that interests users. However, the results produced by the above methods only consider whether the content interests the user, not whether it is acceptable to and beneficial for the user. For example, when a user is very interested in a certain kind of news, the system pushes related news every day and the user opens it every time, so the user's demand is met only as far as news delivery is concerned. The user's emotional change after reading, and the positive or negative emotion of the news content itself, are ignored. As a result, the user may see negative news every day, which aggravates the user's negative emotions, affects the user's physical and mental health and the public safety of society, and may even unwittingly turn the system into a tool for those who wish to steer online public opinion.
With the rapid development of the internet, news about emergencies still causes panic among some people and shifts in online public opinion. This problem is being exposed in news recommendation systems, which do not consider the user's emotional changes or how recommendation results influence them. As a result, negative and even unverified news spreads rapidly across the network, and the resulting panic goes unchecked.
Social media such as microblogs and comment sections, characterized by immediacy, interactivity and diversity, have become a main way for internet users to learn about news. In this environment, social media give the public a convenient way to choose topics and customize content in order to express and share views on trending events. User comment data serve as the basis for the user's emotional information: they comprehensively reflect the user's cognition, attitude, tendency and behavior toward events related to emergencies, and the comment text provides a corpus for comprehensively analyzing trends in the user's emotional changes.
Current emotion analysis methods mainly include emotion-dictionary-based methods and machine-learning-based methods. A dictionary-based method constructs an emotion dictionary, compares the emotion words in a text with the words in the dictionary, and derives the corresponding emotional tendency. A machine-learning-based method needs a large corpus for the computer to learn from, as well as a large amount of manually labeled data, in order to learn the emotion of a text.
Disclosure of Invention
(I) Technical problem to be solved
The goal is for a news push system to recommend news by considering not only whether the user is interested but also changes in the user's emotional tendency. The invention designs the screening process for news pushing. First, to extract the subject words of an article and construct a news feature matrix, BERT (Bidirectional Encoder Representations from Transformers), a Transformer-based bidirectional encoder representation model, is used to preprocess news titles, contents and user comments; after word segmentation and cleaning, feature words are extracted by computing and accumulating, for each word in turn, its similarity to every other word, and news feature information is added to construct the news feature matrix. Second, to classify the emotion of news contents and user comments, a pre-trained BERT model is introduced: the hidden-layer output sequence is converted into a vector, and the news contents and comments are graded by emotion using fuzzy clustering. News tags are then clustered with a clustering algorithm; the news browsed by the user is weighted according to the user's emotion values and the timing of the user's behavior; the user's interest distribution is expressed from these news weights and the news tag classes; a user matrix is constructed from user feature information; and a user behavior time series is generated in chronological order so that a time-series model can predict the user's emotional state in the next time period. Finally, a recommendation list is generated by calculating the similarity between the user vector and the news vector, and the news in the list is classified according to the user emotion predicted by the time-series model to set the proportion of news at each emotion level.
(II) Technical scheme
To quickly select each user's preferred news list from a large volume of news and user information and to predict the user's emotional state with respect to that information, the invention provides an intelligent news push system based on emotion protection, comprising:
(1) a news feature matrix module, which extracts news feature information with a pre-trained BERT model to construct a news feature matrix;
(2) an emotion grading module, which establishes an emotion grading model to grade the emotion of user comments, news titles and news contents;
(3) a user feature matrix module, which clusters news labels with a clustering algorithm, assigns weights to the news browsed by a user according to the emotion level of the user's comments and the time of the user's behavior, and constructs a user matrix from user feature information;
(4) a user emotion prediction module, which labels the user's emotional tendency in chronological order to build a model and predicts the user's emotion level in the next time period;
(5) a news pushing module, which generates a recommendation list by calculating the similarity between the user vector and the news vector, and sets the proportion of news at each emotion level in the recommended news according to the predicted emotional state of the user using a Bayesian method.
The intelligent news push system based on emotion protection stores the news titles, news contents and user comments of a news website as text in a computer database. Because natural-language data cannot be used directly in numerical calculations, the news feature matrix construction module uses a pre-trained BERT model to generate the corresponding word vectors, extracts keywords, and then adds news feature information, such as release time and author, to build the matrix. Each news item is represented by a vector NEWS = (n_param_1, n_param_2, …, n_param_p), whose parameters represent news attributes. The news feature matrix construction module proceeds as follows:
Step1_1: load the news contents, news titles and user comments in the system as corpus data, clean the corpus data, and initialize the parameters of the pre-trained BERT model;
Step1_2: convert the text into vectors with the BERT model, remove non-feature words from the text by comparing its content against a prepared non-feature vocabulary, and segment the text using the non-feature words as boundaries;
Step1_3: for each word obtained by segmentation, compute in turn the Euclidean distances between its vector and the vectors of all other words, sum them, and take the N words with the highest sums as the feature words of the news item;
Step1_4: represent the news feature information numerically, concatenate it with the word vectors of the news item's feature words, fill the result into the news vector, and store the news vector in the news matrix.
This completes the description of the news feature matrix construction process.
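Step1_3 and Step1_4 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the BERT word vectors from Step1_2 are already available as lists of floats keyed by word, and the function names, the toy vectors and the attribute values (standing in for an encoded release date and author id) are hypothetical. Following the text literally, each word is scored by the sum of its Euclidean distances to all other words, and the highest-scoring N words are kept.

```python
import math

# Illustrative sketch only: function names and the data layout are
# assumptions, not taken from the patent.

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_feature_words(word_vectors, n_top):
    """Step1_3: score each word by the sum of its Euclidean distances to all
    other words and keep the n_top words with the highest sums."""
    words = list(word_vectors)
    scores = {
        w: sum(euclidean(word_vectors[w], word_vectors[o]) for o in words if o != w)
        for w in words
    }
    # Highest sum first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(words, key=lambda w: (-scores[w], w))
    return ranked[:n_top]

def build_news_vector(word_vectors, feature_words, news_attributes):
    """Step1_4: numeric news attributes followed by the feature words' vectors."""
    vector = list(news_attributes)
    for w in feature_words:
        vector.extend(word_vectors[w])
    return vector

if __name__ == "__main__":
    vectors = {"flood": [0.0, 0.0], "rescue": [1.0, 0.0], "city": [0.0, 3.0]}
    top = select_feature_words(vectors, n_top=1)
    print(top)  # ['city']
    print(build_news_vector(vectors, top, [20210601.0, 7.0]))  # [20210601.0, 7.0, 0.0, 3.0]
```

Each resulting list is one row of the news matrix; in practice the attribute encoding would have to be scaled so that it does not dominate the word-vector dimensions in later distance calculations.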
Unlike e-commerce or movie websites, news websites do not give users a rating mechanism for evaluating news content, so user comments are the data that best reflect user attitudes. The system therefore establishes an emotion grading model to grade the emotion of user comments. The emotion grading module proceeds as follows:
Step2_1: for the vectors output by the BERT model for an entire news item or comment text, compute the mean and the maximum along the sequence-length dimension, where the sequence length is the length of the whole text sequence, and concatenate the two results into a vector text_sum, which represents the content of the entire news item or comment;
Step2_2: take the k words with the most positive emotional tendency and average their vectors to generate an emotion standard vector emotion_normal;
Step2_3: take the difference between the text_sum of each news item or comment and emotion_normal to filter out the influence of non-emotional factors in text_sum, and assign the result to the vector NEW_e, which represents the emotional features;
Step2_4: establish an emotion classification model to classify the emotion feature vectors NEW_e, as follows:
Step2_4_1: randomly select five emotion feature vectors NEW_e_c1, NEW_e_c2, NEW_e_c3, NEW_e_c4 and NEW_e_c5 as the class centers of positive, fairly positive, neutral, fairly negative and negative emotion, thereby dividing the samples into five classes, and express the distance d_ij between sample NEW_e_i and class center NEW_e_cj by an exponential similarity coefficient, given by the following formula:
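[Formula image not reproduced in this text: definition of d_ij.]
where $s_k = \frac{1}{n}\sum_{i=1}^{n}\left(\text{NEW\_e}_{ik} - \text{NEW}_k\right)^2$,
[Formula image not reproduced in this text.]
where n is the number of samples and s is the feature dimension of the sample vectors;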
Step2_4_2: obtain a fuzzy classification matrix U with the fuzzy C-means clustering algorithm; U is computed as:
[Formula image not reproduced in this text.]
where u_ij is the membership of sample i in class j, m is the fuzziness coefficient, and c is the number of classes;
Step2_4_3: initialize the parameter μ of possibilistic C-means clustering, whose value is:
[Formula image not reproduced in this text.]
Step2_4_4: the objective function and constraints of possibilistic C-means clustering are:
[Formula images not reproduced in this text: objective function and constraints.]
Step2_4_5: derive the membership function and the class centers from the following formulas:
$$u_{ij} = \left(1 + \left(\frac{d_{ij}^2}{\mu_i}\right)^{\frac{1}{m-1}}\right)^{-1}$$
[Formula image not reproduced in this text: class-center update.]
Step2_4_6: set a threshold ε and the fuzziness coefficient m; when ||ΔC_i|| < ε, the algorithm stops iterating and outputs the optimal fuzzy classification matrix U and the class-center matrix C;
Step2_4_7: rank the classes by the Euclidean distance d_j between each class vector in the center matrix C and the emotion standard vector emotion_normal;
Step2_5: assign emotion levels according to d_j: the class with the smallest d_j is level five, the positive emotion level, and the class with the largest d_j is level one, the negative emotion level. The optimal fuzzy classification matrix gives the membership of each news item and comment in each emotion level, so the emotion level can be determined from the membership values.
This completes the description of the emotion grading module.
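The emotion grading steps can be sketched in Python under several stated assumptions, none of which come from the patent: BERT token vectors are given as lists of floats; emotion_normal has the same dimension as text_sum (the text does not say how the k word vectors are pooled); NEW_e is taken as text_sum minus emotion_normal; d_ij is plain Euclidean distance, because the exponential similarity coefficient formula is not reproduced here; each class keeps one fixed scale μ_j; and the centers are updated with the standard possibilistic c-means rule (the mean of the samples weighted by u_ij^m), since the patent's own center formula is likewise not reproduced. All function names are hypothetical.

```python
import math

# Illustrative sketch; helper names, the distance and the center update are assumptions.

def sq_dist(a, b):
    """Squared Euclidean distance (assumed stand-in for d_ij squared)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def text_sum(token_vectors):
    """Step2_1: mean- and max-pool token vectors along the sequence axis, then concatenate."""
    n, dim = len(token_vectors), len(token_vectors[0])
    mean = [sum(t[d] for t in token_vectors) / n for d in range(dim)]
    peak = [max(t[d] for t in token_vectors) for d in range(dim)]
    return mean + peak

def average(vectors):
    """Step2_2: element-wise mean, e.g. of the k most positive words' vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def emotion_feature(text_vec, emotion_normal):
    """Step2_3: NEW_e = text_sum - emotion_normal (subtraction order assumed)."""
    return [a - b for a, b in zip(text_vec, emotion_normal)]

def memberships(samples, centers, mu, m):
    """Step2_4_5: u_ij = 1 / (1 + (d_ij^2 / mu_j)^(1 / (m - 1)))."""
    return [[1.0 / (1.0 + (sq_dist(x, c) / mu[j]) ** (1.0 / (m - 1.0)))
             for j, c in enumerate(centers)] for x in samples]

def pcm(samples, centers, mu, m=2.0, eps=1e-6, max_iter=100):
    """Possibilistic c-means with a fixed scale mu_j per class (sketch)."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        U = memberships(samples, centers, mu, m)
        new_centers = []
        for j in range(len(centers)):
            # Assumed center update: mean of samples weighted by u_ij^m
            w = [U[i][j] ** m for i in range(len(samples))]
            total = sum(w)
            new_centers.append([sum(wi * x[d] for wi, x in zip(w, samples)) / total
                                for d in range(len(samples[0]))])
        # Step2_4_6: stop once every center moves less than eps
        shift = max(math.sqrt(sq_dist(a, b)) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < eps:
            break
    return memberships(samples, centers, mu, m), centers

def grade_levels(centers, emotion_normal):
    """Step2_4_7 and Step2_5: the center closest to emotion_normal gets the
    highest level (5 with five classes); the farthest gets level 1."""
    order = sorted(range(len(centers)), key=lambda j: sq_dist(centers[j], emotion_normal))
    return {j: len(centers) - rank for rank, j in enumerate(order)}
```

A sample's emotion level is then the level of the class in which its membership is largest. Possibilistic c-means can let two centers drift onto the same point when the initial centers are poorly placed; this sketch does not guard against that.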
When mining user features, the system clusters news labels with a clustering algorithm to express user interests, assigns weights to the news browsed by a user according to the user's emotion levels and the timing of the user's behavior, expresses the user's interest distribution from these news weights, and constructs a user matrix from user feature information. The user feature matrix construction module proceeds as follows:
Step3_1: determine the sliding-window radius r; take a randomly selected news tag vector NEW_L1 as the center and the circle of radius r around it as the sliding window; compute in turn the Euclidean distance L between each remaining news tag vector and the center NEW_L1, and mark the news tag vectors whose distance from NEW_L1 is less than or equal to r into a set M, i.e., these points belong to cluster c1;
Step3_2: compute in turn the offset vector N_i from the center vector NEW_L1 to each element of M, obtaining the offset vector N = N_1 + N_2 + … + N_n, where n is the number of elements in M;
Step3_3: move the center vector NEW_Li by the distance |N| in the direction of increasing density;
Step3_4: repeat the above operations until the offset is smaller than the threshold K, and mark the center vector NEW_Li at that moment; continue repeating the above operations until all points have been classified;
Step3_5: for each news tag vector, count how often each cluster marked it, and assign it to the cluster with the highest count;
Step3_6: take the user's most recent browsing time as the timestamp, subtract the browsing time of each other news item from it, divide by the time precision, and normalize to obtain the time weight of each news item. For example, if the timestamp is t, news i was browsed at time t_i and m is the time precision, the time weight is P_ti = 1/(1 + (t − t_i)/m);
Step3_7: assign the weight W_i = P_ti × emotion_i to news i browsed by the user, where P_ti is the time weight and emotion_i is the emotion level of the user's comment;
Step3_8: sum the user's interest by news feature-word category to obtain the user's total interest weight for each news category. Because the news weight is W_i = P_ti × emotion_i, the influence of negative emotion is also taken into account in the summation, so the user's interest is represented more comprehensively. The final user interests are User_label_1 = W_1 × NEW_S1_k1 + W_2 × NEW_S2_k1 + … + W_n × NEW_Sn_k1, …, User_label_n = W_1 × NEW_S1_kn + W_2 × NEW_S2_kn + … + W_n × NEW_Sn_kn, where W_i is the weight of the i-th news item browsed by the user and NEW_Si_kj is the value of that news item for news category k_j;
Step3_9: combine the user's weighted interest in the n news categories into an n-dimensional vector User_label = [User_label_1, User_label_2, …, User_label_n]; here News_label_n is the similarity between a news label belonging to the n-th interest class and that class's center vector, and if the news label does not belong to the n-th interest class, its n-th dimension is 0;
Step3_10: merge the user interest vector User_label with the user feature information, fill the result into the user vector, and store it in the user matrix, completing the construction of the user feature matrix.
This completes the description of the user feature matrix construction module.
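The mean-shift clustering of Step3_1 to Step3_5 and the interest weighting of Step3_6 to Step3_9 can be sketched in Python as below. Several choices are assumptions rather than the patent's: seeds are taken in index order instead of at random; the center moves by the mean of the offsets, which is the usual mean-shift step (the text sums them, which can overshoot when the window holds several points); converged centers closer than r/2 are merged into one cluster; and each browsed item arrives as a tuple of browse time, comment emotion level, cluster label and similarity to its cluster center. All names are hypothetical.

```python
import math
from collections import defaultdict

# Illustrative sketch; seeding, the merge rule and the input layout are assumptions.

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_shift(tags, r, K=1e-4, max_iter=100):
    """Flat-window mean shift over news tag vectors; returns labels and centers."""
    modes = []                                     # converged cluster centers
    votes = defaultdict(lambda: defaultdict(int))  # votes[tag][cluster] -> count
    visited = set()
    for start in range(len(tags)):                 # seeds in index order, not random
        if start in visited:
            continue
        center = list(tags[start])
        marked = set()
        for _ in range(max_iter):
            # Step3_1: tags within radius r of the center form the set M
            M = [i for i, t in enumerate(tags) if dist(t, center) <= r]
            marked.update(M)
            # Step3_2: mean offset from the center to the members of M
            offset = [sum(tags[i][d] - center[d] for i in M) / len(M)
                      for d in range(len(center))]
            # Step3_3: move the center along the offset, toward higher density
            center = [c + o for c, o in zip(center, offset)]
            # Step3_4: stop once the offset is below the threshold K
            if math.sqrt(sum(o * o for o in offset)) < K:
                break
        # Merge with an existing center closer than r / 2 (assumed rule)
        for idx, mode in enumerate(modes):
            if dist(mode, center) < r / 2:
                cluster = idx
                break
        else:
            modes.append(center)
            cluster = len(modes) - 1
        for i in marked:
            votes[i][cluster] += 1
        visited.update(marked)
    # Step3_5: each tag joins the cluster that marked it most often
    labels = [max(votes[i], key=votes[i].get) for i in range(len(tags))]
    return labels, modes

def time_weight(t, ti, precision):
    """Step3_6: P_ti = 1 / (1 + (t - t_i) / precision)."""
    return 1.0 / (1.0 + (t - ti) / precision)

def user_interest(browsed, n_classes, precision):
    """Step3_7 to Step3_9: browsed holds (browse_time, comment_emotion_level,
    cluster_label, similarity_to_cluster_center) for each news item."""
    t = max(item[0] for item in browsed)          # latest browse time as the timestamp
    user_label = [0.0] * n_classes
    for ti, emotion, k, sim in browsed:
        w = time_weight(t, ti, precision) * emotion   # W_i = P_ti * emotion_i
        user_label[k] += w * sim                      # other dimensions stay 0
    return user_label
```

For two tight, well-separated groups of tag vectors and r = 1, the sketch returns two clusters; `user_interest([(10, 5, 0, 1.0), (0, 1, 1, 0.5)], 2, 10)` gives `[5.0, 0.25]`, since the older item's time weight is 1/(1 + 10/10) = 0.5.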
To account for the user's emotional state in news recommendation, the system labels the user's emotional tendency in chronological order and predicts the user's emotional state in the next time period with a time-series model. First, a user time series is built from the user's historical behavior data: a 2 × n matrix indexed by time and arranged in chronological order, where n is the total number of news items the user has browsed. Entries for browsing behavior the user did not comment on are filled by combining the user emotion bias Ud and the news item's own emotion value Nd in proportion w, where w is a number between 0 and 1. Finally, a time-series model is built to predict the user's emotional state emotion(t) in the next time period, where t is the time of the next period. The user emotion prediction module proceeds as follows:
Step4_1: read the user behavior data, convert the time data, and create a time series indexed by user behavior time with the user's emotion information as content, i.e., sequence = ((time_1, emotion_1), (time_2, emotion_2), …, (time_n, emotion_n));
Step4_2: fill the blank entries of the time series: compute the user's emotion bias Ud = (emotion_1 + emotion_2 + … + emotion_n)/m, where m is the number of news items on which the user left an emotional comment, and fill the blank entries of the sequence with Ud;
Step4_3: test whether the time series is stationary with a unit root test. If no unit root exists, the series is stationary; in terms of the test result, the p-value is below the significance level of 0.05, so the unit-root hypothesis is rejected. Otherwise, difference the series to generate a new time series;
Step4_4: when differencing, first apply first-order differencing to the series and repeat the stationarity test, which verifies that the first-order differenced series is stationary. First-order differencing produces sequence1 = ((time_1, emotion_1), (time_2, emotion_2 − emotion_1), …, (time_n, emotion_n − emotion_(n−1)));
Step4_5: perform a white-noise test on the series with the Ljung-Box (LB) statistic and check whether the p-value is below 0.05; if it is, the series is a non-white-noise sequence, otherwise apply higher-order differencing;
Step4_6: initialize the values of m, p, q, a and b, and express the user's emotion level in the next time period as:
$$\text{emotion}_{n+1} = m + \sum_{i=1}^{p} a_i\,\text{emotion}_{n+1-i} + \sum_{j=1}^{q} b_j\,\varepsilon_{n+1-j}$$
where m is a constant, p is the number of autoregressive terms, q is the number of moving-average terms, a_i and b_j are the autoregressive and moving-average weights respectively, and ε denotes the residual (white-noise) terms;
Step4_7: optimize the parameters by minimizing the Bayesian information criterion, BIC = k ln(n) − 2 ln(L), where k is the number of parameters, n is the number of samples and L is the likelihood function; substitute the optimal parameters into the formula above to establish the final time-series model, and use the model to predict the user's emotion level in the next time period.
This completes the description of the user emotion prediction process.
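The arithmetic of Step4_1, Step4_2, Step4_4, Step4_6 and Step4_7 can be sketched in Python as follows; the unit root and Ljung-Box tests and the fitting of the ARMA coefficients are not reimplemented. The sketch assumes each record is a (time, comment emotion or None, news emotion) tuple and fills gaps with w × Ud + (1 − w) × Nd, so that w = 1 reproduces the pure Ud fill of Step4_2. The function names and the candidate table passed to `select_order` are hypothetical.

```python
import math

# Illustrative sketch; record layout and helper names are assumptions.

def build_series(records, w=1.0):
    """Step4_1 and Step4_2: records are (time, comment_emotion or None, news_emotion).
    Uncommented entries are filled with w*Ud + (1-w)*Nd."""
    records = sorted(records, key=lambda r: r[0])
    commented = [e for _, e, _ in records if e is not None]
    ud = sum(commented) / len(commented)          # user emotion bias Ud
    return [(t, e if e is not None else w * ud + (1 - w) * nd)
            for t, e, nd in records]

def first_difference(series):
    """Step4_4 as written: keep the first entry, then emotion_k - emotion_(k-1)."""
    return [series[0]] + [(series[k][0], series[k][1] - series[k - 1][1])
                          for k in range(1, len(series))]

def arma_forecast(values, residuals, const, a, b):
    """Step4_6: const + sum_i a_i * emotion_(n+1-i) + sum_j b_j * eps_(n+1-j)."""
    ar = sum(a_i * values[-i] for i, a_i in enumerate(a, start=1))
    ma = sum(b_j * residuals[-j] for j, b_j in enumerate(b, start=1))
    return const + ar + ma

def bic(k, n, log_likelihood):
    """Step4_7: BIC = k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2.0 * log_likelihood

def select_order(candidates, n):
    """Pick the (p, q) whose (k, log-likelihood) pair gives the smallest BIC."""
    return min(candidates, key=lambda pq: bic(candidates[pq][0], n, candidates[pq][1]))
```

In practice the stationarity and white-noise tests and the maximum-likelihood fit would normally come from a statistics library; statsmodels, for example, provides `adfuller`, `acorr_ljungbox` and an `ARIMA` model class whose fitted results expose a `bic` value.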
When recommending news to a user, a recommendation list is generated from the similarity between the user vector and the news vectors: the similarity sim_g(U, N) between user interest and news content is computed from the user matrix and the news matrix to produce a candidate news recommendation list, as follows:
Step5_1: using the user id, select from the news matrix the news vectors the user has not yet browsed and store them in a pending list, waitlist = (NEWS_1, NEWS_2, …, NEWS_N);
Step5_2: compute the content similarity sim_g(U, N) between each news vector in waitlist and the user vector as the Euclidean distance sim_g(U, N) = [(U_1 − N_1)^2 + (U_2 − N_2)^2 + … + (U_q − N_q)^2]^(1/2);
Step5_3: normalize the content similarity between the user vector and the news vector: sim_g(U, N) = 1/(1 + sim_g(U, N)), so that a smaller distance gives a larger value;
Step5_4: sort the computed sim_g(U, N) values in descending order and store the corresponding news ids, in that order, in the candidate news recommendation list;
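Step5_1 to Step5_4 can be sketched in Python as follows, assuming the user vector and the news vectors share the same q dimensions and that the news matrix is a dictionary from news id to vector; this data layout and the function name are hypothetical. Ties are broken by news id so that the order is deterministic.

```python
import math

# Illustrative sketch; the data layout and function name are assumptions.

def rank_candidates(user_vector, news_matrix, browsed_ids):
    """Step5_1 to Step5_4: rank unbrowsed news by 1 / (1 + Euclidean distance)."""
    # Step5_1: keep only news the user has not browsed yet (the waitlist)
    waitlist = {nid: vec for nid, vec in news_matrix.items() if nid not in browsed_ids}
    scored = []
    for nid, vec in waitlist.items():
        # Step5_2: Euclidean distance between the user vector and the news vector
        distance = math.sqrt(sum((u - n) ** 2 for u, n in zip(user_vector, vec)))
        # Step5_3: map the distance into (0, 1]; closer news scores higher
        scored.append((1.0 / (1.0 + distance), nid))
    # Step5_4: descending similarity, ties broken by news id
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [nid for _, nid in scored]
```

For a user at the origin and news vectors a = [3, 4] and b = [1, 0], the scores are 1/6 and 1/2, so b is ranked before a.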
Step5_5: further, when pushing news dynamically, the system uses a news proportion allocation algorithm to set the proportion of news at each emotion level in the news recommendation list according to the predicted emotional state of the user, then generates the news recommendation list and pushes it to the user, as follows:
Step5_5_1: divide each user's news sequence by time interval T; for user i the divided news sequence is N_i = (T_1, T_2, …, T_t), where T_t = (News_e1 + News_e2 + … + News_en)/n, News_en is the emotion level of a news item browsed by the user, and T_t is the average emotion level of the news in period t. Similarly, divide each user's comment sequence by time interval T; for user i the divided comment sequence is U_ie = (T_e1, T_e2, …, T_et), where T_et = (emotion_te1 + emotion_te2 + … + emotion_ten)/n, T_et is the average emotion level of user i in period t, and emotion_ten is the emotion level of a comment by user i;
step5_5_2, initializing a user emotion level probability P (U _ ei) ═ E _ im/E _ N and a news emotion level probability P (N _ ei) ═ N _ im/N _ N, wherein E _ im represents the number of times of ith level emotion in the whole user comment sequence, E _ N represents the number of times of whole emotion level in the whole user comment sequence, N _ im represents the number of times of ith level emotion news in the whole user news sequence, and N _ N represents the number of times of whole emotion level news in the whole user news sequence;
Step5_5_3, initialize P(N_ej|U_ei), the probability that the browsed news is at level j given that the user's emotion is at level i, as N_j/U_i, where U_i is the total number of times the user emotion is at level i across all user sequences and N_j is the total number of news items at emotion level j across all user sequences;
Step5_5_4, compute user i's emotion probability P(U_ei) = E_i_m/E_i_N and news emotion level probability P(N_ei) = N_i_m/N_i_N; if P(U_ei) = 0, set P(U_ei) = P(U_E), and if P(N_ei) = 0, set P(N_ei) = P(N_E);
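The frequency estimates of Step5_5_2 to Step5_5_4 can be sketched as follows. The helper names are hypothetical, and the sketch assumes that P(U_E) and P(N_E) in the fallback rule denote the population-level probabilities initialized in Step5_5_2.

```python
from collections import Counter

LEVELS = range(1, 6)  # emotion levels 1 (most negative) .. 5 (most positive)

def level_probs(levels):
    """Relative frequency of each level, e.g. P(U_ei) = E_im / E_N."""
    counts = Counter(levels)
    total = sum(counts.values())
    return {i: (counts[i] / total if total else 0.0) for i in LEVELS}

def user_level_probs(user_levels, global_probs):
    """Per-user probabilities (Step5_5_4); a zero estimate falls back to the
    population-level value (assumed to be what P(U_E) / P(N_E) denote)."""
    probs = level_probs(user_levels)
    return {i: (p if p > 0 else global_probs[i]) for i, p in probs.items()}

population = level_probs([1, 2, 2, 3, 3, 4, 4, 5, 5, 5])
user = user_level_probs([4, 5, 5, 5], population)
print(user)  # {1: 0.1, 2: 0.2, 3: 0.2, 4: 0.25, 5: 0.75}
```

After the fallback the per-user values no longer sum to 1 (1.5 here); the text does not say whether they are renormalized, so the sketch leaves them unchanged.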
Step5_5_5, compute P(N_ej_k|U_ei_k), the probability that the browsed news is at level j given that user k's emotion is at level i. Take all news sequences recorded while user k's emotion is at level i, and in them set NEW_emotion to 1 for news at emotion level j and to 0 otherwise. The samples then follow a Bernoulli distribution; writing p for P(N_ej_k|U_ei_k) and x_s for NEW_emotion_s, the probability mass function is $P(x) = p^{x}(1-p)^{1-x}$, so the likelihood function is:
$L(p) = p^{x_1}(1-p)^{1-x_1} \cdot p^{x_2}(1-p)^{1-x_2} \cdots p^{x_n}(1-p)^{1-x_n}$
Setting the derivative of ln L(p) to zero gives the maximum likelihood estimate of P(N_ej_k|U_ei_k), the sample mean:
P(N_ej_k|U_ei_k)=(NEW_emotion_1+NEW_emotion_2+..+NEW_emotion_n)/n,
if P(N_ej_k|U_ei_k) = 0, set P(N_ej_k|U_ei_k) = P(N_ej|U_ei);
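The closed-form estimate in Step5_5_5 is simply the sum of the NEW_emotion values divided by n. The sketch below, with hypothetical names, computes it with the fallback and checks it against a brute-force grid search over ln L(p); treating an empty sample like a zero estimate is an assumption.

```python
import math

def bernoulli_mle(news_levels, j, fallback):
    """Estimate P(N_ej_k | U_ei_k) (Step5_5_5).

    news_levels: emotion levels of the news user k browsed while at level i.
    Each item is encoded as NEW_emotion = 1 if its level is j, else 0; the
    Bernoulli MLE is the sample mean. A zero estimate (or, as an assumption,
    an empty sample) falls back to the population value P(N_ej | U_ei).
    """
    if not news_levels:
        return fallback
    x = [1 if level == j else 0 for level in news_levels]
    p = sum(x) / len(x)
    return p if p > 0 else fallback

def log_likelihood(p, x):
    """ln L(p) = sum(x_s ln p + (1 - x_s) ln(1 - p)) for 0 < p < 1."""
    return sum(xs * math.log(p) + (1 - xs) * math.log(1 - p) for xs in x)

# The grid maximum of ln L agrees with the closed-form sample mean.
x = [1, 1, 0, 0]                      # levels [3, 3, 4, 5] with j = 3
grid = [k / 100 for k in range(1, 100)]
print(max(grid, key=lambda p: log_likelihood(p, x)))  # 0.5
print(bernoulli_mle([3, 3, 4, 5], 3, fallback=0.2))   # 0.5
print(bernoulli_mle([4, 5], 3, fallback=0.2))         # 0.2
```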
Step5_5_6, compute P(U_ei_k|N_ej_k) by Bayes' rule, i.e., the probability that user k's emotion level is i given that user k browses news at emotion level j:
P(U_ei_k|N_ej_k)=P(N_ej_k|U_ei_k)*P(U_ei)/P(N_ej);
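Step5_5_6 applied to a whole table of (i, j) pairs might look like the sketch below; the dictionary layout and the probability values are hypothetical. Because the likelihood is user-specific while P(U_ei) and P(N_ej) are population-level, the results are not guaranteed to sum to 1 over i, and the text does not renormalize them.

```python
def posterior_table(likelihood, p_u, p_n):
    """Bayes' rule per Step5_5_6.

    likelihood[(i, j)] = P(N_ej_k | U_ei_k) for user k;
    p_u[i] = P(U_ei) and p_n[j] = P(N_ej) are the priors used in the text.
    Returns post[(i, j)] = P(U_ei_k | N_ej_k); pairs with P(N_ej) = 0 are skipped.
    """
    return {(i, j): lik * p_u[i] / p_n[j]
            for (i, j), lik in likelihood.items() if p_n[j] > 0}

post = posterior_table({(5, 1): 0.2, (5, 5): 0.6},
                       p_u={5: 0.3}, p_n={1: 0.1, 5: 0.3})
print({k: round(v, 3) for k, v in post.items()})  # {(5, 1): 0.6, (5, 5): 0.6}
```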
Step5_5_7, take the user's emotion T_et for the next time period as predicted by the time series model. The proportions of news at each emotion level pushed dynamically in the next time period are P(T_et+1|N_e1) : P(T_et+1|N_e2) : P(T_et+1|N_e3) : P(T_et+1|N_e4) : P(T_et+1|N_e5), where T_et+1 denotes the emotional state one level above the user's predicted emotion T_et; if T_et = 5, then T_et+1 = T_et, which keeps the user in the most positive emotional state;
Step5_5_8, when user k generates a new behavior sequence T_T+1, the model is updated iteratively as follows:
first, return to Step5_5_5 and compute P(N_ej_k|U_ei_k) with user k's new behavior sequence included;
second, compute P(U_ei1), the probability that the emotion of the user's comments in the new behavior sequence is at level i;
finally, update P(U_ei_k|N_ej_k) by the formula P(U_ei_k|N_ej_k) = P(U_ei_k|N_ej_k)/P(U_ei1), and return to Step5_5_7 to make the recommendation.
Step5_6, according to the proportions given by the news proportion division algorithm, select N news items in order from the candidate news recommendation list and load them into the news recommendation list, where the number of news items at emotion level i is Ni = N × P(T_et+1|N_ei) / (P(T_et+1|N_e1) + P(T_et+1|N_e2) + P(T_et+1|N_e3) + P(T_et+1|N_e4) + P(T_et+1|N_e5));
This completes the description of the dynamic news push process.
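Step5_5_7 and Step5_6 reduce to choosing the target level and splitting N slots in proportion to P(T_et+1|N_ej). The sketch assumes the predicted level is already an integer, and since the text does not say how fractional counts are rounded, the largest-remainder rule used here is an assumption. With the 1:1:2:2:4 proportions and N = 20 of Example 5 it reproduces 2, 2, 4, 4, 8.

```python
def target_level(predicted_level, top=5):
    """T_et+1: one level above the predicted emotion, capped at the top level.
    Assumes the prediction has already been rounded to an integer level."""
    return min(predicted_level + 1, top)

def allocate(total, weights):
    """Ni = total * w_i / sum(w) (Step5_6).

    Integer counts come from flooring and then giving leftover slots to the
    largest fractional parts; this rounding rule is an assumption.
    """
    s = sum(weights)
    raw = [total * w / s for w in weights]
    counts = [int(r) for r in raw]            # floor for non-negative values
    leftover = total - sum(counts)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts

# Weights P(T_et+1|N_e1) : ... : P(T_et+1|N_e5) = 1 : 1 : 2 : 2 : 4, N = 20
print(target_level(4), allocate(20, [1, 1, 2, 2, 4]))  # 5 [2, 2, 4, 4, 8]
```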
(III) Advantageous effects
The beneficial effects of the invention are as follows. Natural language processing converts user comments and news titles and contents from text into numeric vectors, which makes the data convenient to compute with and to model. The emotion analysis model reflects changes in users' emotions and the social influence of news, so the emotional intensity of user reactions can be monitored promptly and accurately. Adding a time factor allows shifts in user interest to be predicted more accurately, enabling more accurate personalized recommendation.
The time series model with time factors reflects shifts in user interest and accurately tracks the user's emotion level and emotional trend, which helps keep the user from sinking into a low mood and suppresses negative emotions. The method prevents public-opinion news that strongly affects users from harming their mental well-being, and prevents negative public opinion from spreading too quickly online and undermining public order and public safety.
Drawings
FIG. 1 is a flow block diagram of the intelligent news recommendation system based on emotion protection.
FIG. 2 is a flow chart of an implementation of the intelligent news recommendation system based on emotion protection.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Example 1: constructing the news characteristic matrix.
Step1_1, loading news content, news headlines and user comments in the system as corpus data, cleaning the corpus data, and initializing BERT pre-training model parameters;
step1_2, converting the text into a vector form through a BERT model, removing non-characteristic words in the text content by comparing the text content with a prepared non-characteristic vocabulary, and segmenting the text by taking the non-characteristic words as boundaries;
step1_3, sequentially calculating Euclidean distances between the vector corresponding to each word after word segmentation and all the other words, accumulating, and taking the top 2 items with the highest results as the characteristic words corresponding to the news;
Suppose the word vectors after segmentation are:
word1=[0.25,0.32,0.18,…,0.67];
word2=[0.35,0.64,0.37,…,0.82];
word3=[0.25,0.32,0.15,…,0.66];
The sum of the Euclidean distances from word1 to word2 and word3, with the sums running over the vector components, is: $\mathrm{word1\_d} = \left(\sum(\mathrm{word1}-\mathrm{word2})^2\right)^{1/2} + \left(\sum(\mathrm{word1}-\mathrm{word3})^2\right)^{1/2}$
word2_d and word3_d are computed in the same way. With word1_d = 0.82, word2_d = 0.79 and word3_d = 0.66, the two highest scores belong to word1 and word2, which are therefore the feature words of this news item;
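A sketch of the feature-word selection in Step1_3, assuming each word already has an embedding vector; `feature_words` is a hypothetical name. The example vectors above are elided with "…", so the toy one-dimensional vectors in the usage line are illustrative only.

```python
import math

def feature_words(vectors, top_n=2):
    """Step1_3 sketch: score each word by the sum of its Euclidean distances to
    all other words in the same news item and keep the top_n highest scores.

    vectors: dict mapping word -> embedding (e.g. a BERT vector).
    """
    scores = {
        w: sum(math.dist(v, other) for x, other in vectors.items() if x != w)
        for w, v in vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy 1-D "embeddings": scores are a = 11, b = 10, c = 19
print(feature_words({"a": [0.0], "b": [1.0], "c": [10.0]}))  # ['c', 'a']
```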
step1_4, digitally representing the characteristic information of the news, merging the characteristic information with word vectors of characteristic words of the news, filling the merged information into the news vectors, and storing the news vectors into a news matrix;
Suppose the news characteristic information consists of the publishing company, the news source and the author, encoded as 0.11, 0.12 and 0.13 respectively;
the merged NEWS vector NEWS is [0.11,0.12,0.13,0.25,0.32, …,0.66 ].
Example 2: emotion grading.
Step2_1, randomly select five emotion feature vectors as the class centers of the positive, fairly positive, neutral, fairly negative and negative emotion classes, dividing the samples into five classes; let the class center vectors be:
NEW_e_c1=[0.52,0.35,…,0.68];
NEW_e_c2=[0.62,0.38,…,0.82];
NEW_e_c3=[0.18,0.97,…,0.98];
NEW_e_c4=[0.27,0.48,…,0.64];
NEW_e_c5=[0.72,0.16,…,0.23];
Let sample NEW_e1 = [0.93, 0.28, …, 0.45], with vector dimension 10; its distance from the class center NEW_e_c1 is:
[Formula image not reproduced: distance d_ij expressed by an exponential similarity coefficient]
where
[Formula image not reproduced]
and n is the number of samples;
Step2_2, obtain the fuzzy classification matrix U with the fuzzy C-means clustering algorithm; U is solved by:
[Formula image not reproduced: fuzzy C-means membership u_ij]
wherein u _ ij represents the membership degree of the sample i to the class j, m is a fuzzy coefficient, and c is the class number;
Step2_3, initialize the parameter μ of the possibilistic C-means clustering; μ is taken as:
[Formula image not reproduced: initial value of μ]
Step2_4, the objective function and constraints of the possibilistic C-means clustering are:
[Formula image not reproduced: objective function]
[Formula image not reproduced: constraint conditions]
Step2_5, the membership function and the class centers are updated by:
$u_{ij} = \left(1 + \left(d_{ij}^2/\mu_i\right)^{1/(m-1)}\right)^{-1}$
[Formula image not reproduced: class center update]
Step2_6, set the threshold ε = 0.001 and the fuzzy coefficient m = 3; when ||ΔC_i|| < ε, stop iterating and output the optimal fuzzy classification matrix U and the class center matrix C;
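The iteration of Step2_5 and Step2_6 can be sketched in plain Python as below. This is possibilistic C-means (PCM) under stated assumptions: μ is indexed per class as in standard PCM (the text writes μ_i), the class-center update is the standard PCM weighted mean because the patent gives its own center formula only as an image, and the samples, initial centers and μ values in the usage line are hypothetical.

```python
def pcm(samples, centers, mu, m=3.0, eps=1e-3, max_iter=100):
    """Possibilistic C-means sketch for Step2_5 and Step2_6.

    samples: list of vectors; centers: initial class centers; mu: one scale
    parameter per class (indexed by class here, as in standard PCM).
    Returns (U, C): U[s][j] is the membership of sample s in class j.
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def memberships(cs):
        # u = (1 + (d^2 / mu)^(1/(m-1)))^-1, the membership formula in the text
        return [[1.0 / (1.0 + (sqdist(x, c) / mu[j]) ** (1.0 / (m - 1)))
                 for j, c in enumerate(cs)] for x in samples]

    for _ in range(max_iter):
        u = memberships(centers)
        new_centers = []
        for j in range(len(centers)):
            # Standard PCM center update sum(u^m x) / sum(u^m) (assumption)
            w = [row[j] ** m for row in u]
            new_centers.append([sum(wi * x[k] for wi, x in zip(w, samples)) / sum(w)
                                for k in range(len(samples[0]))])
        shift = max(sqdist(a, b) ** 0.5 for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < eps:  # stop once every ||delta C_i|| < eps
            break
    return memberships(centers), centers

U, C = pcm([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]],
           centers=[[0.5], [4.5]], mu=[1.0, 1.0], m=2.0)
```

The two centers settle near 0.1 and 5.1, and each sample's membership in its own cluster is close to 1 while its membership in the other stays small; unlike fuzzy C-means, the rows of U are not forced to sum to 1.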
Step2_7, rank the classes by the Euclidean distance d_j between each class vector in the center matrix C and the emotion standard vector emotion_normal;
Let emotion_normal = [0.52, 0.39, …, 0.71] and, after iteration, NEW_e_c1 = [0.55, 0.42, …, 0.68]; then $d_1 = \left((0.55-0.52)^2 + (0.42-0.39)^2 + \dots + (0.71-0.68)^2\right)^{1/2}$.
The emotion levels are ranked by d_j: the class with the smallest d_j is the fifth level, i.e., the positive emotion level, and the class with the largest d_j is the first level, i.e., the negative emotion level.
Suppose a row of U is (0.5, 0.2, 0.1, 0.1, 0.1): the comment belongs to the positive class with probability 0.5, to the fairly positive class with probability 0.2, and to the neutral, fairly negative and negative classes with probability 0.1 each, so it is classified as positive emotion, with emotion level value 5. If a row is (0.2, 0.2, 0.2, 0.2, 0.2), the comment is equally likely to belong to every class; it is classified as neutral emotion, with emotion level value 3.
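Step2_7 and the membership example can be combined into a small level-assignment sketch. The function names and the one-dimensional class centers are hypothetical; mapping equal memberships to the middle level follows the (0.2, 0.2, 0.2, 0.2, 0.2) example, while a tie between only some classes is broken by taking the first, which the text does not address.

```python
import math

def levels_by_distance(centers, emotion_normal):
    """Step2_7: the class center closest to emotion_normal gets the top level
    (5 for five classes), the farthest gets level 1."""
    d = [math.dist(c, emotion_normal) for c in centers]
    order = sorted(range(len(centers)), key=lambda j: d[j])
    return {j: len(centers) - rank for rank, j in enumerate(order)}

def level_from_membership(u_row, class_level, tol=1e-9):
    """Emotion level of one sample from its row of U.

    Equal memberships map to the middle (neutral) level, 3 for five classes;
    otherwise the level of the class with the largest membership is used.
    """
    if max(u_row) - min(u_row) < tol:
        return (len(u_row) + 1) // 2
    best = max(range(len(u_row)), key=lambda j: u_row[j])
    return class_level[best]

# Hypothetical 1-D class centers, listed from nearest to farthest from emotion_normal
lv = levels_by_distance([[0.0], [1.0], [2.0], [3.0], [4.0]], [0.0])
print(level_from_membership([0.5, 0.2, 0.1, 0.1, 0.1], lv))  # 5
print(level_from_membership([0.2] * 5, lv))                  # 3
```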
Example 3: constructing the user characteristic matrix.
Step3_1, with r = 2, take a randomly selected news tag vector NEW_L1 as the center of a circular sliding window of radius 2; compute in turn the Euclidean distance L between every news tag vector and the center NEW_L1, and mark the news tag vectors whose distance from NEW_L1 is at most r into a set M; these points belong to cluster c1;
Step3_2, compute the offset vector N_i from the center vector NEW_L1 to each element of the set, giving the offset vector N = N_1 + N_2 + … + N_n;
Suppose the offset vectors are N1 = [1.0, 2.0, …, 1.0], N2 = [2.0, 2.0, …, 3.0], …, Nn = [3.0, 4.0, …, 3.0]; then N = [6.0, 8.0, …, 7.0];
Step3_3, the center vector NEW_Li moves in the direction of increasing density by the distance $|N| = \left(6.0^2 + 8.0^2 + \dots + 7.0^2\right)^{1/2}$;
Step3_4, repeat until the offset is smaller than the threshold 5, and mark the center vector NEW_Li obtained at that point; continue repeating the above operations until all points are classified;
Step3_5, for each news vector, count how often each class marked it and select the most frequent class as the class to which it belongs;
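Step3_1 to Step3_5 describe a mean-shift clustering of news tag vectors. The sketch below is standard flat-kernel mean shift and differs from the text in two places: it moves the window to the mean of the points inside it rather than by the summed offset N, and it labels each point by the cluster its own window converges to instead of the frequency vote of Step3_5. The merge distance r/2 and the toy points are assumptions.

```python
import math

def mean_shift(points, r=2.0, tol=1e-3, max_iter=100):
    """Flat-kernel mean shift sketch for Step3_1 to Step3_5.

    Every point starts its own circular window of radius r; the window center
    moves to the mean of the points inside it until the move is below tol.
    Converged centers closer than r / 2 are merged into one cluster.
    Returns (cluster_centers, label_per_point).
    """
    modes = []
    for start in points:
        c = list(start)
        for _ in range(max_iter):
            inside = [p for p in points if math.dist(p, c) <= r]
            new_c = [sum(col) / len(inside) for col in zip(*inside)]
            moved = math.dist(new_c, c)
            c = new_c
            if moved < tol:
                break
        modes.append(c)

    centers, labels = [], []
    for mode in modes:
        for k, cen in enumerate(centers):
            if math.dist(mode, cen) < r / 2:
                labels.append(k)
                break
        else:  # no existing cluster is close enough: start a new one
            centers.append(mode)
            labels.append(len(centers) - 1)
    return centers, labels

pts = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [10.0, 10.0], [10.5, 10.0], [10.0, 10.5]]
centers, labels = mean_shift(pts)
print(labels)  # [0, 0, 0, 1, 1, 1]
```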
Step3_6, take the user's latest browsing time as the timestamp, subtract the browsing time of each other news item from the timestamp, divide the difference by the time precision and normalize, giving the time weight of each news item;
Let the timestamp be t = 1620392811, the time at which news 1 was browsed be t1 = 1610392415, and m = 86400 (the number of seconds in a day); the time weight is then Pt1 = 1/(1 + ((1620392811 - 1610392415)/86400)) ≈ 0.008;
Step3_7, assign a weight Wi = Pti × emotioni to each news item i browsed by the user; if the emotion level value of news 1 is 4, then W1 = 0.008 × 4 = 0.032;
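The time weight of Step3_6 and the news weight of Step3_7, sketched with hypothetical function names; the timestamps are treated as Unix seconds, so m = 86400 is one day.

```python
def time_weight(t, ti, m=86400):
    """Pt_i = 1 / (1 + (t - t_i) / m): t is the latest browsing time, t_i the
    browsing time of news i, m the time precision (86400 s = one day)."""
    return 1.0 / (1.0 + (t - ti) / m)

def news_weight(t, ti, emotion_level, m=86400):
    """W_i = Pt_i * emotion_i (Step3_7)."""
    return time_weight(t, ti, m) * emotion_level

pt1 = time_weight(1620392811, 1610392415)
w1 = news_weight(1620392811, 1610392415, 4)
print(round(pt1, 4), round(w1, 4))  # 0.0086 0.0343
```

Unrounded, Pt1 is about 0.0086 and W1 about 0.034; the example above truncates Pt1 to 0.008, which gives W1 = 0.032.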
Step3_8, sum the user's interest by news feature word category to obtain the total weight of the user's interest in each news category. If User 1 has browsed 3 news items, User 1's interest in the first news category is User_label_1 = (W1*NEW_S1_k1 + W2*NEW_S2_k1 + W3*NEW_S3_k1), where Wn is the weight of the nth news item and NEW_Sn_k1 is the similarity of the nth news item to news category 1;
Step3_9, express the user's weighted interest in the n news categories as an n-dimensional vector User_label = [User_label_1, User_label_2, …, User_label_n];
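Step3_8 to Step3_10 amount to a weighted sum per category followed by concatenation. In this sketch the function names, weights and similarities are hypothetical; only the user features 0.2 and 0.18 come from the example.

```python
def interest_vector(weights, similarities):
    """User_label_k = W_1 * NEW_S1_kk + ... + W_n * NEW_Sn_kk (Step3_8, Step3_9).

    weights[i]: W_i of the i-th browsed news item;
    similarities[i][k]: similarity of news item i to news category k
    (0 where the item has no label in that category).
    """
    n_categories = len(similarities[0]) if similarities else 0
    return [sum(w * sim[k] for w, sim in zip(weights, similarities))
            for k in range(n_categories)]

def user_vector(user_features, interest):
    """Step3_10: concatenate encoded user features with the interest vector."""
    return list(user_features) + list(interest)

label = interest_vector([0.5, 0.25], [[1.0, 0.0], [0.5, 0.5]])
print(user_vector([0.2, 0.18], label))  # [0.2, 0.18, 0.625, 0.125]
```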
Step3_10, merge the interest vector User_label with the user characteristic information into a user vector and store it in the user matrix;
Suppose the user characteristic information consists of the user's sex and age, encoded as 0.2 and 0.18 respectively;
the combined USER vector is then USER = [0.2, 0.18, 0.032, …, 0.085].
Example 4: predicting user emotion.
Step4_1, create the user time series with time as the index and the user emotion level value as the content: ((time_1, emotion_1), (time_2, emotion_2), …, (time_n, emotion_n));
Suppose the user time series is ((20210105,1), (20210114,2), …, (20210510,5)), with 20 browsing records in total, 15 of which are news items with emotion-bearing comments;
Step4_2, fill the blank items of the time series: compute the user emotion bias Ud = (1 + 2 + … + 5)/15 and fill the blank items in the series with Ud;
Step4_3, test the created time series for stationarity with a unit root test: if there is no unit root, i.e., the p value is greater than the significance level (0.05), the series is stationary, where p is the number of autoregressive terms of the series; otherwise, difference the created time series to generate a new time series;
Step4_4, apply a first-order difference to the time series and test stationarity again; the test shows that the first-order differenced series is stationary. The differenced series is Sequence1 = ((20210105,1), (20210114,1), …, (20210510,4));
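Blank filling (Step4_2) and first-order differencing (Step4_4) can be sketched as below, with blanks represented as None (an assumption) and hypothetical dates and values. The stationarity and white-noise tests themselves would normally come from a statistics package, for example the ADF test (`adfuller`) and Ljung-Box test (`acorr_ljungbox`) in statsmodels; they are not reimplemented here.

```python
def fill_blanks(series):
    """Step4_2: replace blank emotion values (None) with the emotion bias Ud,
    the mean of the user's non-blank emotion values."""
    known = [e for _, e in series if e is not None]
    ud = sum(known) / len(known)
    return [(t, ud if e is None else e) for t, e in series]

def first_difference(series):
    """Step4_4: keep (time_1, emotion_1), then emotion_i - emotion_{i-1}."""
    out = [series[0]]
    for (_, prev), (t, cur) in zip(series, series[1:]):
        out.append((t, cur - prev))
    return out

filled = fill_blanks([(20210105, 1), (20210114, 2), (20210201, None), (20210510, 6)])
print(first_difference(filled))
# [(20210105, 1), (20210114, 1), (20210201, 1.0), (20210510, 3.0)]
```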
Step4_5, apply a white noise test to the time series by checking whether the p value of the Ljung-Box (LB) statistic is below 0.05; if it is, the series is a non-white-noise series; otherwise, apply a higher-order difference;
Step4_6, initialize m = 1, p = 2, q = 5, a = 0.6 and b = 0.4; the user's emotion level in the next time period is then expressed as:
$\mathrm{emotion}_{n+1} = 1 + 0.6\,\mathrm{emotion}_1 + \dots + 0.6\,\mathrm{emotion}_{n+1-2} + 0.4\,\mathrm{emotion}_1 + \dots + 0.4\,\mathrm{emotion}_{n+1-5}$;
Step4_7, optimize the parameters with the minimum BIC criterion; suppose the optimized values are m = 1, p = 3, q = 4, a = 0.4 and b = 0.6, so that the user's emotion level in the next time period is $\mathrm{emotion}_{n+1} = 1 + 0.4\,\mathrm{emotion}_1 + \dots + 0.4\,\mathrm{emotion}_{n+1-3} + 0.6\,\mathrm{emotion}_1 + \dots + 0.6\,\mathrm{emotion}_{n+1-4}$;
Example 5: news proportion division.
Step5_1, split user 1's news sequence into 72-hour intervals; suppose the resulting news sequence N_1 and news subsequences T_i are:
N_1=(1.33,2.66,3.66);
T_1=(1,2,…,5);
T_2=(2,4,…,1);
T_3=(1,4,…,3);
Similarly, split each user's comment sequence into 72-hour intervals; suppose the resulting user comment sequence U_1e and comment subsequences T_ei are:
U_1e=(3.66,3.42,2.42);
T_e1=(3,4,…,5);
T_e2=(3,2,…,1);
T_e3=(3,3,…,2);
Step5_2, suppose the initialized user emotion level probabilities P(U_ei) and news emotion level probabilities P(N_ei) are:
P(U_e1)=0.1,P(U_e2)=0.2,P(U_e3)=0.1,P(U_e4)=0.3,P(U_e5)=0.3;
P(N_e1)=0.1,P(N_e2)=0.1,P(N_e3)=0.3,P(N_e4)=0.2,P(N_e5)=0.3;
Step5_3, initialize P(N_ej|U_ei), the probability that the browsed news is at level j given that the user's emotion is at level i, as N_j/U_i; for example, P(N_e1|U_e1) = 0.2 means that when the user's emotion level is 1, the probability of browsing news at emotion level 1 is 0.2;
Step5_4, compute user i's emotion probability P(U_ei) = E_i_m/E_i_N and news emotion level probability P(N_ei) = N_i_m/N_i_N; if P(U_ei) = 0, set P(U_ei) = P(U_E), and if P(N_ei) = 0, set P(N_ei) = P(N_E);
Step5_5, compute P(N_ej_k|U_ei_k), the probability that the browsed news is at level j given that user k's emotion is at level i. Take all news sequences recorded while user k's emotion is at level i, and in them set NEW_emotion to 1 for news at emotion level j and to 0 otherwise. The samples then follow a Bernoulli distribution; writing p for P(N_ej_k|U_ei_k) and x_s for NEW_emotion_s, the probability mass function is $P(x) = p^{x}(1-p)^{1-x}$, so the likelihood function is:
$L(p) = p^{x_1}(1-p)^{1-x_1} \cdot p^{x_2}(1-p)^{1-x_2} \cdots p^{x_n}(1-p)^{1-x_n}$
The maximum likelihood estimate obtained from this is P(N_ej_k|U_ei_k) = (NEW_emotion_1 + NEW_emotion_2 + … + NEW_emotion_n)/n; if P(N_ej_k|U_ei_k) = 0, set P(N_ej_k|U_ei_k) = P(N_ej|U_ei);
Step5_6, compute P(U_ei_k|N_ej_k) by Bayes' rule, i.e., the probability that user k's emotion level is i given that user k browses news at emotion level j:
P(U_ei_k|N_ej_k)=P(N_ej_k|U_ei_k)*P(U_ei)/P(N_ej);
Step5_7, suppose the time series predicts that the user's emotion in the next time period is 4, so the target level is 5; the proportions of news at each emotion level pushed dynamically in the next time period are then P(5|1) : P(5|2) : P(5|3) : P(5|4) : P(5|5) = 1 : 1 : 2 : 2 : 4;
Step5_8, when user k generates a new behavior sequence T_T+1, the model is updated iteratively as follows:
first, return to Step5_5 and compute P(N_ej_k|U_ei_k) with user k's new behavior sequence included;
second, compute P(U_ei1), the probability that the emotion of the user's comments in the new behavior sequence is at level i;
finally, update P(U_ei_k|N_ej_k) by the formula P(U_ei_k|N_ej_k) = P(U_ei_k|N_ej_k)/P(U_ei1), and return to Step5_7 to make the recommendation.
According to the proportions given by the news proportion division algorithm, 20 news items are selected in order from the candidate news recommendation list and loaded into the news recommendation list; the number of news items at each emotion level is:
N1=20*1/(1+1+2+2+4)=2;
N2=20*1/(1+1+2+2+4)=2;
N3=20*2/(1+1+2+2+4)=4;
N4=20*2/(1+1+2+2+4)=4;
N5=20*4/(1+1+2+2+4)=8;
finally, it should be noted that: the above examples are intended only to illustrate the technical process of the invention, and not to limit it; although the invention has been described in detail with reference to the foregoing examples, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing examples can be modified, or some technical features can be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (1)

1. An intelligent news recommendation system based on emotion protection, characterized in that it comprises:
(1) a news characteristic matrix module: extracting news characteristic information with a BERT pre-trained model to construct a news characteristic matrix;
(2) an emotion grading module: establishing an emotion grading model to grade the emotion of user comments and of news titles and contents;
(3) a user characteristic matrix module: clustering news labels with a clustering algorithm, assigning weights to the news browsed by the user according to the emotion level of the user's comments and the time of the user's behavior, and constructing a user characteristic matrix together with the user characteristic information;
(4) a user emotion prediction module: modeling the user's emotional tendency as a time series and predicting the user's emotion level in the next time period;
(5) a news push module: generating a recommendation list by computing the similarity between the user vector and the news vectors, and setting the proportion of each emotion level among the recommended news by a Bayesian method according to the user's predicted emotional state;
The intelligent news recommendation system based on emotion protection according to claim 1, wherein:
the news characteristic matrix module comprises:
step1_1, loading news content, news headlines and user comments in the system to form corpus data, cleaning the corpus data, and initializing BERT pre-training model parameters;
step1_2, converting the text into a vector form through a BERT model, eliminating non-characteristic words in the text content by comparing the text content with a preset non-characteristic vocabulary, and segmenting the text by taking the non-characteristic words as boundaries;
step1_3, sequentially calculating Euclidean distances between the vector corresponding to each word after word segmentation and all the other words, accumulating, and setting the top N items with the highest results as feature words corresponding to the news;
step1_4, performing digital representation on the characteristic information of the news, merging the characteristic information with word vectors of characteristic words of the news, filling the merged information into the news vectors, and storing the news vectors in a news matrix;
completing the description of the construction process of the news characteristic matrix;
the emotion grading module comprises:
Step2_1, for the BERT output vectors of each news and comment text, compute the mean and the maximum along the sequence-length dimension, where the sequence length is the length of the whole text sequence, and combine them into a vector text_sum that represents the content of the entire news item or comment;
Step2_2, select the k words with the most positive emotional tendency and take the mean of their vectors as the emotion standard vector emotion_normal;
Step2_3, for each news item and comment, compute the difference between its text_sum and emotion_normal to remove the influence of non-emotional factors in the text_sum vector, and assign the result to the vector NEW_e as its emotion feature;
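A sketch of Step2_1 to Step2_3 on plain lists. The text does not say how the mean and the maximum are "combined"; element-wise addition is assumed here because the name text_sum suggests a sum and because NEW_e is later formed by subtracting the same-sized emotion_normal vector. The toy vectors are hypothetical.

```python
def text_sum(token_vectors):
    """Step2_1 sketch: mean and max over the sequence-length axis of the BERT
    token vectors, combined by element-wise addition (an assumption)."""
    n = len(token_vectors)
    mean = [sum(col) / n for col in zip(*token_vectors)]
    peak = [max(col) for col in zip(*token_vectors)]
    return [a + b for a, b in zip(mean, peak)]

def emotion_normal(word_vectors):
    """Step2_2: mean vector of the k most positive words."""
    k = len(word_vectors)
    return [sum(col) / k for col in zip(*word_vectors)]

def emotion_feature(text_vec, normal_vec):
    """Step2_3: NEW_e = text_sum - emotion_normal."""
    return [a - b for a, b in zip(text_vec, normal_vec)]

new_e = emotion_feature(text_sum([[1.0, 2.0], [3.0, 0.0]]),
                        emotion_normal([[1.0, 1.0], [3.0, 1.0]]))
print(new_e)  # [3.0, 2.0]
```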
step2_4, establishing an emotion classification model to realize classification processing on an emotion feature vector NEW _ e;
specifically:
Step2_4_1, randomly select emotion feature vectors NEW_e_c1, NEW_e_c2, NEW_e_c3, NEW_e_c4 and NEW_e_c5 as the class centers of the positive, fairly positive, neutral, fairly negative and negative emotion classes, dividing the samples into five classes; the distance d_ij between sample NEW_ei and class center NEW_e_cj is expressed by an exponential similarity coefficient, defined as:
[Formula image not reproduced: distance d_ij]
where
[Formula image not reproduced]
and n is the number of samples and s is the feature dimension of the sample vectors;
Step2_4_2, compute the fuzzy classification matrix U with the fuzzy C-means clustering algorithm; U is defined by:
[Formula image not reproduced: fuzzy C-means membership u_ij]
wherein u _ ij represents the membership degree of the sample i to the class j, m is a fuzzy coefficient, and c is the class number;
Step2_4_3, initialize the parameter μ of the possibilistic C-means clustering; μ is taken as:
[Formula image not reproduced: initial value of μ]
Step2_4_4, the objective function and constraint conditions of the possibilistic C-means clustering are defined as:
[Formula image not reproduced: objective function]
[Formula image not reproduced: constraint conditions]
$\max_i\{u_{ij}\} > 0$;
Step2_4_5, the membership function and class centers are computed as:
$u_{ij} = \left(1 + \left(d_{ij}^2/\mu_i\right)^{1/(m-1)}\right)^{-1}$
[Formula image not reproduced: class center update]
Step2_4_6, set a threshold ε and a fuzzy coefficient m; when ||ΔC_i|| < ε, the algorithm stops iterating and outputs the optimal fuzzy classification matrix U and class center matrix C;
Step2_4_7, rank the classes by the Euclidean distance d_j between each class vector in the center matrix C and the emotion standard vector emotion_normal;
Step2_5, sort the emotion levels by the value of d_j: the smallest value is the fifth level, the positive emotion level, and the largest value is the first level, the negative emotion level; the optimal fuzzy classification matrix gives the membership degree of each news item and comment in each emotion level, and the emotion level is determined from these membership values;
finishing the description of the emotion grading module;
the user characteristic matrix module comprises:
Step3_1, choose the sliding window radius r; take a randomly selected news tag vector NEW_L1 as the center of a circular sliding window of radius r, compute in turn the Euclidean distance L between each remaining news tag vector and the center NEW_L1, mark the news tag vectors whose distance from NEW_L1 is at most r into a set M, and define these points as belonging to cluster c1;
Step3_2, compute in turn the offset vector N_i from the center vector NEW_L1 to each element of the set, giving the offset vector N = N_1 + N_2 + … + N_n;
Step3_3, the center vector NEW _ Li moves by a distance of | N | in the density increasing direction;
step3_4, repeating the above operations until the value of the offset is smaller than the threshold value K, and marking the obtained central vector NEW _ Li;
continuing to repeat the operation until all points are classified;
Step3_5, count how often each class marks each news tag vector, and select the most frequent class as the class to which the news tag vector belongs;
Step3_6, take the user's latest browsing time as the timestamp, compute the difference between the timestamp and the browsing time of each other news item, and divide it by the time precision to normalize it, obtaining the time weight of each news item; with timestamp t and news i browsed at time ti, the time weight is Pti = 1/(1 + ((t - ti)/m)), where m is the time precision;
Step3_7, assign a weight Wi = Pti × emotioni to each news item i browsed by the user, where Pti is the time weight and emotioni is the emotion level of the user's comment;
Step3_8, sum the user's interest by news feature word category to obtain the total weight of the user's interest in each news category; with news weight Wi = Pti × emotioni, the user's interest is quantified as:
User_label_1=(W1*NEW_S1_k1+W2*NEW_S2_k1+…+Wn*NEW_Sn_k1),…,
User_label_n=(W1*NEW_S1_kn+W2*NEW_S2_kn+…+Wn*NEW_Sn_kn),
where User_label_n is the user's interest in the nth news category, Wn is the weight of the nth news item, and NEW_Sn_kn is the similarity of the nth news item to the nth news category;
Step3_9, the user's weighted interest in the n news categories is represented as an n-dimensional vector:
User_label = [User_label_1, User_label_2, …, User_label_n], where News_label_n is the similarity of the news labels belonging to the nth interest category to that category's center vector; if no news label belongs to the nth interest category, the nth dimension is set to 0;
Step3_10, merge the user interest vector User_label with the user characteristic information into a user vector, store the user vector in the user matrix, and complete the construction of the user characteristic matrix;
completing the process description of the user characteristic matrix construction module;
the user emotion prediction module comprises:
step4_1, reading the user behavior data, converting the time data, and creating a time sequence; the user behavior time is used as an index, the user emotion information is used as content,
sequence=((time_1,emotion_1),(time_2,emotion_2),…,(time_n,emotion_n));
step4_2, supplementing blank items of the time sequence, and calculating the emotion bias Ud of the user, wherein the Ud is (emotion _1+ emotion _2+ … + emotion _ n)/m, wherein m is the number of news with emotion comments of the user, and the blank items in the sequence are supplemented by the Ud;
Step4_3, check the stationarity of the created time series with a unit root test: if there is no unit root, i.e., the p value is greater than the significance level (0.05), the series is stationary, where p is the number of autoregressive terms of the series; otherwise, difference the created time series to generate a new time series;
Step4_4, apply a first-order difference to the time series and repeat the stationarity test until it shows that the first-order differenced series is stationary; the first-order difference is: Sequence1 = ((time_1, emotion_1), (time_2, emotion_2 - emotion_1), …, (time_n, emotion_n - emotion_n-1));
Step4_5, apply a white noise test to the time series using the Ljung-Box (LB) statistic; if its p value is less than 0.05, the series is a non-white-noise series; otherwise, apply a higher-order difference;
Step4_6, initialize m, p, q, a and b; the user's emotion level in the next time period is expressed as $\mathrm{emotion}_{n+1} = m + a\,\mathrm{emotion}_1 + \dots + a\,\mathrm{emotion}_{n+1-p} + b\,\mathrm{emotion}_1 + \dots + b\,\mathrm{emotion}_{n+1-q}$, where m is a constant, p is the number of autoregressive terms, q is the number of moving average terms, and a and b are the autoregressive and moving average weights, respectively;
Step4_7, optimize the parameters by the minimum BIC criterion, where $BIC = k\ln(n) - 2\ln(L)$, k is the number of parameters, n is the number of samples and L is the likelihood function; build the final time series model from the optimal parameters and use it to predict the user's emotion level in the next time period;
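The minimum-BIC selection in Step4_7 can be sketched as follows, assuming each candidate (p, q) order has already been fitted and its maximized log-likelihood is known; the candidate names and values are hypothetical.

```python
import math

def bic(log_likelihood, k, n):
    """BIC = k ln(n) - 2 ln(L); takes ln(L) directly to avoid underflow."""
    return k * math.log(n) - 2.0 * log_likelihood

def select_by_bic(candidates, n):
    """candidates: name -> (ln L, number of parameters k). Smallest BIC wins."""
    return min(candidates, key=lambda name: bic(*candidates[name], n))

# Hypothetical fits of two (p, q) orders on n = 120 observations
fits = {"p=2,q=5": (-150.0, 8), "p=3,q=4": (-149.0, 8)}
print(select_by_bic(fits, n=120))  # p=3,q=4
```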
the user emotion prediction process is described;
the news push module includes:
Step5_1, select from the news matrix, by user id, the news vectors the user has not browsed and store them in a to-be-processed list: waitlist = (NEWS_1, NEWS_2, …, NEWS_N);
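step5_2, calculating the Euclidean distance sim_g(U, N) between the user vector U and each news vector N in waitlist, both of dimension q (the distance is turned into a content similarity in step5_3):
$$\mathrm{sim\_g}(U,N) = \sqrt{(U_1-N_1)^2 + (U_2-N_2)^2 + \dots + (U_q-N_q)^2}$$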
step5_3, normalizing the distance between the user vector and the news vector into a content similarity in (0, 1], so that a smaller distance gives a larger similarity: sim_g(U, N) = 1/(1 + sim_g(U, N));
step5_4, sorting the normalized sim_g(U, N) values in descending order and storing the corresponding news ids, in that order, in a candidate news recommendation list;
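Steps step5_1 to step5_4 amount to filtering out browsed news, turning Euclidean distance into similarity and sorting. The sketch below assumes news vectors are held in a dict keyed by news id and that the browsed ids are available as a set. These structures and names are illustrative, not from the source.

```python
import math
from typing import Dict, List, Sequence, Set


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """step5_2: Euclidean distance between user vector u and news vector v."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def rank_candidates(user_vec: Sequence[float],
                    news_matrix: Dict[str, Sequence[float]],
                    browsed: Set[str]) -> List[str]:
    """Return unbrowsed news ids ordered by descending similarity 1 / (1 + distance)."""
    # step5_1: waitlist holds only news the user has not browsed
    waitlist = {nid: vec for nid, vec in news_matrix.items() if nid not in browsed}
    # step5_2 and step5_3: distance converted into a similarity in (0, 1]
    sims = {nid: 1.0 / (1.0 + euclidean(user_vec, vec)) for nid, vec in waitlist.items()}
    # step5_4: descending order (Python's sort is stable, so ties keep insertion order)
    return sorted(sims, key=sims.get, reverse=True)
```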
step5_5, in the emotion-protection-based intelligent news push system, when news is pushed dynamically, the proportion of news at each emotion level in the news recommendation list is set by a news proportion division algorithm according to the predicted user emotion state, and the resulting news recommendation list is pushed to the user;
the concrete description is as follows:
step5_5_1, dividing the news sequence of each user by the time interval T. The divided news sequence of user i is N_i = (T_1, T_2, …, T_T), where T_t = (News_e1 + News_e2 + … + News_en)/n is the average emotion level of the n news browsed in time period t, and News_en is the emotion level value of the n-th news browsed by the user. The comment sequence of each user is divided by the same time interval T; the divided comment sequence of user i is U_ie = (T_e1, T_e2, …, T_et), where T_et = (emotion_te1 + emotion_te2 + … + emotion_ten)/n is the average emotion level of user i in time period t, and emotion_ten is the emotion level value of the n-th comment of user i in that period;
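The period averaging of step5_5_1 can be sketched as bucketing timestamped emotion levels by period index floor(timestamp / T) and averaging each bucket. The same function serves both the news sequence and the comment sequence. Skipping periods with no activity is an assumption, since the source does not say how empty periods are treated.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def period_averages(events: List[Tuple[float, float]], T: float) -> List[float]:
    """events: (timestamp, emotion level) pairs. Returns the average level of each
    period of length T in time order; periods without events are skipped."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, level in events:
        buckets[int(ts // T)].append(level)
    return [sum(v) / len(v) for _, v in sorted(buckets.items())]
```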
step5_5_2, initializing the user emotion level probability P(U_ei) = E_im/E_N and the news emotion level probability P(N_ei) = N_im/N_N, wherein E_im is the number of occurrences of level-i emotion in the comment sequences of all users, E_N is the total number of emotion-level occurrences in those comment sequences, N_im is the number of level-i news in the news sequences of all users, and N_N is the total number of news in those news sequences;
step5_5_3, initializing the probability that the browsed news is at level j given that the user emotion is at level i as P(N_ej | U_ei) = N_j/U_i, wherein U_i is the total number of times the user emotion is at level i across all user sequences, and N_j is the total number of times the news emotion is at level j across all user sequences;
step5_5_4, calculating, for user i, the emotion level probability P(U_ei) = E_i_m/E_i_N and the news emotion level probability P(N_ei) = N_i_m/N_i_N, counted as in step5_5_2 but over that user's own sequences only; if P(U_ei) = 0, it is set to the global value P(U_E) from step5_5_2, and if P(N_ei) = 0, it is set to the global value P(N_E);
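Steps step5_5_2 and step5_5_4 are level frequency counts, where any zero per-user value is replaced by the global value. This sketch assumes five integer levels 1 to 5, since the source pushes five news levels N_e1 to N_e5. As in the source, the backed-off values are not renormalized, so they need not sum to 1.

```python
from collections import Counter
from typing import Dict, Iterable

LEVELS = (1, 2, 3, 4, 5)  # assumed emotion levels; 5 is the most positive


def level_probs(levels: Iterable[int]) -> Dict[int, float]:
    """Relative frequency of each emotion level (step5_5_2 when given all users' data)."""
    counts = Counter(levels)
    total = sum(counts.values())
    return {lvl: counts[lvl] / total if total else 0.0 for lvl in LEVELS}


def user_level_probs(user_levels: Iterable[int], all_levels: Iterable[int]) -> Dict[int, float]:
    """Per-user probabilities (step5_5_4); a zero entry falls back to the global value."""
    global_p = level_probs(all_levels)
    user_p = level_probs(user_levels)
    return {lvl: user_p[lvl] if user_p[lvl] > 0 else global_p[lvl] for lvl in LEVELS}
```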
step5_5_5, calculating P(N_ej_k | U_ei_k), the probability that the browsed news is at level j given that the emotion of user k is at level i. All news sequences of user k in periods where the user's emotion is at level i are selected; in these sequences, a news item whose emotion is at level j is assigned NEW_emotion = 1 and every other news item NEW_emotion = 0. The samples then follow a Bernoulli distribution with probability mass function
$$P(\mathrm{NEW\_emotion}) = p^{\mathrm{NEW\_emotion}}\,(1-p)^{1-\mathrm{NEW\_emotion}}, \qquad p = P(N\_ej\_k \mid U\_ei\_k)$$
thus, for the n samples NEW_emotion_1, …, NEW_emotion_n, the likelihood function is defined as:
$$L(p) = \prod_{t=1}^{n} p^{\mathrm{NEW\_emotion}_t}\,(1-p)^{1-\mathrm{NEW\_emotion}_t}, \qquad p = P(N\_ej\_k \mid U\_ei\_k)$$
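setting the derivative of ln L(p) with respect to p to zero gives the maximum likelihood estimate of P(N_ej_k | U_ei_k), which is the share of level-j news among the selected samples:
$$P(N\_ej\_k \mid U\_ei\_k) = \frac{\mathrm{NEW\_emotion}_1 + \mathrm{NEW\_emotion}_2 + \dots + \mathrm{NEW\_emotion}_n}{n}$$
if P(N_ej_k | U_ei_k) = 0, then P(N_ej_k | U_ei_k) is set to the global value P(N_ej | U_ei) from step5_5_3;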
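The maximum likelihood result of step5_5_5 is the sample mean of the 0/1 indicators. When that share is zero, the sketch falls back to the global step5_5_3 value. It assumes the input is the list of emotion levels of the news that user k browsed in periods where the user's own level was i. The function names are hypothetical.

```python
from typing import Sequence


def bernoulli_mle(indicators: Sequence[int]) -> float:
    """MLE of the Bernoulli parameter p for 0/1 samples: their mean."""
    return sum(indicators) / len(indicators) if indicators else 0.0


def p_news_given_user(news_levels_when_user_i: Sequence[int], j: int,
                      global_value: float) -> float:
    """P(N_ej_k | U_ei_k) with back-off to P(N_ej | U_ei) when the estimate is 0."""
    indicators = [1 if lvl == j else 0 for lvl in news_levels_when_user_i]  # NEW_emotion
    p = bernoulli_mle(indicators)
    return p if p > 0 else global_value
```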
step5_5_6, calculating by Bayes' rule P(U_ei_k | N_ej_k), the probability that the emotion level of user k is i when user k browses news whose emotion level is j:
P(U_ei_k|N_ej_k)=P(N_ej_k|U_ei_k)*P(U_ei)/P(N_ej);
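Applying step5_5_6 to every (i, j) pair gives a posterior table. This sketch assumes the likelihoods, P(U_ei) and P(N_ej) are stored in plain dicts keyed by level. Entries with P(N_ej) = 0 are skipped, which is an assumption because the source does not cover that case.

```python
from typing import Dict, Tuple


def posterior_table(lik: Dict[Tuple[int, int], float],
                    p_user: Dict[int, float],
                    p_news: Dict[int, float]) -> Dict[Tuple[int, int], float]:
    """lik[(i, j)] = P(N_ej_k | U_ei_k); returns post[(i, j)] = P(U_ei_k | N_ej_k)."""
    post = {}
    for (i, j), likelihood in lik.items():
        if p_news.get(j, 0.0) > 0:  # assumption: undefined when P(N_ej) = 0
            post[(i, j)] = likelihood * p_user[i] / p_news[j]
    return post
```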
step5_5_7, taking the user emotion T_et for the next time period predicted by the time series model; the proportion of news at each emotion level in the next round of dynamic pushing is:
$$P(T_{et}+1 \mid N_{e1}) : P(T_{et}+1 \mid N_{e2}) : P(T_{et}+1 \mid N_{e3}) : P(T_{et}+1 \mid N_{e4}) : P(T_{et}+1 \mid N_{e5})$$
wherein T_et+1 denotes the emotional state one level higher than the predicted user emotion T_et, so each term is the step5_5_6 probability P(U_ei_k | N_ej_k) with i = T_et+1; if T_et equals 5, the highest level, then T_et+1 = T_et, so that the user is kept at the most positive emotion;
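Step5_5_7 reads one row of the step5_5_6 posterior table. The predicted T_et is an average and may be fractional, so this sketch assumes it is rounded to the nearest level before moving one level up, capped at 5. The rounding rule is an assumption, and Python's `round()` rounds halves to even. Missing table entries are treated as 0, which is also an assumption.

```python
from typing import Dict, List, Tuple

MIN_LEVEL, MAX_LEVEL = 1, 5


def target_level(predicted: float) -> int:
    """T_et + 1, capped at the most positive level 5."""
    level = min(max(int(round(predicted)), MIN_LEVEL), MAX_LEVEL)  # rounding is assumed
    return min(level + 1, MAX_LEVEL)


def push_ratio(post: Dict[Tuple[int, int], float], predicted: float) -> List[float]:
    """[P(T_et+1 | N_e1), ..., P(T_et+1 | N_e5)] read from post[(i, j)] = P(U_ei_k | N_ej_k)."""
    i = target_level(predicted)
    return [post.get((i, j), 0.0) for j in range(MIN_LEVEL, MAX_LEVEL + 1)]
```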
step5_5_8, when user k generates a new behavior sequence T_(T+1), the iterative update proceeds as follows:
first, return to step5_5_5 and recompute P(N_ej_k | U_ei_k) with the new behavior sequence of user k included;
secondly, calculate P(U_ei1), the probability that the emotion of the user comments in the new behavior sequence of the user is at level i;
finally, update P(U_ei_k | N_ej_k) by the formula P(U_ei_k | N_ej_k) = P(N_ej_k | U_ei_k) * P(U_ei_k | N_ej_k) / P(U_ei1), where the P(U_ei_k | N_ej_k) on the right-hand side is the value before the update, and then go to step5_5_7 for recommendation;
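The step5_5_8 update, transcribed literally for one user level i, is sketched below. The source gives no normalization, so updated values can exceed 1. Because P(U_ei1) is the same for every j at a fixed i, it only rescales the row and does not change the step5_5_7 ratio. The handling of P(U_ei1) = 0 is an assumption.

```python
from typing import Dict


def update_row(prev_post: Dict[int, float], new_lik: Dict[int, float],
               p_user_new: float) -> Dict[int, float]:
    """For a fixed user level i: post[j] <- P(N_ej_k | U_ei_k) * post[j] / P(U_ei1)."""
    if p_user_new <= 0:  # assumption: the source does not cover P(U_ei1) = 0
        return dict(prev_post)
    return {j: new_lik[j] * prev_post[j] / p_user_new for j in prev_post}
```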
step5_6, obtaining the division ratio from the news proportion division algorithm and selecting N news in order from the candidate news recommendation list to fill the news recommendation list, where the number N_i of news at emotion level i is:
$$N_i = \frac{N \cdot P(T_{et}+1 \mid N_{ei})}{P(T_{et}+1 \mid N_{e1}) + P(T_{et}+1 \mid N_{e2}) + P(T_{et}+1 \mid N_{e3}) + P(T_{et}+1 \mid N_{e4}) + P(T_{et}+1 \mid N_{e5})}$$
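N_i from step5_6 is generally not an integer, so some rounding is needed. The sketch below uses the largest-remainder method, which is an assumption because the source does not specify one. It then walks the similarity-sorted candidate list and takes the first N_i news of each level. If a level has fewer candidates than its quota, the list comes out shorter than N, and the source does not say how to back-fill.

```python
import math
from typing import Dict, List, Sequence


def allocate(weights: Sequence[float], N: int) -> List[int]:
    """N_i = N * w_i / sum(w), rounded by the largest-remainder method (assumption)."""
    total = sum(weights)
    raw = [N * w / total for w in weights]
    counts = [math.floor(r) for r in raw]
    # Hand out the leftover slots to the largest fractional parts (ties: lower level first).
    order = sorted(range(len(raw)), key=lambda i: (-(raw[i] - counts[i]), i))
    for i in order[: N - sum(counts)]:
        counts[i] += 1
    return counts


def fill_list(candidates: List[str], level_of: Dict[str, int], counts: List[int]) -> List[str]:
    """Take, in candidate order, the first counts[i - 1] news of each level i."""
    remaining = {lvl: c for lvl, c in enumerate(counts, start=1)}
    picked = []
    for nid in candidates:
        lvl = level_of[nid]
        if remaining.get(lvl, 0) > 0:
            picked.append(nid)
            remaining[lvl] -= 1
    return picked
```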
this completes the description of the dynamic news push process.
CN202110606444.XA 2021-05-28 2021-05-28 Intelligent news recommendation system based on emotion protection Pending CN113343120A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110606444.XA CN113343120A (en) 2021-05-28 2021-05-28 Intelligent news recommendation system based on emotion protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110606444.XA CN113343120A (en) 2021-05-28 2021-05-28 Intelligent news recommendation system based on emotion protection

Publications (1)

Publication Number Publication Date
CN113343120A 2021-09-03

Family

ID=77473821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110606444.XA Pending CN113343120A (en) 2021-05-28 2021-05-28 Intelligent news recommendation system based on emotion protection

Country Status (1)

Country Link
CN (1) CN113343120A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114564675A (en) * 2022-04-28 2022-05-31 深圳格隆汇信息科技有限公司 Information recommendation method and device and storage medium
CN114564675B (en) * 2022-04-28 2022-07-22 深圳格隆汇信息科技有限公司 Information recommendation method and device and storage medium
CN116259110A (en) * 2023-05-09 2023-06-13 杭州木兰科技有限公司 Security detection method, device, equipment and storage medium for ATM protection cabin
CN116259110B (en) * 2023-05-09 2023-08-08 杭州木兰科技有限公司 Security detection method, device, equipment and storage medium for ATM protection cabin
CN117251643A (en) * 2023-08-29 2023-12-19 海南大学 News recommendation method and system based on collaborative filtering and probabilistic language term set
CN117251643B (en) * 2023-08-29 2024-05-07 海南大学 News recommendation method and system based on collaborative filtering and probabilistic language term set
CN117150145A (en) * 2023-10-31 2023-12-01 成都企软数字科技有限公司 Personalized news recommendation method and system based on large language model
CN117150145B (en) * 2023-10-31 2024-01-02 成都企软数字科技有限公司 Personalized news recommendation method and system based on large language model

Similar Documents

Publication Publication Date Title
Tripathy et al. Classification of sentiment reviews using n-gram machine learning approach
CN113343120A (en) Intelligent news recommendation system based on emotion protection
US8645298B2 (en) Topic models
Amara et al. Collaborating personalized recommender system and content-based recommender system using TextCorpus
Ko et al. Text classification from unlabeled documents with bootstrapping and feature projection techniques
Lubis et al. The effect of the TF-IDF algorithm in times series in forecasting word on social media
CN107688870B (en) Text stream input-based hierarchical factor visualization analysis method and device for deep neural network
CN111241425B (en) POI recommendation method based on hierarchical attention mechanism
Abbasi et al. A grouping hotel recommender system based on deep learning and sentiment analysis
CN111723256A (en) Government affair user portrait construction method and system based on information resource library
CN113326432A (en) Model optimization method based on decision tree and recommendation method
Trupthi et al. Possibilistic fuzzy C-means topic modelling for twitter sentiment analysis
Lee et al. Sentiment analysis on online social network using probability Model
CN113449508A (en) Internet public opinion correlation deduction prediction analysis method based on event chain
CN113407729A (en) Judicial-oriented personalized case recommendation method and system
CN108614860A (en) A kind of lawyer's information processing method and system
CN111859955A (en) Public opinion data analysis model based on deep learning
TW201243627A (en) Multi-label text categorization based on fuzzy similarity and k nearest neighbors
CN117235253A (en) Truck user implicit demand mining method based on natural language processing technology
CN115905695A (en) Personalized literature recommendation method combining Doc2vec and Faiss
Voronov et al. Forecasting popularity of news article by title analyzing with BN-LSTM network
Alorini et al. Machine learning enabled sentiment index estimation using social media big data
CN113688633A (en) Outline determination method and device
CN113988152A (en) User type prediction model training method, resource allocation method, medium, and apparatus
Kazim et al. Preprocessing of Drugs Reviews and Classification Techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination