CN105183833B

CN105183833B - Microblog text recommendation method and device based on user model

Info

Publication number: CN105183833B
Application number: CN201510548344.0A
Authority: CN
Inventors: 喻梅; 徐天一; 王建荣; 于健; 缑小路; 郭佳
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2015-08-31
Filing date: 2015-08-31
Publication date: 2020-05-19
Anticipated expiration: 2035-08-31
Also published as: CN105183833A

Abstract

The invention discloses a microblog text recommendation method and device based on a user model. The method comprises the following steps: acquiring microblog data to form microblog documents and preprocessing the microblog documents; establishing a target user topic model based on the LDA topic model, and calculating the matching degree between a candidate microblog and the target user topic model; establishing a target user keyword vector model based on the TF-IDF algorithm, and calculating the matching degree between the candidate microblog and the target user keyword vector model; and combining the two matching degrees by a weighted average to obtain the matching degree between the candidate microblog and the target user model as the score of the candidate microblog, and ranking the candidate microblogs by score. The device comprises an acquisition and preprocessing module, a first calculation module, a second calculation module and a sorting module. With the invention, microblog information likely to interest the target user can be found and recommended to the target user, strengthening the connections among users and helping to increase the vitality of the microblog platform.

Description

Microblog text recommendation method and device based on user model

Technical Field

The invention relates to the fields of data mining, natural language processing and information retrieval, and in particular to a user-model-based microblog text recommendation method (MCRA) and a corresponding recommendation device.

Background

At present, there are various methods for personalized recommendation based on microblog user modeling. From the perspective of what they analyze, they fall roughly into two types: methods based on the relationships between microblog users, and methods based on the text content that microblog users publish. Methods of the first type analyze a user's relationships in the social network, the user's position in a community and the user's influence in that community, and rank users by influence for recommendation. Methods of the second type process and analyze the microblog content a user publishes to build a personalized model of that user, judge the similarity between other users (or content) and the model, and recommend the users or content with the highest similarity. The core of this second approach is user content modeling.

The conventional statistical Term Frequency-Inverse Document Frequency (TF-IDF) model and topic modeling are commonly used for user content modeling. However, the traditional TF-IDF content modeling method cannot reflect a user's interest in latent topics.

Topic modeling techniques mainly include Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA) and the like. LSA maps documents from a sparse, high-dimensional word space to a low-dimensional vector space and uses that space to represent synonyms that correspond to the same or similar topics. However, LSA does not provide a probabilistic model of term occurrence counts. PLSA follows a similar idea but introduces probabilistic relationships between classes (topics) and words, and its parameters can be estimated with the Expectation-Maximization (EM) algorithm and maximum likelihood estimation. PLSA does not provide a suitable probabilistic model at the document level, so it is not a complete generative model: it can only be sampled for a document that is already given.

To address the deficiencies of PLSA, researchers proposed the Latent Dirichlet Allocation (LDA) model. LDA introduces two probability distributions, a document-topic distribution and a topic-term distribution: a document is regarded as a probabilistic mixture of multiple topics, and a topic as a probabilistic mixture of terms, which matches the process by which documents are generated. The LDA topic model reflects the topics a user is concerned with well, but it cannot avoid the inaccurate modeling caused by the character limit of microblogs, so using the user topic model alone cannot achieve the best recommendation effect.

Disclosure of Invention

The invention provides a user-model-based microblog text recommendation method and device that can find, among the massive microblog information published by other microblog users, the microblogs likely to interest a given target user and recommend them to that user, thereby strengthening the connections among users and increasing the vitality of the microblog platform. The details are as follows:

a microblog text recommendation method based on a user model comprises the following steps:

acquiring microblog data, forming a microblog document, and preprocessing the microblog document;

establishing a target user topic model according to the LDA topic model, and calculating the matching degree of the candidate microblog and the target user topic model;

establishing a target user keyword vector model based on a TF-IDF algorithm, and calculating the matching degree of the candidate microblog and the target user keyword vector model;

and combining the two matching degrees by a weighted average to obtain the matching degree between the candidate microblog and the target user model as the score of the candidate microblog, and ranking the candidate microblogs by score.

The step of calculating the matching degree between the candidate microblog and the target user model as the score of the candidate microblog and ranking by score specifically comprises the following steps:

after the scores Score(W, u) of the candidate microblogs are obtained, ranking the candidate microblogs by score to construct an initial microblog recommendation list L₀ of the target user, and performing redundancy processing on L₀;

and outputting the recommendation list after the redundancy processing.

A user model-based microblog text recommendation device, the device comprising:

the acquisition and preprocessing module is used for acquiring microblog data, forming a microblog document and preprocessing the microblog document;

the first calculation module is used for establishing a target user topic model according to the LDA topic model and calculating the matching degree of the candidate microblog and the target user topic model;

the second calculation module is used for establishing a target user keyword vector model based on a TF-IDF algorithm and calculating the matching degree of the candidate microblog and the target user keyword vector model;

and the sorting module is used for combining the two matching degrees by a weighted average to obtain the matching degree between the candidate microblog and the target user model as the score of the candidate microblog, and ranking the candidate microblogs by score.

Wherein the sorting module further comprises:

a redundancy processing submodule, configured to, after the scores Score(W, u) of the candidate microblogs are obtained, rank the candidate microblogs by score, construct an initial microblog recommendation list L₀ of the target user, and perform redundancy processing on L₀;

and the output submodule is used for outputting the recommendation list after the redundancy processing.

The technical scheme provided by the invention has the beneficial effects that:

(1) In short-text recommendation, a target user model is established by combining the LDA topic modeling method with the TF-IDF modeling method, so that the advantages of both methods are exploited and a more accurate user model is obtained; a method for calculating the matching degree between a candidate microblog and the user model is also provided.

(2) According to the characteristics of microblog texts, a weighted scoring criterion for candidate microblogs is provided, in which the contribution of each modeling method to the score can be controlled by adjusting its weight. The candidate microblogs are scored and TOP-N recommendation is performed, yielding a more accurate microblog text recommendation algorithm.

Drawings

FIG. 1 is a flow chart of a microblog text recommendation method based on a user model;

FIG. 2 is a flow chart of the MCRA algorithm;

FIG. 3 is a graph showing the change in AP for α = 0.0001 and different values of β;

FIG. 4 is a graph showing a comparison of F values for MCRA, LDA and TF-IDF;

FIG. 5 is a diagram illustrating a comparison of AP values for MCRA and TF-IDF algorithms;

FIG. 6 is a schematic diagram of a microblog text recommendation device based on a user model;

FIG. 7 is a schematic diagram of a sorting module.

In the drawings, the components represented by the respective reference numerals are listed below:

1: an acquisition and preprocessing module; 2: a first calculation module;

3: a second calculation module; 4: a sorting module;

41: a redundancy processing submodule; 42: an output submodule.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.

Example 1

A microblog text recommendation method based on a user model is disclosed. Referring to FIG. 1, the method comprises the following steps:

101: acquiring microblog data, forming a microblog document, and preprocessing the microblog document;

For example, the embodiment of the invention takes Sina microblog as the research object: a Sina microblog user is selected as the target user, and content is recommended to that user. The microblog content published and forwarded by the target user and by the users who have a follow relationship with the target user forms the research scope of this embodiment; this content is assumed to be content the target user likes, and can therefore be used to analyze the target user's interests. The microblog data published and forwarded by these users are crawled to form the microblog documents used for model construction in this embodiment.

Each microblog document is preprocessed, including word segmentation, vectorization and dimension reduction, and a training set and a test set (the set of candidate microblogs) are selected for the experiments. The specific operations of this step are well known to those skilled in the art and are not described in detail here.

102: establishing a target user topic model according to the LDA topic model, and calculating the matching degree of the candidate microblog and the target user topic model;

103: establishing a target user keyword vector model based on a TF-IDF algorithm, and calculating the matching degree of the candidate microblog and the target user keyword vector model;

The target user model comprises a target user topic model and a target user keyword vector model. When the matching degree between a candidate microblog and the target user model is calculated, the matching degree between the candidate microblog and the target user topic model and the matching degree between the candidate microblog and the target user keyword vector model are calculated separately.

104: combining the two matching degrees by a weighted average to obtain the matching degree between the candidate microblog and the target user model as the score of the candidate microblog, and ranking the candidate microblogs by score.

In a specific implementation, the embodiment of the invention performs topic modeling on the target user according to the content of the messages the target user publishes, obtains the list of microblogs to be recommended, scores the candidate microblogs according to their topic matching degree with the target user, and ranks them by score for recommendation.

In summary, through steps 101 to 104 the embodiment of the invention improves the accuracy of microblog text recommendation, so that the microblogs the target user is really interested in are placed nearer the top of the recommendation list.

Example 2

The scheme of Example 1 is described in detail below with reference to specific calculation formulas, examples and FIG. 2. The MCRA algorithm is divided into two sub-algorithms, the Target User Modeling Algorithm (TUMA) and the text Recommendation Algorithm (CRA), described as follows:

201: acquiring experimental data;

That is, the microblog texts published and forwarded by the target user and by the users who have a follow relationship with the target user are crawled to construct the experimental microblog documents. To capture the experimental data, a crawler program is built on the Sina microblog open Application Programming Interface (API); the target user and the users related to the target user are selected, and a microblog document is formed for each of them. In a specific implementation, other software may be used to capture the experimental data, which is not limited in the embodiment of the invention.

202: preprocessing data;

First, the Institute of Computing Technology Chinese Lexical Analysis System (ICTCLAS), which is based on the Hidden Markov Model (HMM), is applied to segment all documents into words; the noun terms in each microblog are then extracted to represent that microblog, and dimension reduction is performed following the Vector Space Model (VSM) concept. Finally, from the processed microblog texts, the microblog texts of the target user and of the users related to the target user are selected as the modeling data set. In a specific implementation, other word segmentation software may also be used, which is not limited in the embodiment of the invention.
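The noun-extraction and vector-space step can be sketched as follows. This is a minimal illustration, assuming the segmenter has already produced (word, part-of-speech) pairs in which noun tags start with "n" (as in ICTCLAS-style tag sets); the input format and the helper names noun_terms, build_vocabulary and to_vector are hypothetical, not part of the original system.

```python
from collections import Counter


def noun_terms(tagged_tokens):
    """Keep only tokens whose part-of-speech tag marks a noun (tag starts with 'n')."""
    return [word for word, pos in tagged_tokens if pos.startswith("n")]


def build_vocabulary(docs):
    """Map every distinct term in the corpus to a column index (the set V)."""
    vocab = {}
    for doc in docs:
        for term in doc:
            vocab.setdefault(term, len(vocab))
    return vocab


def to_vector(doc, vocab):
    """Represent a document as a sparse term-frequency vector {column: count}."""
    counts = Counter(doc)
    return {vocab[t]: c for t, c in counts.items() if t in vocab}


# Hypothetical pre-segmented microblogs as (word, POS tag) pairs.
tagged = [
    [("天津", "ns"), ("大学", "n"), ("发布", "v"), ("新闻", "n")],
    [("今天", "t"), ("新闻", "n"), ("很", "d"), ("多", "a")],
]
docs = [noun_terms(t) for t in tagged]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
print(docs)     # [['天津', '大学', '新闻'], ['新闻']]
print(vectors)  # [{0: 1, 1: 1, 2: 1}, {2: 1}]
```

Keeping only nouns already shrinks the vocabulary considerably, which is one simple way to realize the dimension reduction mentioned above.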

203: in the modeling data set, training the constructed set of microblog text vectors with the LDA (Latent Dirichlet Allocation) model;

According to the characteristics of microblog users and microblog texts, all the microblog content published by each user is regarded as one document, the microblog documents of multiple users are trained, and the LDA model is solved by Gibbs sampling (Griffiths T L, Steyvers M. Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, 2004, 101(Suppl 1): 5228-5235). The algorithm is described as follows:

First, the symbols used in the LDA model are defined as follows:

D: document set, D = {d₁, d₂, …, d_i}, where d_i is the i-th document.

T: topic set, T = {t₁, t₂, …, t_i}, where t_i is the i-th topic.

W: term set, W = {w₁, w₂, …, w_i}, where w_i is the i-th term.

V: vocabulary, V = {v₁, v₂, …, v_i}, the set of all distinct terms in the corpus, where v_i is the i-th term.

u: the target user.

(1) Input the user document set D = {d₁, d₂, …, d_i} and set the initial values of the Dirichlet distribution parameters: a parameter α reflecting the degree of association between documents and topics, and a parameter β reflecting the density of terms within topics. The number of iterations is N_iteration.

(2) Randomly assign a topic to every term in document set D and calculate the initial counts: n_m^(k), the number of occurrences of topic k in document m; n_m, the total number of topic assignments in document m; n_k^(t), the number of times term t is assigned to topic k; and n_k, the total number of terms assigned to topic k.

(3) In document set D, let term t in document m currently belong to topic k. Sample a new topic for t, and update the counts in (2) accordingly (remove the assignment to topic k and add the assignment to the newly sampled topic).

(4) Repeat step (3) until the Markov chain converges (at most N_iteration iterations);

(5) Output the document-topic probability distribution θ and the topic-term probability distribution Φ.
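Steps (1) to (5) can be illustrated with a standard collapsed Gibbs sampler for LDA. This is a minimal sketch in the spirit of the cited Griffiths and Steyvers method, not the patent's own code: the full-conditional weights and the θ/Φ estimates are the usual textbook ones, a fixed number of sweeps stands in for a convergence test, and the function name lda_gibbs is hypothetical. Each document is one user's concatenated microblogs, encoded as term ids.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha, beta, n_iter, seed=0):
    """Collapsed Gibbs sampling for LDA (one document per user).

    docs: list of documents, each a list of term ids in range(vocab_size).
    Returns (theta, phi): the document-topic and topic-term distributions.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_mk = [[0] * K for _ in docs]      # n_m^(k): count of topic k in document m
    n_m = [0] * len(docs)               # n_m: topic assignments in document m
    n_kt = [[0] * V for _ in range(K)]  # n_k^(t): count of term t under topic k
    n_k = [0] * K                       # n_k: terms assigned to topic k

    def update(m, k, t, delta):
        """Add (delta=+1) or remove (delta=-1) one assignment of term t to topic k."""
        n_mk[m][k] += delta
        n_m[m] += delta
        n_kt[k][t] += delta
        n_k[k] += delta

    # Step (2): random initial topic for every term.
    z = []
    for m, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for t, k in zip(doc, z[m]):
            update(m, k, t, +1)

    # Steps (3)-(4): resample each term's topic from its full conditional.
    for _ in range(n_iter):
        for m, doc in enumerate(docs):
            for i, t in enumerate(doc):
                update(m, z[m][i], t, -1)
                weights = [
                    (n_kt[k][t] + beta) / (n_k[k] + V * beta) * (n_mk[m][k] + alpha)
                    for k in range(K)
                ]
                z[m][i] = rng.choices(range(K), weights=weights)[0]
                update(m, z[m][i], t, +1)

    # Step (5): point estimates of theta and phi from the final counts.
    theta = [[(n_mk[m][k] + alpha) / (n_m[m] + K * alpha) for k in range(K)]
             for m in range(len(docs))]
    phi = [[(n_kt[k][t] + beta) / (n_k[k] + V * beta) for t in range(V)]
           for k in range(K)]
    return theta, phi


# Toy corpus: three "user documents" over a 5-term vocabulary.
docs = [[0, 1, 0, 2], [3, 4, 3], [0, 2, 1]]
theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=5, alpha=0.1, beta=0.01, n_iter=200)
```

Because each document holds all of one user's microblogs, the θ row of the target user's document plays the role of P(T|u) in the following paragraphs.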

After the LDA model is solved by Gibbs sampling, the user-topic probability relation P(T|u) of the target user is obtained from the document-topic probability distribution θ output by the solver.

In addition, the output topic-term probability distribution Φ contains, for each topic, the probability of generating each term, which makes it a very large matrix. When the terms of a topic are ranked by probability, the low-ranked terms have very small generation probabilities; therefore, to save computing time, the embodiment of the invention takes the top 20 terms of each topic to construct the topic-term probability relation matrix P(V|T) of the corpus. The embodiment of the invention does not limit the number of terms selected.
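Reading off P(T|u) and truncating Φ to the top terms of each topic can be sketched as below. This is a minimal sketch; the function names are hypothetical, and keeping the truncated probabilities un-renormalised is an assumption, since the text does not say whether P(V|T) is renormalised after truncation.

```python
def user_topic_distribution(theta, user_doc_index):
    """P(T|u): the theta row of the document that holds all of user u's microblogs."""
    return list(theta[user_doc_index])


def top_terms_per_topic(phi, top_n=20):
    """P(V|T) truncated to the top_n most probable terms of each topic.

    Returns {topic: {term_id: probability}}; probabilities are kept as they are
    in phi (not renormalised).
    """
    truncated = {}
    for k, row in enumerate(phi):
        best = sorted(range(len(row)), key=lambda t: row[t], reverse=True)[:top_n]
        truncated[k] = {t: row[t] for t in best}
    return truncated


theta = [[0.6, 0.4], [0.1, 0.9]]
phi = [[0.1, 0.2, 0.7], [0.5, 0.3, 0.2]]
print(user_topic_distribution(theta, 0))   # [0.6, 0.4]
print(top_terms_per_topic(phi, top_n=2))   # {0: {2: 0.7, 1: 0.2}, 1: {0: 0.5, 1: 0.3}}
```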

204: the microblog documents of the target user and the microblog documents of the users who have a follow relationship with the target user together form the training set for training the user keyword vector model;

205: after the training set is selected, processing the content published by each user in the training set into a microblog document, calculating the term weights of each user document with the TF-IDF algorithm, ranking the term weights in the target user's microblog document, and obtaining the user keyword vector model K(u);
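Step 205 can be sketched as follows. This is a minimal illustration, assuming the common TF-IDF variant tf = count / document length and idf = log(N / df); the patent does not state which variant it uses, and the function names are hypothetical. Each user contributes one document.

```python
import math
from collections import Counter


def tfidf_weights(user_docs):
    """TF-IDF weight of every term in every user document.

    user_docs: {user_id: list of terms}, one document per user.
    Uses tf = count / document length and idf = log(N / df).
    """
    n_docs = len(user_docs)
    df = Counter()
    for terms in user_docs.values():
        df.update(set(terms))              # document frequency of each term
    weights = {}
    for user, terms in user_docs.items():
        tf = Counter(terms)
        weights[user] = {
            t: (c / len(terms)) * math.log(n_docs / df[t]) for t, c in tf.items()
        }
    return weights


def keyword_vector(weights, target_user):
    """K(u): the target user's (term, weight) pairs sorted by descending weight."""
    return sorted(weights[target_user].items(), key=lambda kv: kv[1], reverse=True)


user_docs = {
    "u": ["news", "news", "tianjin", "music"],   # target user
    "a": ["news", "sport"],                      # users with a follow relationship
    "b": ["music", "sport"],
}
k_u = keyword_vector(tfidf_weights(user_docs), "u")
print([t for t, _ in k_u])   # ['tianjin', 'news', 'music']
```

A term used by every user in the training set gets idf = 0, so K(u) emphasises terms that distinguish the target user from the users around them.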

206: inputting the user-topic probability relation P(T|u) of the target user, the target user keyword vector model K(u) and the candidate microblog document D;

207: for each microblog W in the candidate microblog document D, calculating the interest probability P(W|u) of user u in the candidate microblog W according to formula (1), and calculating K_u(W) according to formula (2);

P(W|u) = max{P(w₁|u), P(w₂|u), …, P(w_i|u), …, P(w_m|u)}    (1)

where W = {w₁, w₂, …, w_m}, and P(w_i|u) (i = 1, 2, …, m) is the probability that user u is interested in term w_i of the candidate microblog W.
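Formula (1) can be sketched as below. The text does not say how the per-term probability P(w_i|u) is obtained; this sketch assumes the usual topic-mixture decomposition P(w|u) = Σ_k P(w|t_k)·P(t_k|u), built from P(T|u) and the truncated P(V|T). That decomposition and the helper names are assumptions for illustration only.

```python
def term_interest(term, p_topic_given_user, p_term_given_topic):
    """P(w|u) = sum over topics k of P(w|t_k) * P(t_k|u)  (assumed decomposition).

    p_topic_given_user: list indexed by topic id, i.e. P(T|u).
    p_term_given_topic: {topic id: {term: P(v|t)}}; a term missing from a topic's
    truncated top-term list contributes 0 for that topic.
    """
    return sum(
        p_t * p_term_given_topic.get(k, {}).get(term, 0.0)
        for k, p_t in enumerate(p_topic_given_user)
    )


def microblog_interest(microblog_terms, p_topic_given_user, p_term_given_topic):
    """Formula (1): P(W|u) = max over the terms w_i of W of P(w_i|u)."""
    return max(
        (term_interest(w, p_topic_given_user, p_term_given_topic) for w in microblog_terms),
        default=0.0,
    )


p_tu = [0.7, 0.3]
p_vt = {0: {"news": 0.5, "tianjin": 0.2}, 1: {"music": 0.6, "news": 0.1}}
print(microblog_interest(["music", "news", "sport"], p_tu, p_vt))
```

Here "news" scores 0.7 × 0.5 + 0.3 × 0.1 = 0.38, which beats "music" (0.18) and the out-of-vocabulary "sport" (0), so P(W|u) is about 0.38.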

When terms in the candidate microblog W match terms published by the target user, the matching degree between W and the keyword vector model of target user u is defined as the maximum weight, in the keyword vector model K(u) of target user u, among the terms w_i of W; otherwise, the weight of the candidate microblog is the lowest term weight in the target user's keyword vector model.

where K_u(w_n) is user u's score for term w_n in the candidate microblog; w(v_i, u) is target user u's score for term v_i in the user's own document; K_u(W) is user u's score for the candidate microblog; and V_targetusr is the set of terms published by the target user.
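The keyword matching rule just described (formula (2), whose typeset form is not reproduced here) can be sketched as follows. This follows one reading of the description: a matched term scores its weight in K(u), an unmatched term scores the lowest weight in K(u), and K_u(W) is the highest term score. Under this reading K_u(W) equals the largest matched weight when any term matches and the lowest K(u) weight otherwise. The function name keyword_match is hypothetical.

```python
def keyword_match(microblog_terms, user_weights):
    """Formula (2) as described in the text: K_u(W).

    user_weights: {term v_i: w(v_i, u)}, the target user's keyword vector K(u).
    A term of W that the target user has published scores its weight in K(u);
    any other term scores the lowest weight in K(u). K_u(W) is the highest
    term score, so it falls back to the lowest weight when nothing matches.
    """
    floor = min(user_weights.values())
    return max((user_weights.get(w, floor) for w in microblog_terms), default=floor)


k_u = {"tianjin": 0.27, "news": 0.20, "music": 0.10}
print(keyword_match(["news", "sport"], k_u))   # 0.2
print(keyword_match(["sport"], k_u))           # 0.1
```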

208: calculating Score(W, u) according to formula (3);

The matching degree between the target user topic model and the candidate microblog and the matching degree between the target user keyword vector model and the candidate microblog are combined by a weighted average according to formula (3), where λ and μ adjust the weight of the topic-model matching degree and the weight of the keyword-vector-model matching degree, respectively.
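Formula (3) itself is not reproduced in the text, so the following is only a sketch of a weighted average of the two matching degrees with weights λ and μ (named lam and mu here). Dividing by λ + μ is an assumption; when λ + μ = 1 it reduces to λ·P(W|u) + μ·K_u(W).

```python
def score(p_interest, k_match, lam, mu):
    """Formula (3) sketch: weighted average of P(W|u) and K_u(W).

    Dividing by (lam + mu) is an assumption; when lam + mu == 1 this is simply
    lam * P(W|u) + mu * K_u(W).
    """
    if lam < 0 or mu < 0 or lam + mu == 0:
        raise ValueError("weights must be non-negative with a positive sum")
    return (lam * p_interest + mu * k_match) / (lam + mu)


print(score(0.38, 0.2, lam=0.5, mu=0.5))
```

Raising mu shifts the score toward the keyword-vector match, and raising lam shifts it toward the topic match; this is the trade-off the parameter experiments in Example 3 tune.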

209: after the scores Score(W, u) of the candidate microblogs are obtained, ranking the candidate microblogs by score to construct an initial microblog recommendation list L₀ of the target user, and performing redundancy processing on L₀;

The initial microblog recommendation list may contain the same microblog forwarded by several users, so this redundancy must be removed during recommendation; after redundancy removal, personalized TOP-N recommendation is provided to the user from the recommendation list.
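Ranking, redundancy removal and TOP-N selection can be sketched as follows. This is a minimal sketch, assuming each candidate carries a hypothetical origin_id that identifies the original microblog (a repost carries the id of the post it forwards), and that the highest-scored copy of each microblog is the one kept; the text does not specify which copy survives.

```python
def recommend(candidates, top_n):
    """Rank candidates by score, drop repeated forwards of one microblog, keep TOP-N.

    candidates: list of (origin_id, score) pairs, where origin_id identifies the
    original microblog (a repost carries the id of the post it forwards).
    """
    initial = sorted(candidates, key=lambda c: c[1], reverse=True)  # list L0
    seen = set()
    deduped = []
    for origin_id, s in initial:
        if origin_id not in seen:          # first (highest-scored) copy wins
            seen.add(origin_id)
            deduped.append((origin_id, s))
    return deduped[:top_n]


candidates = [("a", 0.2), ("b", 0.9), ("a", 0.5), ("c", 0.4)]
print(recommend(candidates, top_n=2))   # [('b', 0.9), ('a', 0.5)]
```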

210: outputting the recommendation list L after redundancy processing.

The embodiment of the invention selects a target user and performs topic modeling on that user according to the content of the messages the user publishes, scores the candidate microblogs according to their topic matching degree with the target user, and ranks them by score for recommendation.

The embodiment of the invention first provides a user topic modeling method combining LDA and TF-IDF to obtain a better modeling effect; second, it provides a new similarity calculation method for the target user keyword vector model; finally, it provides improved methods for calculating the topic matching degree and the keyword-vector-model matching degree between a candidate microblog and the target user.

With the MCRA algorithm, the explicit behaviors of a user on the microblog platform can be used to perform topic modeling on the user and to analyze the target user's interests accurately. MCRA combines the ideas of the LDA and TF-IDF algorithms to model the target user, so that correct results obtain higher positions in the recommendation list without degrading indexes such as the accuracy and recall of the original algorithms; that is, the microblogs the target user is really interested in are ranked nearer the top of the recommendation list.

Example 3

After the algorithm was designed, an evaluation method was designed to measure its performance. Accuracy (Precision), Recall, F value and Average Accuracy (AP, average precision) are used as evaluation criteria to evaluate the effectiveness and correctness of the algorithm and to analyze the experimental results.

The number of experimental subjects is set to 150, and TOP-N recommendation is performed with the number N of recommended microblogs set to 10, 20, 30, 40, 50, 60, 70 and 80 in turn. To check the effect of the MCRA algorithm, the LDA-based user modeling recommendation method and the TF-IDF-based user modeling recommendation method of John Hannon et al., which is commonly used at present, serve as comparison algorithms, and the three algorithms are compared using accuracy, recall, F value and average accuracy as evaluation indexes.

(1) Accuracy (Precision) is calculated by formula (4).

where L_all = {W₀, W₁, …, W_i, …, W_N} is the set of recommended microblogs, each W_i denoting a different microblog, and L_right is the content in the recommendation list that matches the target user's interests, i.e., the microblogs published by the target user in the experiment.

(2) Recall is calculated by formula (5).

where L_targetusr is the set of microblogs published by the target user.

(3) The F value is calculated by formula (6).

(4) Average accuracy (AP) reflects how well the system ranks relevant documents: the higher the relevant documents are ranked in the retrieved results, the higher the AP value. If the system returns no relevant documents, the AP is 0. AP is calculated by formula (7).

where N is the total number of microblogs published by the target user (i.e., the relevant microblogs), r_i is the i-th relevant document retrieved, and R_i is the rank of the i-th relevant microblog in the recommendation list.
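Formulas (4) to (7) are not reproduced in the text. The sketch below uses the standard definitions that match the symbols above: Precision = |L_right| / |L_all|, Recall = |L_right| / |L_targetusr|, F = 2PR / (P + R), and AP = (1/N) Σ i / R_i over the relevant microblogs found. Treating these standard forms as the patent's exact formulas is an assumption, and the function name evaluate is hypothetical.

```python
def evaluate(recommended, relevant):
    """Precision, recall, F value and average precision for one target user.

    recommended: ranked list L_all of microblog ids (already de-duplicated).
    relevant: set L_targetusr of microblogs published by the target user.
    """
    hits = [w for w in recommended if w in relevant]       # L_right
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    f_value = (2 * precision * recall / (precision + recall)
               if precision + recall > 0 else 0.0)
    # AP: average over the N relevant microblogs of i / R_i, where R_i is the
    # 1-based rank of the i-th relevant microblog found; unfound ones add 0.
    ap_sum, found = 0.0, 0
    for rank, w in enumerate(recommended, start=1):
        if w in relevant:
            found += 1
            ap_sum += found / rank
    ap = ap_sum / len(relevant) if relevant else 0.0
    return precision, recall, f_value, ap


print(evaluate(["a", "x", "b", "y"], {"a", "b", "c"}))
```

In this example two of four recommendations are relevant (Precision 1/2), two of three relevant microblogs are found (Recall 2/3), F = 4/7, and AP = (1/1 + 2/3) / 3 = 5/9.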

On the average accuracy index, the MCRA algorithm shows a clear upward trend. In all 8 groups of experiments, the average accuracy of MCRA is not lower than that of the TF-IDF-based microblog text recommendation algorithm; overall, however, the two are close, differing by no more than 3.1%.

The F values of the MCRA algorithm, the TF-IDF-based microblog text recommendation algorithm and the LDA-based microblog text recommendation algorithm for different numbers of recommended microblogs are plotted as line charts, as shown in FIG. 4: MCRA and the TF-IDF-based algorithm are higher than the LDA-based algorithm on the F value index, and MCRA performs similarly to the TF-IDF-based algorithm. Therefore, both MCRA and the TF-IDF-based algorithm achieve better F values than the LDA-based algorithm.

Because MCRA and the TF-IDF-based algorithm are far better than the LDA-based algorithm on the average accuracy index, only the average accuracy results of MCRA and the TF-IDF-based algorithm are plotted as a bar chart, as shown in FIG. 5. The average accuracy of the proposed algorithm can exceed that of the microblog text recommendation algorithm based on TF-IDF modeling alone.

In this experiment, the number of topics is set to 20, the number of recommended microblogs to 20 and α to 0.0001, while β is varied from 0.4 to 1.9; the experimental results are shown in FIG. 3.

As shown in FIG. 3, the recommendation system designed in the embodiment of the invention achieves the best recommendation effect when α = 0.0001 and β = 0.5. The analysis is as follows: when β is lower than 0.4, the matching degree P(W|u) between the candidate microblog and the target user topic model takes a larger weight in the score; when β is greater than 1.9, the matching degree K_u(W) between the candidate microblog and the user keyword vector model takes a larger weight in the score.

Therefore, in the experiments the MCRA parameters are most reasonably set to α = 0.0001 and β = 0.5. With 150 subjects, the accuracy of the three algorithms for different numbers of recommended microblogs is compared in Table 1.

TABLE 1 Comparison of the accuracy of the three algorithms with 150 subjects and different numbers of recommended microblogs

As can be seen from Table 1, in all 8 groups of experiments the accuracy of MCRA is higher than that of the LDA-based microblog text recommendation algorithm, by 6% at the lowest and 24% at the highest, and in 5 groups the accuracy of MCRA is not lower than that of the TF-IDF-based microblog text recommendation algorithm. With 150 subjects, the recall of the three algorithms for different numbers of recommended microblogs is compared in Table 2, the F values in Table 3, and the average accuracy in Table 4.

TABLE 2 Comparison of the recall of the three algorithms with 150 subjects and different numbers of recommended microblogs

TABLE 3 Comparison of the F values of the three algorithms with 150 subjects and different numbers of recommended microblogs

As can be seen from Table 3, in all 8 groups of experiments the F value of MCRA is higher than that of the LDA-based microblog text recommendation algorithm, by at least 10% and at most 25.7%.

TABLE 4 Comparison of the average accuracy of the three algorithms with 150 subjects and different numbers of recommended microblogs

As can be seen from Table 4, in all 8 groups of experiments the average accuracy of MCRA is higher than that of the LDA-based microblog text recommendation algorithm, by at least 6% and at most 18%, as the number of recommended microblogs increases.

Example 4

A microblog text recommendation device based on a user model is provided. Referring to FIG. 6, the device comprises:

the acquisition and preprocessing module 1 is used for acquiring microblog data, forming a microblog document and preprocessing the microblog document;

the first calculation module 2 is used for establishing a target user topic model according to the LDA topic model and calculating the matching degree of the candidate microblog and the target user topic model;

the second calculation module 3 is used for establishing a target user keyword vector model based on a TF-IDF algorithm and calculating the matching degree of the candidate microblog and the target user keyword vector model;

and the sorting module 4 is used for combining the two matching degrees by a weighted average to obtain the matching degree between the candidate microblog and the target user model as the score of the candidate microblog, and ranking the candidate microblogs by score.

Wherein, referring to fig. 7, the sorting module 4 further includes:

a redundancy processing submodule 41, configured to, after the scores Score(W, u) of the candidate microblogs are obtained, rank the candidate microblogs by score, construct an initial microblog recommendation list L₀ of the target user, and perform redundancy processing on L₀;

and the output submodule 42 is used for outputting the recommendation list after the redundancy processing.

In the embodiment of the invention, unless the model of a device is specifically described, the models of the devices are not limited, as long as they can perform the above functions.

Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and that the serial numbers of the above embodiments of the invention are for description only and do not indicate their relative merits.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A microblog text recommendation method based on a user model is characterized by comprising the following steps:

acquiring microblog data to form microblog documents, performing word segmentation, vectorization and dimension reduction on the microblog documents, and selecting, from the processed microblog texts, the microblog texts of the target user and of the users related to the target user as a modeling data set;

in the modeling data set, training the constructed microblog text vector set with the LDA topic model to obtain the user-topic probability relation of the target user;

forming, from the microblog documents of the target user and the microblog documents of the users who have a follow relationship with the target user, a training set for training a user keyword vector model;

processing the content published by each user in the training set into a microblog document, calculating the term weights of each user document with the TF-IDF algorithm, ranking the term weights in the target user's microblog document, and obtaining the user keyword vector model;

inputting the user-topic probability relation of the target user, the keyword vector model of the target user and the candidate microblog document; and, for each microblog in the candidate microblog document, calculating the user's interest probability for that candidate microblog and its matching degree with the keyword vector model;

calculating the scores of the candidate microblogs according to the interest probability and the keyword vector model, and ranking the scores;

wherein the steps of calculating the scores of the candidate microblogs according to the interest probability and the keyword vector model and ranking the scores specifically comprise:

after the scores of the candidate microblogs are obtained, sorting the candidate microblogs according to the scores, constructing an initial microblog recommendation list of a target user, and performing redundancy processing on the initial microblog recommendation list;

and outputting the recommendation list after the redundancy processing.

2. A microblog text recommendation device based on a user model, characterized by comprising:

the acquisition and preprocessing module is used for acquiring microblog data, forming a microblog document, and performing word segmentation, vectorization and dimension reduction on the microblog document;

the first calculation module is used for training, in the modeling data set, the constructed microblog text vector set with the LDA topic model to obtain the user-topic probability relation of the target user;

the second calculation module is used for forming, from the target user's microblog documents and the microblog documents of the users who have a follow relationship with the target user, a training set for training a user keyword vector model; processing the content published by each user in the training set into a microblog document; calculating the term weights of each user document with the TF-IDF algorithm; ranking the term weights in the target user's microblog document; and obtaining the user keyword vector model;

the sorting module is used for inputting the user-topic probability relation of the target user, the keyword vector model of the target user and the candidate microblog documents; calculating, for each microblog in the candidate microblog documents, the user's interest probability for that microblog and its matching degree with the keyword vector model; and calculating the scores of the candidate microblogs according to the interest probability and the keyword vector model and ranking the scores;

the sorting module further comprises:

the redundancy processing sub-module is used for, after the scores of the candidate microblogs are obtained, sorting the candidate microblogs by score, constructing an initial microblog recommendation list of the target user, and performing redundancy processing on the initial microblog recommendation list;