CN103488705A

CN103488705A - User interest model incremental update method of personalized recommendation system

Info

Publication number: CN103488705A
Application number: CN201310403293.3A
Authority: CN
Inventors: 姚兴苗; 夏春燕; 伍盛; 胡光岷
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2013-09-06
Filing date: 2013-09-06
Publication date: 2014-01-01
Anticipated expiration: 2033-09-06
Also published as: CN103488705B

Abstract

The invention discloses an incremental update method for the user interest model of a personalized recommendation system. The basic principle is to store an intermediate result of the calculation that generates the current user interest model and, when the model is updated, to perform an incremental calculation on the basis of that intermediate result. Provided that no interest information is lost during the update, the method allows the user interest model to be updated quickly and continuously even with large amounts of data, improves the performance of the recommendation system, and provides users with higher-quality service.

Description

Incremental update method of user interest model for personalized recommendation system

Technical field

The invention relates to the field of computer application technology, and in particular to a method for incrementally updating the user interest model of a personalized recommendation system.

Background art

A personalized recommendation system establishes a binary relationship between users and recommended objects, uses existing selection histories or similarity relationships to discover the objects each user is potentially interested in, and then makes personalized recommendations (Liu Jianguo, Zhou Tao, Wang Binghong. Research progress of personalized recommendation systems [J]. Progress in Natural Science, 2009, 19(1), 1-15.). As user needs diversify, personalized recommendation systems are applied more widely, not only in e-commerce but also for recommending web pages, documents, and so on. Copywriters and researchers frequently need to consult large amounts of literature. A personalized recommendation system based on document content collects and analyzes the content of the documents of interest that a user has read in order to learn the user's reading interests and build a user interest model; it then compares document content against the user interest model and recommends the documents with a high degree of match. Such a system has three important modules: a user interest modeling module, a recommended-object modeling module, and a recommendation algorithm module. The system model is shown in Figure 1.

In a recommendation system based on document content, the user interest modeling module is one of the core modules. It extracts the user interest model from the documents of interest the user has read and updates the model as the user's interests change. To achieve high-precision recommendation, the user interest model must accurately describe the user's current interests, and the model update must track changes in those interests quickly.

At present there are two main methods for updating a user interest model: the time window method and the forgetting function method. The time window method uses a sliding time window to filter out outdated interests, and the forgetting function method uses a forgetting function to decay the weights of interests (Fei Hongxiao, Dai Yi, Mu Jun, et al. User interest drift method based on optimized time window [J]. Computer Engineering, 2008, 34(16), 210-214.). The literature (SHIN H., CHO S. Neighborhood Property Based Pattern Selection for Support Vector Machines [J]. Neural Computation, 2007, 19(3), 816-855.) updates the user interest model with the time window method, filtering out outdated interests with a sliding time window. The literature (KEERTHI S.S., SHEVADE S.K., BHATTACHARYYA, et al. A Fast Iterative Nearest Point Algorithm for Support Vector Machine Classifier Design [J]. IEEE Transactions on Neural Networks, 2000, 11(1), 124-136.) updates the user interest model with the forgetting function method, decaying the weights of interests with a forgetting function. Shan Rong (Shan Rong. Research on the update and forgetting mechanism of user interest models [J]. Microcomputer Applications, 2011, 27(7), 10-11, 69) updates the interest model according to the characteristics of HTML documents and the user's browsing speed, and realizes forgetting by correcting the weights of feature words with a forgetting factor. The literature (Li Feng, Pei Jun, You Zhiyang. Adaptive user interest model based on implicit feedback [J]. Computer Engineering and Applications, 2008, 44(9), 76-79.) divides user interest into short-term and long-term interest; short-term interest is updated with a time window mechanism and long-term interest with a time-based forgetting function.

Existing methods for updating the user interest model concentrate on how to remove documents that deviate from the user's interests from the set of documents of interest, and how to add new documents of interest, so that the documents used to build the model better reflect the user's current interests. They ignore the computational efficiency of the update itself. As the number of documents a user reads grows, the number of documents marked as interesting grows as well, and the computational cost of updating the model becomes increasingly prominent, until the model is updated too slowly to meet the user's needs.

Summary of the invention

The technical problem to be solved by the invention is, in view of the deficiencies of the prior art, to provide an incremental update method for the user interest model of a personalized recommendation system. On the premise that no interest information is lost during the update, the method improves the computational efficiency of updating the user interest model, satisfies the requirement that the model can be updated continuously and quickly even with a huge amount of data, improves the performance of the personalized recommendation system, and provides users with higher-quality service.

To solve the above technical problem, the technical solution adopted by the invention is an incremental update method for the user interest model of a personalized recommendation system, as follows:

1) Construct a user interest vector space model $U_0$ based on document content;

2) Establish the user-interest document set $D_0 = \{d_{01}, d_{02}, \ldots, d_{0m}\}$ from which the user interest vector space model $U_0$ was built, and let $D = \{d_1, d_2, \ldots, d_n\}$ be the set of documents to be recommended, where the feature vector of document $d_i$ is $W_{d_i} = \{(t_{i1}, w_{i1}), (t_{i2}, w_{i2}), \ldots, (t_{ia}, w_{ia})\}$. Here $d_{0e}$ denotes a document in $D_0$, $e = 1, 2, \ldots, m$, and $m$ is the total number of documents in $D_0$; $t_{ik}$ denotes the $k$-th feature word of document $d_i$; $w_{ik}$ denotes the weight of the $k$-th feature word of document $d_i$; $i = 1, 2, \ldots, n$; $k = 1, 2, \ldots, a$; $a$ is the total number of feature words of document $d_i$. The set of documents to be recommended is generally collected from the network or obtained from literature sources;

3) When recommending documents, compute the similarity $r$ between the feature vector of every document in the to-be-recommended set $D$ and the user interest vector space model $U_0$, recommend the documents whose similarity $r$ is greater than the threshold $\alpha$, and have the user feed back the new documents of interest to the personalized recommendation system; the set of new documents is $D' = \{d_{p_1}, d_{p_2}, \ldots, d_{p_q}\}$. The threshold $\alpha$ takes a value between 0 and 1 and is adjusted to the user's needs: the closer $\alpha$ is to 0, the more recommendation results the user obtains; the closer $\alpha$ is to 1, the more accurate the results. When selecting the documents in the user-interest document set that are outdated or deviate from the user's interests, compute the similarity $r'$ between the feature vector of each document in $D_0$ and the user interest vector space model $U_0$, and select the documents whose $r'$ is less than the threshold $\alpha$ as outdated or deviating documents; the set of these documents is $D'' = \{d_{b_1}, d_{b_2}, \ldots, d_{b_c}\}$. Here $d_{p_f}$ is a document in the new document set $D'$, $f = 1, 2, \ldots, q$, and $q$ is the total number of documents in $D'$; $d_{b_h}$ is a document in the set $D''$ of outdated or deviating documents, $h = 1, 2, \ldots, c$, and $c$ is the total number of documents in $D''$;

4) When adding to the user-interest document set, add the new document set $D'$ to the user-interest document set $D_0$ to form a new first user-interest document set $D_1$; when removing outdated or deviating documents, remove the set $D''$ from the user-interest document set $D_0$ to form a new second user-interest document set $D_2$;

5) Compute the center vector $\overline{W_{D_1}}$ of the new first user-interest document set $D_1$ according to:

$\overline{W_{D_1}} = \frac{\sum_{e=1}^{m} W_{d_{0e}} + \sum_{f=1}^{q} W_{d_{p_f}}}{m + q} = \frac{m \overline{W_{D_0}} + \sum_{f=1}^{q} W_{d_{p_f}}}{m + q};$

where $W_{d_{0e}}$ is the feature vector of the $e$-th document in the user-interest document set $D_0$; $W_{d_{p_f}}$ is the feature vector of the $f$-th document in the new document set $D'$; $q$ is the total number of documents in $D'$; $\overline{W_{D_0}}$ is the center vector of $D_0$; $m$ is the total number of documents in $D_0$; $e = 1, 2, \ldots, m$; $f = 1, 2, \ldots, q$;

Compute the center vector $\overline{W_{D_2}}$ of the new second user-interest document set $D_2$ according to:

$\overline{W_{D_2}} = \frac{\sum_{e=1}^{m} W_{d_{0e}} - \sum_{h=1}^{c} W_{d_{b_h}}}{m - c} = \frac{m \overline{W_{D_0}} - \sum_{h=1}^{c} W_{d_{b_h}}}{m - c};$

where $W_{d_{0e}}$ is the feature vector of the $e$-th document in the user-interest document set $D_0$; $W_{d_{b_h}}$ is the feature vector of the $h$-th document in the set $D''$ of outdated or deviating documents; $c$ is the total number of documents in $D''$; $\overline{W_{D_0}}$ is the center vector of $D_0$; $m$ is the total number of documents in $D_0$; $h = 1, 2, \ldots, c$;

6) Sort the dimensions of $\overline{W_{D_1}}$ or $\overline{W_{D_2}}$ by weight in descending order, select the first $N$ dimensions of $\overline{W_{D_1}}$ or $\overline{W_{D_2}}$ to build the new user interest vector space model $U_1$ or $U_2$, and at the same time store $\overline{W_{D_1}}$ or $\overline{W_{D_2}}$ in the personalized recommendation system, where $N$ does not exceed the dimension of $\overline{W_{D_1}}$ or $\overline{W_{D_2}}$. Use the new user interest vector space model $U_1$ or $U_2$ in place of $U_0$ in step 1) for a new round of recommendation.
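The incremental update of step 5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: feature vectors are assumed to be dictionaries mapping feature words to weights, and the function names `add_documents` and `remove_documents` are hypothetical. The `center` argument must be the full stored center vector $\overline{W_{D_0}}$, not the truncated model $U_0$.

```python
from collections import defaultdict


def _scaled_sum(center, m):
    """Recover the weight sum m * center from the stored center vector."""
    total = defaultdict(float)
    for term, weight in center.items():
        total[term] = m * weight
    return total


def add_documents(center, m, new_docs):
    """Center vector and document count after adding new_docs (the set D')."""
    total = _scaled_sum(center, m)
    for doc in new_docs:
        for term, weight in doc.items():
            total[term] += weight          # same feature word: add weights
    count = m + len(new_docs)
    return {term: s / count for term, s in total.items()}, count


def remove_documents(center, m, old_docs):
    """Center vector and document count after removing old_docs (the set D'')."""
    count = m - len(old_docs)
    if count <= 0:
        raise ValueError("at least one document must remain in the interest set")
    total = _scaled_sum(center, m)
    for doc in old_docs:
        for term, weight in doc.items():
            total[term] -= weight
    return {term: s / count for term, s in total.items()}, count
```

Only the changed documents are visited; the $m$ documents already summarized by the stored center vector are never read again, which is where the saving comes from.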

In step 1), the specific steps for constructing the user interest vector space model $U_0$ based on document content are as follows:

1) Perform feature word selection and feature word weight calculation for all documents of interest to the user. Feature words and their weights can be obtained with the keyword extraction function of the ICTCLAS Chinese word segmentation software (http://ictclas.nlpir.org/), or with a feature word selection method based on word frequency;

2) Extract the feature vectors of all documents of interest to the user to form the document feature vector set $D_3$;

3) Compute the center vector of the document feature vector set $D_3$, sort the center vector by the weight of each dimension in descending order, and select the first $M$ dimensions as the user interest vector space model $U_0$, where $M$ does not exceed the dimension of the center vector of $D_3$.

The center vector $\overline{W_{D_3}}$ of the document feature vector set $D_3 = \{d_{31}, d_{32}, \ldots, d_{3x}\}$ is computed as:

$\overline{W_{D_3}} = \frac{\sum_{y=1}^{x} W_{d_{3y}}}{x};$

where $x$ is the number of elements in the document feature vector set $D_3$; $W_{d_{3y}}$ is the feature vector of the $y$-th document in $D_3$; $y = 1, 2, \ldots, x$.
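As a rough illustration of the center vector formula above, the following Python sketch averages a set of feature vectors. It assumes each feature vector is a dictionary from feature word to weight, and that a feature word missing from a document contributes a weight of zero; the name `center_vector` is hypothetical.

```python
from collections import defaultdict


def center_vector(doc_vectors):
    """Average a list of sparse feature vectors (feature word -> weight)."""
    if not doc_vectors:
        raise ValueError("the document feature vector set must not be empty")
    total = defaultdict(float)
    for doc in doc_vectors:
        for term, weight in doc.items():
            total[term] += weight          # matching feature words are summed
    x = len(doc_vectors)                   # number of documents in the set
    return {term: s / x for term, s in total.items()}
```

For example, `center_vector([{"a": 1.0, "b": 2.0}, {"a": 3.0}])` averages `a` over both documents and treats the missing `b` in the second document as zero.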

The similarity $r$ between the feature vector of document $d_i$ in the to-be-recommended set $D$ and the user interest vector space model $U_0$ is computed as:

$r = \cos(W_{d_i}, U_0) = \frac{W_{d_i} \cdot U_0}{\|W_{d_i}\|_2 \times \|U_0\|_2};$

where $\|\cdot\|_2$ denotes the 2-norm.

The similarity $r'$ between the feature vector of the $e$-th document in the user-interest document set $D_0$ and the user interest vector space model $U_0$ is computed as:

$r' = \cos(W_{d_{0e}}, U_0) = \frac{W_{d_{0e}} \cdot U_0}{\|W_{d_{0e}}\|_2 \times \|U_0\|_2}.$
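The cosine similarity used for both $r$ and $r'$ can be sketched in Python as below. The sketch assumes sparse vectors stored as dictionaries from feature word to weight, so the dot product only runs over feature words present in both vectors. Returning 0.0 for an all-zero vector is a choice made here for illustration; the source does not say how that case is handled.

```python
import math


def cosine_similarity(doc, model):
    """Cosine of the angle between two sparse vectors (feature word -> weight)."""
    dot = sum(weight * model[term] for term, weight in doc.items() if term in model)
    norm_doc = math.sqrt(sum(w * w for w in doc.values()))
    norm_model = math.sqrt(sum(w * w for w in model.values()))
    if norm_doc == 0.0 or norm_model == 0.0:
        return 0.0                         # assumption: empty vectors match nothing
    return dot / (norm_doc * norm_model)
```

Because feature weights are non-negative in this setting, the result lies between 0 and 1, which matches the stated range of the threshold $\alpha$.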

The basic idea of the incremental update method proposed by the invention is to store the intermediate result of the calculation that generates the current user interest model and, when the model is updated, to perform an incremental calculation on the basis of that intermediate result.

Compared with the prior art, the invention has the following beneficial effects. It addresses the efficiency of updating the user interest model in recommendation systems based on document content. While keeping the user's interest information complete, the update method reduces the amount of computation needed to update the user interest model, so that the model can be updated quickly and frequently. This improves the performance of the personalized recommendation system, allows user interests to be tracked quickly as they change, and provides users with higher-quality service.

Description of drawings

Figure 1 shows the recommendation system based on document content information;

Figure 2 shows the construction flow of the user interest model of the invention.

Detailed description of embodiments

The flow for building the document-content-based user interest vector space model in the invention is shown in Figure 2. First, feature word selection and feature word weight calculation are performed on each document of interest to the user, giving a document feature vector composed of a set of feature words and their weights. Document feature vectors can be extracted with the feature word extraction function of the ICTCLAS Chinese word segmentation software (http://ictclas.nlpir.org/), or with a feature word selection method based on word frequency. Multiple document feature vectors form a document feature vector set. After the center vector of the document feature vector set is computed, its dimensions are sorted by weight in descending order and the first $M$ dimensions are selected as the user's interest model vector.

The center vector of the document feature vector set is computed as follows:

For the document set $D_3 = \{d_{31}, d_{32}, \ldots, d_{3x}\}$, the feature vector of document $d_{3i}$ is $W_{d_{3i}} = \{(t_{3i1}, w_{3i1}), (t_{3i2}, w_{3i2}), \ldots, (t_{3im}, w_{3im})\}$, where $t_{3ik}$ denotes the $k$-th feature word of document $d_{3i}$ and $w_{3ik}$ denotes the weight of the $k$-th feature word of document $d_{3i}$. The center vector $\overline{W_{D_3}}$ is then computed as:

$\overline{W_{D_3}} = \frac{\sum_{y=1}^{x} W_{d_{3y}}}{x} \qquad (1)$

In this formula, document feature vectors are summed by matching the feature words of each dimension: where the feature words are the same, the corresponding weights are added. After the dimensions of the center vector are sorted by weight, the first $M$ items form the user's interest model $U$. $M$ does not exceed the dimension of the center vector and is generally determined empirically from training samples.
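The truncation to the first $M$ items might look like the following Python sketch. The center vector is assumed to be a dictionary from feature word to weight, and the name `top_dimensions` is hypothetical. The source does not say how ties between equal weights are broken; ordering tied feature words alphabetically is an assumption made here so that the result is deterministic.

```python
def top_dimensions(center, m_dims):
    """Keep the m_dims highest-weighted feature words of a center vector."""
    if m_dims < 0:
        raise ValueError("m_dims must be non-negative")
    # Sort by descending weight; break ties by feature word (assumption).
    ranked = sorted(center.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:m_dims])
```

The same helper would serve for the first $N$ dimensions kept after an update, since only the input center vector differs.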

Suppose the documents of interest to the user are $\{d_1, d_2, d_3\}$; the process of building the user interest model is shown in Table 1.

Table 1. User interest model building process

The center vector in the table is computed by formula (1); here the first 5 feature items of the center vector are selected as the user interest model $U$.

The specific implementation steps of the incremental update method proposed by the invention are as follows:

Let $U_0$ be the user interest model currently established for the user, and let the user-interest document set from which it was built be $D_0 = \{d_{01}, d_{02}, \ldots, d_{0m}\}$. The document set $D = \{d_1, d_2, \ldots, d_n\}$ contains the documents to be recommended, and the feature vector of document $d_i$ is $W_{d_i} = \{(t_{i1}, w_{i1}), (t_{i2}, w_{i2}), \ldots, (t_{ia}, w_{ia})\}$.

(1) When recommending documents, compute the similarity $r$ between the feature vector of every document in the set $D$ and the user model $U_0$ with the cosine formula, and recommend the documents whose similarity $r$ is greater than the threshold $\alpha$. After browsing, the user feeds back the new documents of interest to the system; let this document set be $D' = \{d_{p_1}, d_{p_2}, \ldots, d_{p_q}\}$. When selecting the documents in the user-interest document set that are outdated or deviate from the user's interests, compute the similarity $r'$ between the feature vector of each document in $D_0$ and the user interest vector space model $U_0$, and select the documents whose $r'$ is less than the threshold $\alpha$ as outdated or deviating documents; the set of these documents is $D'' = \{d_{b_1}, d_{b_2}, \ldots, d_{b_c}\}$;

(2) When adding to the user-interest document set, add the new document set $D'$ to the user-interest document set $D_0$ to form a new user-interest document set $D_1$; when removing outdated or deviating documents, remove the set $D''$ from the user-interest document set $D_0$ to form a new user-interest document set $D_2$;

(3) To retain the user's interests completely, avoid repeated calculation, and improve the performance of the algorithm, the system has already stored the center vector $\overline{W_{D_0}}$ of the document set $D_0$ that was computed when building the user interest model $U_0$. Formula (1) is transformed into formula (2) to compute the center vector of the new interest model after new documents are added:

$\overline{W_{D_1}} = \frac{\sum_{e=1}^{m} W_{d_{0e}} + \sum_{f=1}^{q} W_{d_{p_f}}}{m + q} = \frac{m \overline{W_{D_0}} + \sum_{f=1}^{q} W_{d_{p_f}}}{m + q} \qquad (2)$

Formula (2) is transformed into formula (3) to compute the center vector of the new interest model after outdated or deviating documents are removed:

$\overline{W_{D_2}} = \frac{\sum_{e=1}^{m} W_{d_{0e}} - \sum_{h=1}^{c} W_{d_{b_h}}}{m - c} = \frac{m \overline{W_{D_0}} - \sum_{h=1}^{c} W_{d_{b_h}}}{m - c} \qquad (3)$

(4) Sort the dimensions of $\overline{W_{D_1}}$ ($\overline{W_{D_2}}$) by weight in descending order, select the first $N$ dimensions to build the new user interest model $U_1$ ($U_2$), and at the same time store $\overline{W_{D_1}}$ ($\overline{W_{D_2}}$) in the system. Use the resulting new user interest model $U_1$ ($U_2$) in place of $U_0$ in step (1) for the next stage of recommendation.
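The two threshold tests in step (1) can be sketched as a single Python helper. It assumes the similarities have already been computed and are passed in as a dictionary from a document identifier to its score; the name `split_by_threshold` and the identifiers in the usage note are hypothetical. Documents scoring exactly $\alpha$ fall into neither group, following the strict "greater than" and "less than" wording of the step.

```python
def split_by_threshold(scores, alpha):
    """Split document ids into those scoring above and below the threshold alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    above = [doc_id for doc_id, r in scores.items() if r > alpha]
    below = [doc_id for doc_id, r in scores.items() if r < alpha]
    return above, below
```

Applied to the scores of the to-be-recommended set $D$, the first list holds the documents to recommend; applied to the scores of $D_0$, the second list holds the candidates for $D''$. For instance, `split_by_threshold({"d1": 0.9, "d2": 0.3, "d3": 0.5}, 0.5)` places `d1` above, `d2` below, and leaves `d3` in neither list.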

Formulas (2) and (3) show that the center vector $\overline{W_{D_0}}$ appears in both. The center vector $\overline{W_{D_0}}$ is an intermediate result of the previous calculation of the user interest model. The core of the invention is to save this center vector every time the user interest model is updated, so that it does not have to be recomputed at the next update, which improves update efficiency.

Take the example in Table 2. When the user interest model described in Table 2 is updated by adding document $d_4$, let $d_4 = \{(\text{car}, 4.0), (\text{insurance}, 3.6), (\text{domestic}, 2.5), (\text{increase}, 2.0)\}$. Updating on the basis of the stored center vector, the weight $w_1$ of the feature word "car" is computed as in formula (4):

$w_1 = \frac{3.2 \times 3 + 4.0}{3 + 1} = 3.4 \qquad (4)$

When document $d_1$ is removed, the weight $w_2$ of the feature word "car" is computed as in formula (5):

$w_2 = \frac{3.2 \times 3 - 5.3}{3 - 1} = 2.15 \qquad (5)$
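The arithmetic in formulas (4) and (5) can be reproduced with a few lines of Python. The values are read off the two formulas: 3.2 is the stored center-vector weight of "car" over 3 documents, 4.0 is the weight of "car" in $d_4$, and 5.3 is the weight of "car" in $d_1$. The variable names are illustrative only.

```python
m = 3               # documents behind the stored center vector
car_center = 3.2    # stored center-vector weight of "car"

# Formula (4): add d4, whose weight for "car" is 4.0.
w1 = (car_center * m + 4.0) / (m + 1)

# Formula (5): remove d1, whose weight for "car" is 5.3.
w2 = (car_center * m - 5.3) / (m - 1)

# Round to two decimals to hide floating-point noise.
print(round(w1, 2), round(w2, 2))
```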

The other feature words are handled in the same way to obtain the new center vector of the user interest model; the update results are shown in Table 2. This example performs the incremental calculation on only 3 documents of interest, so the gain in computational efficiency is not obvious here; the example serves only to illustrate the incremental update algorithm. In practice the number of documents a user has marked as interesting is comparatively large, while the number of documents added or removed is comparatively small, and the efficiency gain of the incremental update algorithm is then more pronounced.

Comparing the user interest model extraction in Table 1 with the incremental update process proposed by the invention in Table 2 shows that the center vector is an intermediate result of the previous creation or update of the user interest model. The invention performs the incremental update on the basis of this intermediate result and thereby avoids a large amount of vector summation. It can also be seen that the center vector obtained by the incremental update method is the same as the one extracted directly from the updated document set. In general, the documents used to build a new user interest model consist of two parts: the first part is the newly added documents of interest; the second part is what remains of the original documents of interest after the documents deviating from the user's current interests have been removed, and this second part accounts for the vast majority of the documents. The significance of the incremental update method is that it avoids recomputing over the second part, which effectively reduces the computation required to update the user interest model.
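The claim that the incremental result equals the directly recomputed center vector follows from the definition of the center vector. The short derivation below is a sketch using the symbols of formulas (1) to (3); the operation counts in the closing note are an informal estimate, not a figure from the source.

```latex
% Definition of the stored center vector of D_0, rearranged:
\overline{W_{D_0}} = \frac{1}{m}\sum_{e=1}^{m} W_{d_{0e}}
\quad\Longrightarrow\quad
\sum_{e=1}^{m} W_{d_{0e}} = m\,\overline{W_{D_0}}

% Substituting into the direct computation over D_1 = D_0 \cup D':
\overline{W_{D_1}}
  = \frac{\sum_{e=1}^{m} W_{d_{0e}} + \sum_{f=1}^{q} W_{d_{p_f}}}{m+q}
  = \frac{m\,\overline{W_{D_0}} + \sum_{f=1}^{q} W_{d_{p_f}}}{m+q}

% Substituting into the direct computation over D_2 = D_0 \setminus D'':
\overline{W_{D_2}}
  = \frac{\sum_{e=1}^{m} W_{d_{0e}} - \sum_{h=1}^{c} W_{d_{b_h}}}{m-c}
  = \frac{m\,\overline{W_{D_0}} - \sum_{h=1}^{c} W_{d_{b_h}}}{m-c}
```

A direct recomputation sums $m + q$ (or $m - c$) feature vectors, whereas the incremental form needs one scaling of the stored center vector plus $q$ (or $c$) vector additions or subtractions, so the saving grows with $m$.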

Claims

1. An incremental update method for the user interest model of a personalized recommendation system, characterized by comprising the following steps:

1) constructing a user interest vector space model $U_0$ based on document content;

2) establishing the user-interest document set $D_0 = \{d_{01}, d_{02}, \ldots, d_{0m}\}$ of the user interest vector space model $U_0$, and letting $D = \{d_1, d_2, \ldots, d_n\}$ be the set of documents to be recommended, wherein the feature vector of document $d_i$ is $W_{d_i} = \{(t_{i1}, w_{i1}), (t_{i2}, w_{i2}), \ldots, (t_{ia}, w_{ia})\}$; wherein $d_{0e}$ denotes a document in the user-interest document set $D_0$, $e = 1, 2, \ldots, m$, and $m$ is the total number of documents in $D_0$; $t_{ik}$ denotes the $k$-th feature word of document $d_i$; $w_{ik}$ denotes the weight of the $k$-th feature word of document $d_i$; $i = 1, 2, \ldots, n$; $k = 1, 2, \ldots, a$; $a$ denotes the total number of feature words of document $d_i$;

3) when recommending documents, computing the similarity $r$ between the feature vector of every document in the to-be-recommended set $D$ and the user interest vector space model $U_0$, recommending the documents whose similarity $r$ is greater than a threshold $\alpha$, and feeding back the new documents of interest to the personalized recommendation system, the set of new documents being $D' = \{d_{p_1}, d_{p_2}, \ldots, d_{p_q}\}$; when selecting the documents in the user-interest document set that are outdated or deviate from the user's interests, computing the similarity $r'$ between the feature vector of each document in $D_0$ and the user interest vector space model $U_0$, and selecting the documents whose $r'$ is less than the threshold $\alpha$ as outdated or deviating documents, the set of these documents being $D'' = \{d_{b_1}, d_{b_2}, \ldots, d_{b_c}\}$; the threshold $\alpha$ takes a value between 0 and 1; $d_{p_f}$ is a document in the new document set $D'$, $f = 1, 2, \ldots, q$, and $q$ is the total number of documents in $D'$; $d_{b_h}$ is a document in the set $D''$ of outdated or deviating documents, $h = 1, 2, \ldots, c$, and $c$ is the total number of documents in $D''$;

4) when adding to the user-interest document set, adding the new document set $D'$ to the user-interest document set $D_0$ to form a new first user-interest document set $D_1$; or, when removing outdated or deviating documents from the user-interest document set, removing the set $D''$ from the user-interest document set $D_0$ to form a new second user-interest document set $D_2$;

5) calculating the center vector $\overline{W_{D_1}}$ of the new first set of documents of interest D₁ according to the following formula:

<math> <mrow> <mover> <msub> <mi>W</mi> <msub> <mi>D</mi> <mn>1</mn> </msub> </msub> <mo>&OverBar;</mo> </mover> <mo>=</mo> <mfrac> <mrow> <munderover> <mi>Σ</mi> <mrow> <mi>e</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>m</mi> </munderover> <msub> <mi>W</mi> <msub> <mi>d</mi> <mrow> <mn>0</mn> <mi>e</mi> </mrow> </msub> </msub> <mo>+</mo> <munderover> <mi>Σ</mi> <mrow> <mi>f</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>q</mi> </munderover> <msub> <mi>W</mi> <msub> <mi>d</mi> <mi>pf</mi> </msub> </msub> </mrow> <mrow> <mi>m</mi> <mo>+</mo> <mi>q</mi> </mrow> </mfrac> <mo>=</mo> <mfrac> <mrow> <mi>m</mi> <mover> <msub> <mi>W</mi> <msub> <mi>D</mi> <mn>0</mn> </msub> </msub> <mo>&OverBar;</mo> </mover> <mo>+</mo> <munderover> <mi>Σ</mi> <mrow> <mi>f</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>q</mi> </munderover> <msub> <mi>W</mi> <msub> <mi>d</mi> <mi>pf</mi> </msub> </msub> </mrow> <mrow> <mi>m</mi> <mo>+</mo> <mi>q</mi> </mrow> </mfrac> <mo>;</mo> </mrow> </math>

wherein $W_{d_{0e}}$ is the feature vector of the e-th document in the user's set of documents of interest D₀; $W_{d_{pf}}$ is the feature vector of the f-th document in the new document set D'; and $\overline{W_{D_0}}$ is the center vector of D₀;

calculating the center vector $\overline{W_{D_2}}$ of the new second set of documents of interest D₂ according to the following formula:

<math> <mrow> <mover> <msub> <mi>W</mi> <msub> <mi>D</mi> <mn>2</mn> </msub> </msub> <mo>&OverBar;</mo> </mover> <mo>=</mo> <mfrac> <mrow> <munderover> <mi>Σ</mi> <mrow> <mi>e</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>m</mi> </munderover> <msub> <mi>W</mi> <msub> <mi>d</mi> <mrow> <mn>0</mn> <mi>e</mi> </mrow> </msub> </msub> <mo>-</mo> <munderover> <mi>Σ</mi> <mrow> <mi>h</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>c</mi> </munderover> <msub> <mi>W</mi> <msub> <mi>d</mi> <mi>bh</mi> </msub> </msub> </mrow> <mrow> <mi>m</mi> <mo>-</mo> <mi>c</mi> </mrow> </mfrac> <mo>=</mo> <mfrac> <mrow> <mi>m</mi> <mover> <msub> <mi>W</mi> <msub> <mi>D</mi> <mn>0</mn> </msub> </msub> <mo>&OverBar;</mo> </mover> <mo>-</mo> <munderover> <mi>Σ</mi> <mrow> <mi>h</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>c</mi> </munderover> <msub> <mi>W</mi> <msub> <mi>d</mi> <mi>bh</mi> </msub> </msub> </mrow> <mrow> <mi>m</mi> <mo>-</mo> <mi>c</mi> </mrow> </mfrac> <mo>;</mo> </mrow> </math>

wherein $W_{d_{bh}}$ is the feature vector of the h-th document in the set D'' of documents that are outdated or deviate from the user's interest;

6) sorting all dimensions of $\overline{W_{D_1}}$ or $\overline{W_{D_2}}$ in descending order of weight, selecting the first N dimensions of $\overline{W_{D_1}}$ or $\overline{W_{D_2}}$ to construct a new user interest vector space model U₁ or U₂, and at the same time storing $\overline{W_{D_1}}$ or $\overline{W_{D_2}}$ in the personalized recommendation system; wherein N does not exceed the dimension of $\overline{W_{D_1}}$ or $\overline{W_{D_2}}$; and replacing U₀ in step 1) with the new user interest vector space model U₁ or U₂ to carry out a new round of recommendation.
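The incremental update of steps 4) and 5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: feature vectors are assumed to be sparse Python dicts mapping feature words to weights, and the function names `add_documents` and `remove_documents` and the sample data are hypothetical. The sketch relies only on the stored center vector of D₀ and the count m, since m times the center vector equals the sum of the feature vectors in D₀.

```python
from collections import defaultdict


def add_documents(center, m, new_vectors):
    """Return (center of D1, m + q) from the stored center of D0 and the vectors of D'."""
    q = len(new_vectors)
    total = defaultdict(float)
    for word, weight in center.items():
        total[word] += m * weight          # m * center recovers the sum over D0
    for vec in new_vectors:
        for word, weight in vec.items():
            total[word] += weight          # add the sum over D'
    return {w: s / (m + q) for w, s in total.items()}, m + q


def remove_documents(center, m, old_vectors):
    """Return (center of D2, m - c) after removing the vectors of D'' from D0."""
    c = len(old_vectors)
    if c >= m:
        raise ValueError("m - c must be positive")
    total = defaultdict(float)
    for word, weight in center.items():
        total[word] += m * weight
    for vec in old_vectors:
        for word, weight in vec.items():
            total[word] -= weight          # subtract the sum over D''
    return {w: s / (m - c) for w, s in total.items()}, m - c


# Hypothetical data: center0 is the mean of two documents,
# {"recommend": 0.8, "user": 0.2} and {"recommend": 0.4, "model": 0.6}.
center0 = {"recommend": 0.6, "user": 0.1, "model": 0.3}
center1, m1 = add_documents(center0, 2, [{"model": 0.9}])
center2, m2 = remove_documents(center1, m1, [{"model": 0.9}])
```

Removing the document that was just added brings the center vector back to `center0` up to floating-point rounding, which is a quick consistency check on both formulas.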

2. The method for incrementally updating the user interest model of the personalized recommendation system according to claim 1, wherein in step 1), constructing the document-content-based user interest vector space model U₀ comprises the following specific steps:

1) performing feature word selection and feature word weight calculation on all documents in which the user is interested;

2) extracting the feature vectors of all documents in which the user is interested to form a document feature vector set D₃;

3) computing the center vector of the document feature vector set D₃, sorting the dimensions of that center vector in descending order of weight, and selecting the top M dimensions as the user interest vector space model U₀; wherein M does not exceed the dimension of the center vector of the document feature vector set D₃.

3. The method of claim 2, wherein the center vector $\overline{W_{D_3}}$ of the document feature vector set D₃ = {d₃₁, d₃₂, ..., d_3x} is calculated according to the following formula:

<math> <mrow> <mover> <msub> <mi>W</mi> <msub> <mi>D</mi> <mn>3</mn> </msub> </msub> <mo>&OverBar;</mo> </mover> <mo>=</mo> <mfrac> <mrow> <munderover> <mi>Σ</mi> <mrow> <mi>y</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>x</mi> </munderover> <msub> <mi>W</mi> <msub> <mi>d</mi> <mrow> <mn>3</mn> <mi>y</mi> </mrow> </msub> </msub> </mrow> <mi>x</mi> </mfrac> <mo>;</mo> </mrow> </math>

wherein x is the number of elements in the document feature vector set D₃; $W_{d_{3y}}$ is the feature vector of the y-th document in D₃; and y = 1, 2, ..., x.
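The construction in claims 2 and 3 amounts to averaging the document feature vectors and keeping the heaviest dimensions. A minimal sketch, assuming sparse dict vectors (feature word to weight); the helper names `center_vector` and `top_dimensions` and the sample documents are hypothetical:

```python
from collections import defaultdict


def center_vector(vectors):
    """Average a non-empty list of sparse feature vectors (word -> weight)."""
    total = defaultdict(float)
    for vec in vectors:
        for word, weight in vec.items():
            total[word] += weight
    return {word: s / len(vectors) for word, s in total.items()}


def top_dimensions(center, limit):
    """Keep the `limit` heaviest dimensions, ordered from largest to smallest weight."""
    ranked = sorted(center.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:limit])


docs = [
    {"recommend": 0.8, "user": 0.2},
    {"recommend": 0.4, "model": 0.6},
]
center = center_vector(docs)      # a word missing from a document counts as weight 0
u0 = top_dimensions(center, 2)    # M = 2 in this toy example
```

Here `u0` keeps "recommend" and "model" and drops "user", the dimension with the smallest average weight.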

4. The method for incrementally updating the user interest model of the personalized recommendation system according to any one of claims 1 to 3, wherein the similarity r between the feature vector of a document d_i in the document set D to be recommended and the user interest vector space model U₀ is calculated according to the following formula:

wherein ‖·‖₂ denotes the two-norm.
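A minimal sketch of the threshold test used in step 3) of claim 1, assuming the similarity r is the cosine similarity, that is, the dot product of the two vectors divided by the product of their two-norms. That choice is an assumption made for illustration, consistent with the two-norm mentioned above; the helper names `cosine_similarity` and `split_by_threshold` and the sample data are hypothetical.

```python
import math


def cosine_similarity(doc_vec, model):
    """Dot product over the product of two-norms; 0.0 if either vector is all zero."""
    dot = sum(weight * model.get(word, 0.0) for word, weight in doc_vec.items())
    norm_doc = math.sqrt(sum(w * w for w in doc_vec.values()))
    norm_model = math.sqrt(sum(w * w for w in model.values()))
    if norm_doc == 0.0 or norm_model == 0.0:
        return 0.0
    return dot / (norm_doc * norm_model)


def split_by_threshold(vectors, model, alpha):
    """Return (indices with r > alpha, indices with r < alpha)."""
    above, below = [], []
    for index, vec in enumerate(vectors):
        r = cosine_similarity(vec, model)
        if r > alpha:
            above.append(index)
        elif r < alpha:
            below.append(index)
    return above, below


u0 = {"recommend": 0.6, "model": 0.3}
candidates = [{"recommend": 1.0}, {"weather": 1.0}]
recommended, rejected = split_by_threshold(candidates, u0, alpha=0.5)
```

With non-negative weights this measure stays between 0 and 1, which matches the stated range of the threshold α. Applied to the candidate set D, the first list gives the documents to recommend; applied to D₀, the second list gives the documents treated as outdated or deviating.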

5. The method for incrementally updating the user interest model of the personalized recommendation system as recited in claim 4, wherein the similarity r' between the feature vector of the e-th document in the user's set of documents of interest D₀ and the user interest vector space model U₀ is calculated according to the following formula: