CN104572962A

CN104572962A - APP (Application) recommendation method and system

Info

Publication number: CN104572962A
Application number: CN201410850061.7A
Authority: CN
Inventors: 吴健; 邱奇波; 陈亮; 邓水光; 李莹; 尹建伟; 吴朝晖
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2014-12-31
Filing date: 2014-12-31
Publication date: 2015-04-29

Abstract

The present invention is applicable to the field of information technology and provides a method and system for APP recommendation. The method includes: acquiring and recording a user's behavior log, the behavior log including the user's APP download records and the user's APP browsing records; generating a user behavior matrix according to the behavior log; calculating, according to a preset recommendation algorithm and using the user behavior matrix, the user's expectation of downloading an APP; and recommending the APPs with high expectation to the user. In this way, the APP platform can use the user's APP download records and APP browsing records to recommend APPs to the user accurately.

Description

A method and system for APP recommendation

Technical field

The invention belongs to the field of information technology, and in particular relates to a method and system for APP recommendation.

Background

Mobile applications (apps for short) have made daily life far more convenient, but their sheer number also leaves users struggling to choose. Users need an efficient way to pick out the few apps that interest them from among hundreds of thousands. Inspired by the wide use of recommendation systems on the traditional Internet, the industry has turned its attention to app recommendation.

At present, most app stores rely on category directories and popularity rankings to help users find the apps they need. This does not serve the personalized needs of different users well, because every user is shown the same content and differences such as gender, age, and culture are ignored. Even a single user's interests tend to change over time. Keyword-based search, in turn, depends on users describing their needs explicitly, so when they cannot do so the results it presents often appear aimless. Very few personalized app recommendation systems are actually in use in the industry.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a method and system for APP recommendation, so as to solve the problem that the prior art lacks personalized APP recommendation.

An embodiment of the present invention is implemented as a method for APP recommendation, the method including:

acquiring and recording a user's behavior log, the behavior log including: the user's APP download records and the user's APP browsing records;

generating a user behavior matrix according to the behavior log;

calculating, according to a preset recommendation algorithm and using the user behavior matrix, the user's expectation of downloading an APP;

recommending the APPs with high expectation to the user.

Another purpose of the embodiments of the present invention is to provide a system for APP recommendation, the system including:

a behavior log acquisition unit, configured to acquire and record a user's behavior log, the behavior log including: the user's APP download records and the user's APP browsing records;

a user behavior matrix generation unit, configured to generate a user behavior matrix according to the behavior log acquired by the behavior log acquisition unit;

an expectation calculation unit, configured to calculate, according to a preset recommendation algorithm and using the user behavior matrix generated by the user behavior matrix generation unit, the user's expectation of downloading an APP;

an APP recommendation unit, configured to recommend to the user the APPs with high expectation as calculated by the expectation calculation unit.

In the embodiments of the present invention, the user's behavior log is acquired and recorded, a user behavior matrix is generated according to the behavior log, the user's expectation of downloading an APP is calculated from the user behavior matrix according to a preset recommendation algorithm, and the APPs with high expectation are recommended to the user. The APP platform can thus use the user's APP download records and APP browsing records to recommend APPs to the user accurately.

Description of the drawings

In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of the APP recommendation method provided by an embodiment of the present invention;

FIG. 2 is a structural diagram of the APP recommendation system provided by an embodiment of the present invention.

Detailed description

In order to make the purpose, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.

The technical solutions of the present invention are illustrated below through specific embodiments.

Embodiment one

FIG. 1 shows a flowchart of the APP recommendation method provided by an embodiment of the present invention. The method includes the following steps:

Step S101: acquire and record the user's behavior log, the behavior log including: the user's APP download records and the user's APP browsing records.

In this embodiment, the APP recommendation system acquires the user's behavior log and records it in the form of data sets. The data sets include, but are not limited to, a Download data set and a Browse data set, specifically:

1. Each record in the Download data set records an app that a user downloaded at a certain time. It has the form (uid, aid, timestamp), where uid and aid are the encrypted IDs of the user and the app respectively, and timestamp is the time at which the download occurred, for example 2000-01-01;

2. Each record in the Browse data set records an app that a user browsed at a certain time. It has the form (uid, aid, timestamp), where uid and aid are the encrypted IDs of the user and the app respectively, and timestamp is the time at which the browsing occurred, for example 2000-01-01.
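As a non-limiting illustration, the (uid, aid, timestamp) records above might be represented in Python as follows. The names `LogRecord` and `parse_record`, and the comma-separated line format, are hypothetical and are assumed here only for the sake of the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LogRecord:
    """One (uid, aid, timestamp) entry from the Download or Browse data set."""
    uid: str         # encrypted user ID
    aid: str         # encrypted app ID
    timestamp: date  # day on which the behavior occurred


def parse_record(line: str) -> LogRecord:
    """Parse a 'uid,aid,YYYY-MM-DD' line (an assumed format) into a LogRecord."""
    uid, aid, day = (field.strip() for field in line.split(","))
    return LogRecord(uid, aid, date.fromisoformat(day))


# Hypothetical sample entries for the two data sets.
download_log = [parse_record("u1,a7,2000-01-01")]
browse_log = [parse_record("u1,a9,2000-01-02")]
```

Keeping the two data sets as separate lists of the same record type mirrors the text, where both share one record form and differ only in the behavior they describe.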

Step S102: generate a user behavior matrix according to the behavior log.

In this embodiment, after acquiring the behavior log, the APP recommendation system generates from it a user behavior matrix for use by the subsequent algorithms. The user behavior matrix may be a binary matrix of 0s and 1s, or may hold other values.
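A minimal sketch of the binary case, assuming Python and a sparse representation in which each user maps to the set of apps downloaded. The function name `build_binary_matrix` and the sample data are hypothetical.

```python
from collections import defaultdict


def build_binary_matrix(download_pairs):
    """Map each user to the set of apps they downloaded.

    A sparse stand-in for the 0/1 matrix: R[u][i] is 1 exactly
    when app i is in matrix[u], and 0 otherwise.
    """
    matrix = defaultdict(set)
    for uid, aid in download_pairs:
        matrix[uid].add(aid)
    return dict(matrix)


# Repeated downloads of the same app collapse into a single 1.
downloads = [("u1", "a1"), ("u1", "a2"), ("u2", "a2"), ("u1", "a1")]
R = build_binary_matrix(downloads)
```

Storing only the non-zero cells is one possible design choice when, as the text notes elsewhere, the data are large and sparse.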

Step S103: according to a preset recommendation algorithm, use the user behavior matrix to calculate the user's expectation of downloading an APP.

In this embodiment, after obtaining the user behavior matrix, the APP recommendation system invokes a preset recommendation algorithm and uses the matrix to calculate the user's expectation of downloading a given APP. The recommendation algorithms include:

1. Memory-based collaborative filtering (memory-based CF): because the data set is very large, the Jaccard formula is used, for computational efficiency, to calculate the similarity between users or between apps. Taking item-based collaborative filtering as an example:

$s_{ij} = \frac{|N(i) \cap N(j)|}{|N(i) \cup N(j)|}$

s_ij denotes the similarity between app i and app j, and N(i) denotes the set of all users who have downloaded app i.

For a given user u and app i, the following formula predicts the likelihood that u downloads i; this likelihood is denoted r in the formula:

$r_{ui} = \sum_{j \in R(u)} r_{uj} \cdot s_{ij}$

s_ij denotes the similarity of apps i and j in the similarity matrix, and r_uj denotes the corresponding value in the user behavior matrix R. Note that r, which measures the likelihood that user u downloads app i, is in fact a value that can exceed 1. For a given user u, r_ui is computed for every app that u has not yet downloaded, these apps are sorted in descending order of r, and the top N apps are recommended to the user; this is Top-N recommendation;
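As a non-limiting sketch, the Jaccard similarity and Top-N prediction of item 1 might be written in Python as follows. The function names `jaccard` and `top_n` and the toy data are hypothetical. The sketch assumes a binary behavior matrix stored as a mapping from each user to the set of apps downloaded, and reads R(u) in the sum as that set.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def top_n(user_apps, user, n):
    """Rank apps the user has not downloaded by r_ui = sum_j r_uj * s_ij.

    user_apps maps each user to the set of downloaded apps, so r_uj is 1
    for every j in user_apps[user] and the sum reduces to adding s_ij.
    """
    # N(i): the set of users who downloaded app i.
    app_users = defaultdict(set)
    for u, apps in user_apps.items():
        for app in apps:
            app_users[app].add(u)

    owned = user_apps[user]
    scores = {}
    for i in app_users:
        if i in owned:
            continue  # only score apps that have not been downloaded yet
        scores[i] = sum(jaccard(app_users[i], app_users[j]) for j in owned)
    # Sort by descending score; break ties by app ID for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


user_apps = {
    "u1": {"a1", "a2"},
    "u2": {"a1", "a2", "a3"},
    "u3": {"a2", "a4"},
}
recommendations = top_n(user_apps, "u1", 2)
```

Because the score is a sum of similarities rather than a probability, it can exceed 1 when the user has downloaded many similar apps, which is consistent with the remark above.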

2. Time-based collaborative filtering (Time-Based CF): traditional collaborative filtering algorithms concentrate on how to relate users' interests to items, but ignore time, an important piece of contextual information. Time profoundly affects user interest, in two respects.

First, user interests change over time. For example, a user may focus mainly on game apps for a while and, once bored, turn more to social apps. Likewise, a user may want to download work-assistance apps during working periods but lean towards leisure and travel apps on holiday.

Second, apps are time-sensitive. For example, when the game app Angry Bird first became a hit, it was installed on almost every mobile device. Recommending it now is unnecessary, because users have long been familiar with it.

Time is not used as a weight in the item similarity computation. The reason is that, in item-based CF (IBCF) for example, if two users did not purchase two items at the same time, the computed item similarity would be low, which is unreasonable. Therefore the Jaccard formula is still used to calculate the similarity between users or between apps; the difference is that the prediction takes the following form:

$r_{ui} = \sum_{j \in R(u)} r_{uj} \cdot s_{ij} \cdot \frac{1}{1 + a(T - t)}$

a takes a value from 0.1 to 0.9, t denotes the time at which the user downloaded the app, and T denotes the last time in the entire training set, both measured in days. When a = 0.1, the predicted score shows almost no decay over time and the result resembles traditional Item-Based CF; when a = 0.9, the predicted score decays markedly over time, that is, the closer a user's behavior is to the prediction time, the more it affects the prediction result;
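The time-decayed prediction of item 2 can be sketched as follows, assuming Python and times expressed as integer day numbers. The names `decay` and `time_weighted_score` and all sample values are hypothetical; the similarities s_ij are taken as given.

```python
def decay(alpha, days_ago):
    """Weight 1 / (1 + alpha * (T - t)) for a download made days_ago days before T."""
    return 1.0 / (1.0 + alpha * days_ago)


def time_weighted_score(history, similarity_to_target, alpha, last_day):
    """Compute r_ui = sum_j r_uj * s_ij / (1 + alpha * (T - t_j)).

    history maps each downloaded app j to (r_uj, t_j), with t_j a day number;
    similarity_to_target maps j to s_ij for the candidate app i.
    """
    return sum(
        r_uj * similarity_to_target.get(j, 0.0) * decay(alpha, last_day - t_j)
        for j, (r_uj, t_j) in history.items()
    )


# a1 was downloaded 10 days before the end of the training set, a2 on the last day.
history = {"a1": (1, 90), "a2": (1, 100)}
similarity = {"a1": 0.5, "a2": 0.5}
mild = time_weighted_score(history, similarity, alpha=0.1, last_day=100)
strong = time_weighted_score(history, similarity, alpha=0.9, last_day=100)
```

In this toy case the older download contributes 0.25 to the score when a = 0.1 and only 0.05 when a = 0.9, while the download made on the last day contributes 0.5 in both cases.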

3. Pseudo-rating-based collaborative filtering (Pseudo-Rating-Based CF): a study of user behavior found that the more popular an app is, the more users tend to click the download button directly without looking at the app's detail page; for relatively unpopular apps, users are more inclined to decide to download after browsing the detail page. This finding matches intuition: users already know the popular apps well and have no need to view the detail page. More likely still, the user was already looking for such an app, and the recommendation merely made it easier to find. It is therefore necessary to distinguish browsing and then downloading from downloading alone.

For this reason, the matrix R used by this algorithm is no longer a 0/1 binary matrix. Borrowing the idea of ratings, the corresponding matrix cell is set to 1 if the user only browsed the app, to 4 if the user only downloaded it, and to 5 if the user both downloaded and browsed it. Other, more reasonable scoring schemes may of course exist. This pseudo-rating scheme also greatly alleviates the problem of overly sparse data.

Rating-based Pearson similarity, as computed in traditional domains, is not practical on the real data set tested for the present invention. Pearson similarity is not used because it essentially characterizes the trend of variation between two vectors, which makes it a good measure for users' genuine rating vectors. But if one user downloaded an app and another user only browsed it, the two are in fact still very similar, and Pearson similarity is somewhat weaker than Jaccard at capturing this. For these reasons, Jaccard similarity is ultimately adopted.

Incorporating pseudo-ratings therefore does not affect the similarity computation; its effect shows up mainly in the calculation of the predicted score. In the prediction formula, the value of r_uj is no longer 0 or 1 but the more precise pseudo-rating. Moreover, the pseudo-rating matrix provides far more data support than the 0/1 matrix, which was also confirmed in actual model tests;
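A minimal sketch of the 1/4/5 pseudo-rating scheme of item 3, assuming Python and (uid, aid) pairs extracted from the Download and Browse data sets. The function name `build_pseudo_ratings` and the constants are hypothetical names introduced for the example.

```python
BROWSE_ONLY, DOWNLOAD_ONLY, BROWSE_AND_DOWNLOAD = 1, 4, 5


def build_pseudo_ratings(download_pairs, browse_pairs):
    """Return {(uid, aid): pseudo-rating} using the 1/4/5 scheme from the text."""
    downloaded = set(download_pairs)
    browsed = set(browse_pairs)
    ratings = {}
    for pair in downloaded | browsed:
        if pair in downloaded and pair in browsed:
            ratings[pair] = BROWSE_AND_DOWNLOAD
        elif pair in downloaded:
            ratings[pair] = DOWNLOAD_ONLY
        else:
            ratings[pair] = BROWSE_ONLY
    return ratings


ratings = build_pseudo_ratings(
    download_pairs=[("u1", "a1"), ("u1", "a2")],
    browse_pairs=[("u1", "a2"), ("u1", "a3")],
)
```

Cells absent from the returned mapping correspond to 0 in the matrix. Because browse-only pairs now receive a non-zero value, the matrix has more filled cells than the binary download matrix, which is the sparsity benefit mentioned above.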

4. Singular value decomposition (SVD) based model: the latent factor model, an improvement on the traditional SVD model, achieves dimensionality reduction by decomposing the user-item matrix R into a user factor matrix and an item factor matrix. Such models are common in rating prediction: the rating r_ui of user u for item i is predicted from the inner product of the corresponding vectors of the user factor matrix and the item factor matrix:

$r_{ui} = \mu + b_u + b_i + p_u^{T} \cdot q_i$

μ denotes the global bias term, that is, the global mean of all ratings in the training set. Before the matrix factorization, user bias and item bias are generally removed by subtracting the row and column means of R. The biases are finally added back to the inner product above to produce the final prediction. b_u and b_i denote the user bias term and the item bias term respectively. The matrix R used by the SVD algorithm may be binary or based on pseudo-ratings; which works better depends on the characteristics of the actual data set.
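The prediction step of item 4 can be illustrated as follows, assuming Python. The function name `predict_rating` is hypothetical and the numeric values are hand-picked for illustration only; the sketch shows prediction alone and does not cover how the biases and factor vectors are learned.

```python
def predict_rating(mu, b_u, b_i, p_u, q_i):
    """Return mu + b_u + b_i + p_u^T q_i for one user-item pair."""
    if len(p_u) != len(q_i):
        raise ValueError("user and item factor vectors must have equal length")
    inner = sum(p * q for p, q in zip(p_u, q_i))
    return mu + b_u + b_i + inner


# Hypothetical, hand-picked values; in practice they would be learned from R.
score = predict_rating(mu=3.0, b_u=0.5, b_i=-0.25, p_u=[1.0, 2.0], q_i=[0.5, 0.25])
```

Here the inner product is 1.0, so the predicted value is 3.0 + 0.5 - 0.25 + 1.0 = 4.25.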

Step S104: recommend the APPs with high expectation to the user.

As an optional embodiment of the present invention, before the step of recommending the APPs with high expectation to the user, the method further includes the following step:

presetting the number of APPs to recommend.

As another optional embodiment of the present invention, after the step of recommending the APPs with high expectation to the user, the method further includes the following step:

obtaining the accuracy with which the user downloads the APPs with high expectation.

In this embodiment, precision and recall (precision/recall, pr) are generally used to measure the prediction accuracy of Top-N recommendation.

Let R(u) denote the recommendation list that the recommendation system shows to user u, and T(u) denote the user's actual behavior list on the test set. The precision of the recommendation results can then be defined as:

$Precision = \frac{\sum_{u \in U} |R(u) \cap T(u)|}{\sum_{u \in U} |R(u)|}$

The corresponding recall can be defined as:

$Recall = \frac{\sum_{u \in U} |R(u) \cap T(u)|}{\sum_{u \in U} |T(u)|}$

In Top-N recommendation, each recommendation algorithm yields one precision/recall pair for each value of N, from which a precision/recall curve can be drawn.

A composite metric is needed to compare the precision/recall pairs. The most common is the F-measure:

$F_1 = \frac{2PR}{P + R}$

P and R are the precision and the recall respectively. F1 takes both into account, so a higher F1 can be taken to mean that the recommendation algorithm predicts user behavior better.
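As a non-limiting sketch, the precision, recall, and F1 defined above might be computed in Python as follows. The function name `precision_recall_f1` and the sample data are hypothetical, and the sketch assumes that U is the set of users who received a recommendation list.

```python
def precision_recall_f1(recommended, actual):
    """Aggregate precision, recall and F1 over all users.

    recommended maps each user u to the list R(u);
    actual maps each user u to the set T(u) observed on the test set.
    """
    hits = sum(len(set(recommended[u]) & set(actual.get(u, ()))) for u in recommended)
    n_recommended = sum(len(set(r)) for r in recommended.values())
    n_actual = sum(len(set(actual.get(u, ()))) for u in recommended)
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_actual if n_actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


recommended = {"u1": ["a1", "a2"], "u2": ["a3", "a4"]}
actual = {"u1": {"a1"}, "u2": {"a3", "a5", "a6"}}
precision, recall, f1 = precision_recall_f1(recommended, actual)
```

In this toy case there are 2 hits out of 4 recommended apps and 4 actually downloaded apps, so precision, recall, and F1 all equal 0.5. Repeating the call for several values of N would give the points of a precision/recall curve.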

The industry uses coverage to describe a recommendation system's ability to surface the long tail of items. The simplest definition of coverage is the proportion of all items that are recommended, expressed as follows:

$Coverage = \frac{|\bigcup_{u \in U} R(u)|}{|I|}$

U denotes the set of all users in the system, I denotes the set of all items in the system, and R(u) denotes the set of items that the system recommends to a user u.
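A minimal sketch of the coverage computation, assuming Python. The function name `coverage` and the sample data are hypothetical.

```python
def coverage(recommended, all_items):
    """Return |union of R(u) over all users| / |I|."""
    items = set(all_items)
    if not items:
        raise ValueError("the item set I must not be empty")
    # Union of every user's recommendation list.
    recommended_items = set().union(*recommended.values())
    return len(recommended_items) / len(items)


recommended = {"u1": ["a1", "a2"], "u2": ["a2", "a3"]}
all_items = {"a1", "a2", "a3", "a4", "a5", "a6"}
cov = coverage(recommended, all_items)
```

Here three distinct apps are recommended out of six, so the coverage is 0.5. A system that recommends only the same few popular apps to every user would score low on this measure.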

If an app is very popular, it is unlikely that a user's download of it was prompted by the recommendation, so the recommendation system gains little; if an app is relatively obscure, however, the system gains more when it is pushed to a user who accepts it. We therefore redefine weighted precision, recall, and F1 for this specific problem, as shown below; to some extent they can be seen as a combined measure of prediction accuracy and diversity.

$Precision = \frac{\sum_{u \in U} \sum_{i \in I} w_i}{\sum_{u \in U} |R(u)|}$

$Recall = \frac{\sum_{u \in U} \sum_{i \in I} w_i}{\sum_{u \in U} |T(u)|}$

$F_1 = \frac{2PR}{P + R}$

Here the inner sum runs over I = R(u) ∩ T(u), where R(u) denotes the recommendation list generated for the user from the training set and T(u) denotes the user's actual behavior list on the test set. Every app has a corresponding weight. To penalize popular apps, the weights must be such that a popular app's weight is smaller than an obscure app's. Accordingly, the weight is calculated as follows:

$w_i = \frac{1}{\log_N C(i)}$

C(i) denotes the number of times app i appears in the training set. N is generally taken to be 2.
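The weighted precision can be sketched as follows, assuming Python. The names `popularity_weight` and `weighted_precision` and the sample counts are hypothetical. The text does not say how to treat apps with C(i) ≤ 1, for which the logarithm is zero or undefined, so this sketch simply raises an error in that case; that choice is an assumption. Weighted recall differs only in dividing by the sum of |T(u)|.

```python
import math


def popularity_weight(count, base=2):
    """Return w_i = 1 / log_base(count); undefined for count <= 1."""
    if count <= 1:
        # Assumption: the source does not cover this case, so fail loudly.
        raise ValueError("1 / log_N C(i) is undefined for C(i) <= 1")
    return 1.0 / math.log(count, base)


def weighted_precision(recommended, actual, counts, base=2):
    """Sum w_i over hits in R(u) ∩ T(u), divided by the sum of |R(u)|."""
    total_weight = 0.0
    total_recommended = 0
    for u, rec in recommended.items():
        total_recommended += len(set(rec))
        for i in set(rec) & set(actual.get(u, ())):
            total_weight += popularity_weight(counts[i], base)
    return total_weight / total_recommended if total_recommended else 0.0


# Hypothetical training-set counts C(i): one popular app and one obscure app.
counts = {"hot": 1024, "niche": 4}
recommended = {"u1": ["hot", "niche"]}
actual = {"u1": {"hot", "niche"}}
wp = weighted_precision(recommended, actual, counts)
```

With N = 2, the popular app receives a weight of about 0.1 and the obscure app 0.5, so a hit on the obscure app counts five times as much, in line with the stated aim of penalizing popular apps.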

Embodiment two

FIG. 2 shows a structural diagram of the APP recommendation system provided by an embodiment of the present invention. For ease of description, only the parts relevant to this embodiment are shown, including:

A behavior log acquisition unit 201, configured to acquire and record the user's behavior log, the behavior log including: the user's APP download records and the user's APP browsing records.

A user behavior matrix generation unit 202, configured to generate a user behavior matrix according to the behavior log acquired by the behavior log acquisition unit 201.

An expectation calculation unit 203, configured to calculate, according to a preset recommendation algorithm and using the user behavior matrix generated by the user behavior matrix generation unit 202, the user's expectation of downloading an APP.

An APP recommendation unit 204, configured to recommend to the user the APPs with high expectation as calculated by the expectation calculation unit 203.

As an optional embodiment of the present invention, the system further includes, for use before the APP recommendation unit 204 makes its recommendation:

An APP quantity preset unit 205, configured to preset the number of APPs to recommend.

As another optional embodiment of the present invention, the system further includes, for use after the APP recommendation unit 204 makes its recommendation:

A download accuracy acquisition unit 206, configured to obtain the accuracy with which the user downloads the APPs with high expectation.

Those of ordinary skill in the art will understand that the units included in Embodiment two above are divided only according to functional logic; the division is not limited to the one described, as long as the corresponding functions can be realized. In addition, the specific names of the functional units serve only to distinguish them from one another and are not intended to limit the scope of protection of the present invention.

Those of ordinary skill in the art will also understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.

The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for APP recommendation, characterized in that the method comprises:

acquiring and recording a user's behavior log, the behavior log comprising: the user's APP download records and the user's APP browsing records;

generating a user behavior matrix according to the behavior log;

calculating, according to a preset recommendation algorithm and using the user behavior matrix, the user's expectation of downloading an APP;

recommending the APPs with high expectation to the user.

2. The method of claim 1, characterized in that, before the step of recommending the APPs with high expectation to the user, the method further comprises the following step:

presetting the number of APPs to recommend.

3. The method of claim 1, wherein, after the step of recommending the APPs with high expectation to the user, the method further comprises the following step:

obtaining the accuracy with which the user downloads the APPs with high expectation.

4. The method of any one of claims 1 to 3, characterized in that the recommendation algorithm comprises: memory-based collaborative filtering, time-based collaborative filtering, and pseudo-rating-based collaborative filtering.

5. A system for APP recommendation, characterized in that the system comprises:

a behavior log acquisition unit, configured to acquire and record a user's behavior log, the behavior log comprising: the user's APP download records and the user's APP browsing records;

a user behavior matrix generation unit, configured to generate a user behavior matrix according to the behavior log acquired by the behavior log acquisition unit;

an expectation calculation unit, configured to calculate, according to a preset recommendation algorithm and using the user behavior matrix generated by the user behavior matrix generation unit, the user's expectation of downloading an APP;

an APP recommendation unit, configured to recommend to the user the APPs with high expectation as calculated by the expectation calculation unit.

6. The system of claim 5, characterized in that the system further comprises, for use before the APP recommendation unit makes its recommendation:

an APP quantity preset unit, configured to preset the number of APPs to recommend.

7. The system of claim 5, characterized in that the system further comprises, for use after the APP recommendation unit makes its recommendation:

a download accuracy acquisition unit, configured to obtain the accuracy with which the user downloads the APPs with high expectation.

8. The system of any one of claims 5 to 7, characterized in that the recommendation algorithm comprises: memory-based collaborative filtering, time-based collaborative filtering, and pseudo-rating-based collaborative filtering.