CN107180078A

CN107180078A - A vertical search method based on user interest learning

Info

Publication number: CN107180078A
Application number: CN201710263913.6A
Authority: CN
Inventors: 勾智楠; 韩立新
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2017-04-21
Filing date: 2017-04-21
Publication date: 2017-09-19

Abstract

The invention discloses a vertical search method based on user interest learning. Specifically: the user's interest preferences are determined from the content the user fills in for each attribute at registration to describe his or her personal interest needs, and an initial user interest model is established. If the user filters a query by attribute category, the attribute-category query record is added to the user's personal interest model to update the model. The user's interest degree value is calculated for every entity resource, and the entity resources are sorted from high to low by interest degree value and stored. For a personalized search, the query is filtered by the keywords or attributes selected by the user to generate an initial query result list; the entity resources in that list are then sorted from high to low by the user's interest degree values for them to generate the final query result list. By learning the user's interests, the invention proactively provides results consistent with the user's current interests.

Description

A Vertical Search Method Based on User Interest Learning

Technical Field

The invention relates to a vertical search method based on user interest learning, in particular to a vertical search method that stores users' browsing information or query behavior in a vertical search website, and belongs to the technical field of vertical search.

Background Art

We live in an era of information explosion. Information is no longer a scarce resource; with all kinds of information constantly emerging, it is users' attention that has become scarce. The further development of search engine technology has brought users great convenience. According to the 39th Statistical Report on Internet Development in China released by the China Internet Network Information Center (CNNIC), as of December 2016 China had 602 million search engine users, a usage rate of 82.4%; search engines are deeply integrating artificial intelligence, and vertical specialization has become a development trend. On the one hand, the types of searchable information are richer; on the other hand, search engines are launching more intelligent, comprehensive and professional search services to meet users' search needs in different fields. This has given rise to a new trend toward vertical, specialized development in the search engine industry.

Users' search methods mainly include free-text full-text search, keyword search, category search and searches for other special information. Compared with traditional search engines, vertical search engines must rank search results by content rather than by analyzing the link relationships between web pages. Personalized vertical search additionally needs to analyze user behavior and build a personalized user model so that the ranking meets personalized needs. At present, most vertical search engine websites provide an advanced search that filters by entity attributes, but the filtered results are usually sorted simply in ascending or descending order of some criterion.

Among existing personalized search methods, the mainstream approach is search based on social tagging: the social tagging data generated by users' bookmarking, tagging and sharing directly reflect users' evaluation of the quality of web pages and their understanding of the content, and these data are used to rank web pages and improve retrieval performance.

However, the information retrieval services currently provided by vertical search engines still have the following shortcomings. First, vertical search engines can only perform passive query matching: the search engine responds only after the user submits a query. This implies two major problems. On the one hand, users often find it hard to state their needs explicitly; on the other hand, even when the query words do reflect the user's current needs, most vertical search engine websites do not extract the user's interests to perform personalized search or to proactively offer information recommendations. Second, vertical search engines that let users socially tag resources are generally vulnerable to spam tagging: malicious users may apply large numbers of widely occurring tags to improperly boost their influence, which poses a serious challenge to the validity of search results.

Therefore, vertical search engines need personalized vertical search with higher precision.

Summary of the Invention

The technical problem to be solved by the invention is to provide a vertical search method based on user interest learning that mines the user's interests by learning from the user's search behavior and proactively provides results consistent with the user's current interests.

To solve the above technical problem, the invention adopts the following technical solution:

A vertical search method based on user interest learning, comprising the following steps:

Step 1, in a vertical search engine website that provides an advanced search with attribute filtering, determine the user's interest preferences through interaction with the user during initialization;

Step 2, establish an initial user interest model according to the user's interest preferences, expressed as x_i = (attribute value : number of times the attribute value has been attended to by the user - time at which the attribute value was last attended to by the user), where x_i denotes the i-th attribute;

Step 3, if the user queries by attribute, update the user interest model in real time according to the attribute values selected by the user; if the user queries by keyword, do not update the user interest model;

Step 4, if the user browses, set a browsing-time threshold; if the time spent browsing an entity resource exceeds the threshold, obtain the attribute values of that entity resource and update the user interest model in real time;

Step 5, using multi-attribute utility theory combined with the user's preference factors and forgetting factors, calculate the interest degree value of every entity resource and sort the entity resources from high to low by interest degree value;

Step 6, perform the filtered query by the keywords or attributes selected by the user to generate an initial query result list;

Step 7, for the entity resources in the initial query result list, sort them from high to low by the user's interest degree values for them to generate the final query result list.
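Steps 6 and 7 amount to reranking a candidate list by scores that were already computed in Step 5. The Python sketch below is illustrative only: it assumes the scores are held in a plain dictionary keyed by a hypothetical entity ID, and it gives entities without a score a value of 0.0, an assumption the source does not address.

```python
from typing import Dict, List


def rerank(initial_results: List[str], interest: Dict[str, float]) -> List[str]:
    """Steps 6-7: reorder an initial result list by interest degree value, high to low.

    `interest` maps a (hypothetical) entity ID to its precomputed U_j. Entities
    missing from it get 0.0, which is an assumption. Python's sort is stable even
    with reverse=True, so entities with equal scores keep their initial order.
    """
    return sorted(initial_results, key=lambda entity: interest.get(entity, 0.0), reverse=True)


print(rerank(["car_a", "car_b", "car_c"], {"car_a": 2.0, "car_b": 159.04, "car_c": 2.0}))
# ['car_b', 'car_a', 'car_c']
```

Relying on a stable sort is a design choice of this sketch: ties preserve whatever order the keyword or attribute filter produced.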

As a preferred solution of the invention, the specific process of Step 1 is: when the user first enters the vertical search engine website that provides an advanced search with attribute filtering, the website interacts with the user for each attribute to obtain the user's interest preference for that attribute.

As a preferred solution of the invention, the real-time update in Step 3, when the user queries by attribute, proceeds as follows: if an attribute value selected by the user is already in the user interest model, increase that value's attention count by 1 and update the time at which it was last attended to; otherwise, add the attribute value, its attention count and the time at which it was last attended to to the user interest model.
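The interest model of Step 2 and the update rule above can be sketched as a nested mapping from attribute to attribute value to (count, last-attention time). This is a minimal illustration, not the patented implementation: the names `ValueStats`, `InterestModel` and `observe` are hypothetical, and timestamps are plain floats by assumption. Per Step 3, a keyword query simply would not call `observe`.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ValueStats:
    count: int        # times the user attended to this attribute value
    last_seen: float  # time of the most recent attention (units are an assumption)


@dataclass
class InterestModel:
    # attribute -> attribute value -> stats, mirroring x_i = (value : count - time)
    attrs: Dict[str, Dict[str, ValueStats]] = field(default_factory=dict)

    def observe(self, attribute: str, value: str, now: float) -> None:
        """Record one attention: bump an existing value, or insert a new one with count 1."""
        values = self.attrs.setdefault(attribute, {})
        stats = values.get(value)
        if stats is None:
            values[value] = ValueStats(count=1, last_seen=now)
        else:
            stats.count += 1
            stats.last_seen = now
```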

As a preferred solution of the invention, the specific process of Step 5 is:

Step 51, calculate the preference factor with the AF-IDF algorithm:

$$k_i = \mathrm{AF}(X/T) \cdot \log_a\!\left(\frac{N}{\mathrm{DF}(k_i)}\right)$$

where k_i is the user's preference factor for the i-th attribute of the j-th entity resource; AF(X/T) is the average number of times the user attended to the i-th attribute of the j-th entity resource per period T; X is the total number of times that attribute was attended to; N is the total number of entity resources in the vertical search engine website; DF(k_i) is the number of entity resources whose i-th attribute has the same value as that of the j-th entity resource; and a is a hyperparameter;

Step 52, introduce the forgetting factor f_i into the preference factor; the preference factor after introducing the forgetting factor, k′_i, is:

$$k'_i = k_i \cdot f_i$$

Step 53, let the vertical search engine website contain n attributes, i.e. i = 1, 2, …, n; the user's interest degree value for the j-th entity resource is then expressed by the multiplicative utility function:

$$U_j = k'_1 + k'_2 + \cdots + k'_n + k'_1k'_2 + k'_1k'_3 + \cdots + k'_1k'_n + k'_2k'_3 + \cdots + k'_2k'_n + \cdots + k'_{n-1}k'_n = \sum_{i=1}^{n} k'_i + \sum_{1 \le i < l \le n} k'_i k'_l$$

where U_j is the user's interest degree value for the j-th entity resource;

Step 54, sort the interest degree values of all entity resources from high to low and output the sorted result.
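Steps 51, 53 and 54 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, AF(X/T) is read as the plain ratio X/T, and the already-decayed factors k′_i of each entity (Step 52) are assumed to be supplied by the caller.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def preference_factor(x: float, period: float, n_entities: int, df: int, a: float = 2.0) -> float:
    """Step 51: k_i = AF(X/T) * log_a(N / DF(k_i)).

    AF(X/T) is assumed to be X / T (attentions per period). df counts entities
    sharing entity j's value for attribute i, so it includes entity j (df >= 1).
    """
    af = x / period
    return af * math.log(n_entities / df, a)


def interest_score(factors: Sequence[float]) -> float:
    """Step 53: sum of all k'_i plus the sum of all pairwise products k'_i * k'_l (i < l)."""
    return sum(factors) + sum(p * q for p, q in combinations(factors, 2))


def rank_entities(factors_by_entity: Dict[str, Sequence[float]]) -> List[str]:
    """Step 54: entity IDs sorted by U_j from high to low."""
    scores = {entity: interest_score(f) for entity, f in factors_by_entity.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

The pairwise sum visits n(n-1)/2 pairs, in line with the O(n²) cost per entity stated among the technical effects. The identity Σ_{i<l} k′_i k′_l = ((Σ k′_i)² − Σ k′_i²)/2 would give the same value in O(n) if that ever mattered.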

As a preferred solution of the invention, the forgetting factor f_i is calculated as:

$$f_i = \exp\!\left(-\frac{\log_b t}{f}\right)$$

where t is the difference between the current time and the last time the value of the i-th attribute of the j-th entity resource was attended to, f is the half-life, and b is a hyperparameter.
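A minimal sketch of the forgetting factor and of k′_i = k_i · f_i. The formula is undefined at t = 0, so this sketch assumes, as labeled in the code, that t is counted in days and clamped to at least 1; that reproduces the worked example's statement that same-day attention gives a forgetting factor of 1. The source gives no values for the half-life f or the base b; `half_life` must be supplied, and defaulting `b` to e is an arbitrary choice of this sketch.

```python
import math


def forgetting_factor(t: float, half_life: float, b: float = math.e) -> float:
    """f_i = exp(-log_b(t) / f).

    Assumption: t is measured in days and clamped to at least 1, so attention on
    the same day gives f_i = 1 (log_b(1) = 0); log_b(0) would be undefined.
    """
    t = max(t, 1.0)
    return math.exp(-math.log(t, b) / half_life)


def decayed(k: float, t: float, half_life: float, b: float = math.e) -> float:
    """k'_i = k_i * f_i."""
    return k * forgetting_factor(t, half_life, b)
```

Algebraically, exp(-log_b(t)/f) = t^(-1/(f·ln b)), so the factor falls off as a power of the elapsed time; a larger f or b (with b > 1) slows the decay.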

Compared with the prior art, the above technical solution of the invention has the following technical effects:

1. The algorithm has low complexity: with n attributes, the interest degree value of each entity resource can be computed with on the order of n² operations.

2. Most of the computation is done offline and only a small part in real time, so even if the vertical search system has many entity resources and registered users, an ordinary enterprise-grade server can fully support running the algorithm.

3. The algorithm is highly general: it can be implemented on various vertical search engines and is easily extensible.

Description of Drawings

FIG. 1 is a schematic flowchart of the vertical search method based on user interest learning according to the invention.

Detailed Description

Embodiments of the invention are described in detail below, and examples of them are illustrated in the accompanying drawing. The embodiments described with reference to the drawing are exemplary, serve only to explain the invention, and should not be construed as limiting it.

FIG. 1 is a schematic flowchart of a vertical search method based on user interest learning; the specific steps are as follows:

Step 1: in a vertical search engine website that provides an advanced search with attribute filtering, determine the user's interest preferences through interaction with the user during initialization, so as to avoid the cold-start problem.

Step 2: establish the initial user interest model and adjust it in real time using the user's attribute-filter query records and browsing records. Each attribute-filter query and each browsing action is recorded, and the queried attribute information and the attribute information of the browsed entity resources are added to the user's personalized model.

Step 3: using multi-attribute utility theory combined with the user's preference factors, calculate a personalized interest degree value for every entity resource in the background and generate a list of entity resources sorted by the user's personalized interest. Specifically:

Step 301: given n attributes in the vertical search engine, the user's personalized preference utility can be described by the multiplicative utility function $U_j = k_1 + k_2 + \cdots + k_n + k_1k_2 + k_1k_3 + \cdots + k_{n-1}k_n$, where U_j is the current user's interest degree value for the j-th entity and k_i is the user's preference factor for the entity's i-th attribute value. In words, the preference factors of the entity's attribute values are summed, and each preference factor is also multiplied by every other preference factor and these pairwise products are added.

Step 302: calculate the preference factor k_i. Because users differ in their interest in entities, the invention uses the Average Frequency/Inverse Document Frequency (AF-IDF) algorithm to characterize how much the user likes the entity's i-th attribute value: $k_i = \mathrm{AF}(X/T) \cdot \log_a(N/\mathrm{DF}(k_i))$, where AF(X/T) is the average number of times (queries or views) the user attended to the entity's i-th attribute value per period T, X is the total number of times that attribute value was attended to, N is the total number of entities in the vertical search system, and DF(k_i) is the number of entities having that attribute value. Computing the weight of each attribute in turn yields the user's interest preference vector over attribute categories.

Step 303: because user interests often change, an interest forgetting factor f_i is introduced into the preference factor: $k'_i = k_i \cdot f_i$ with $f_i = \exp(-\log_b(t)/f)$, where t is the time elapsed since the attribute value was last attended to and f is the half-life.

Step 304: calculate the personalized interest degree value of each entity in the vertical search system in turn until the last entity has been processed.

Step 305: sort the entities from high to low by personalized interest degree value, output the sorted result and store it under the user's personal interest model.

Step 4: according to the user's search conditions, perform the query filtered by keywords or attributes to generate an initial query result list.

Step 5: reorder the entity resources in the generated initial query result list from high to low by interest degree value, using the list of entity resources sorted by the user's personalized interest.

Take search on a large domestic automobile vertical website as an example (the invention is not limited to this example). Besides a search box for keyword search, the website provides a precise car-selection module organized by category, with four attributes: price, vehicle type, brand and displacement.

S1. To personalize search, the user's interests must be learned. First, the user registers so that a personalized model can be built: at registration the user fills in the four basic attributes according to his or her personal interest needs. Determining the user's interest preferences through this initial interaction avoids the cold-start problem. The user's initial personalized interest model consists of vectors for the four basic attributes.

S2. Assume that the initial personalized interest model of user 1 is:

Price = (RMB 80,000-100,000: 1-t), vehicle type = (SUV: 1-t), brand = (Great Wall: 1-t), displacement = (1.1-1.6L: 1-t).

Note: the value of the price attribute vector is "RMB 80,000-100,000"; "1" is its count, meaning it has been attended to by the user once; "-" is a separator; and "t" is the time at which it was last attended to by the user.

S3. The user can query in two ways. Filtering by attribute category shows that the user cares about that attribute category and reflects the user's preferences, so the user's personal interest model is updated. A query that does not filter by attribute category, i.e. a keyword query, does not record the user's preferences and does not update the model.

The update records each of the user's search or browsing actions in the personalized information record. Personal interests are not fixed but change dynamically: under personal and external influences, a user's preferences shift. For example, at initialization the user may have just started working and only needs a car for getting around, with modest price requirements; as the user's career and quality of life improve, his or her requirements for a car rise. The user's personalized interests then change, and so does the content the user searches for and browses.

S4. Add the selected attribute values to the user's personalized model; for a value already present, increase its count by 1 and update the time it was last attended to.

Assume that user 1 issues the following attribute-category filter query:

Price: RMB 80,000-100,000; vehicle type: hatchback; brand: Ford; displacement: 1.1-1.6L.

The personalized model is then updated to: price = (RMB 80,000-100,000: 2-t), vehicle type = (SUV: 1-t, hatchback: 1-t), brand = (Great Wall: 1-t, Ford: 1-t), displacement = (1.1-1.6L: 2-t).
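The transition from the S2 model to the S4 model can be reproduced with a plain nested dictionary. In this sketch the function name `apply_filter_query` and the English labels are chosen for illustration ("80k-100k" stands for RMB 80,000-100,000), and the time is kept as the placeholder string "t" used in the text.

```python
from typing import Dict, Tuple

# attribute -> attribute value -> (attention count, last-attention time)
Model = Dict[str, Dict[str, Tuple[int, str]]]


def apply_filter_query(model: Model, selection: Dict[str, str], t: str) -> None:
    """Apply one attribute-filter query: +1 for known values, insert new ones with count 1."""
    for attribute, value in selection.items():
        values = model.setdefault(attribute, {})
        count, _ = values.get(value, (0, t))
        values[value] = (count + 1, t)


# User 1's initial model from S2.
model: Model = {
    "price": {"80k-100k": (1, "t")},
    "vehicle_type": {"SUV": (1, "t")},
    "brand": {"Great Wall": (1, "t")},
    "displacement": {"1.1-1.6L": (1, "t")},
}

# The filter query issued in S4.
apply_filter_query(model, {
    "price": "80k-100k",
    "vehicle_type": "hatchback",
    "brand": "Ford",
    "displacement": "1.1-1.6L",
}, t="t")
print(model)
```

The printed model matches the updated vectors above: price and displacement reach a count of 2, while hatchback and Ford are added with a count of 1.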

S5. Using multi-attribute utility theory combined with the user's preference factors, calculate a personalized interest degree value for every entity resource in the background and generate a list of entity resources sorted by the user's personalized interest. Specifically:

Given n attributes in the vertical search engine, the user's personalized preference utility can be described by the multiplicative utility function $U_j = k_1 + k_2 + \cdots + k_n + k_1k_2 + k_1k_3 + \cdots + k_{n-1}k_n$, where U_j is the current user's interest degree value for the j-th entity and k_i is the user's preference factor for the entity's i-th attribute value. In words, the preference factors of the entity's attribute values are summed, and each preference factor is also multiplied by every other preference factor and these pairwise products are added.

To calculate the preference factor k_i, which reflects how the user's interest in entities differs, the invention uses the Average Frequency/Inverse Document Frequency (AF-IDF) algorithm to characterize how much the user likes the entity's i-th attribute value: $k_i = \mathrm{AF}(X/T) \cdot \log_a(N/\mathrm{DF}(k_i))$, where AF(X/T) is the average number of times (queries or views) the user attended to the entity's i-th attribute value per period T, X is the total number of times that attribute value was attended to, N is the total number of entities in the vertical search system, and DF(k_i) is the number of entities having that attribute value. Computing the weight of each attribute in turn yields the user's interest preference vector over attribute categories.

Because user interests often change, an interest forgetting factor f_i is introduced into the preference factor: $k'_i = k_i \cdot f_i$ with $f_i = \exp(-\log_b(t)/f)$, where t is the time elapsed since the attribute value was last attended to and f is the half-life.

Calculate the personalized interest degree value of each entity in the vertical search system in turn until the last entity has been processed.

Sort the entities from high to low by personalized interest degree value, output the sorted result and store it under the user's personal interest model. Ranking thus consists of computing a score for each entity resource with the algorithm and sorting by that score. In practice, the scores of all entity resources are computed in the background for each user. When the user opens the website's home page without issuing a query, entities can be shown in the recommendation column in descending order of score. When the user issues a query filtered by keyword or attribute, the candidate entity resources are retrieved first, then reordered by their scores and shown in the search results column.
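For the home-page case, where no query is issued, showing entities in descending score order amounts to a top-k selection over the precomputed scores. A minimal sketch using the standard library's `heapq.nlargest`; the function name `recommend` and the dictionary of scores keyed by entity ID are assumptions for illustration.

```python
import heapq
from typing import Dict, List


def recommend(scores: Dict[str, float], k: int) -> List[str]:
    """Return up to k entity IDs with the highest precomputed interest scores."""
    return heapq.nlargest(k, scores, key=scores.get)


print(recommend({"car_a": 2.0, "car_b": 159.04, "car_c": 20.5}, 2))
# ['car_b', 'car_c']
```

`heapq.nlargest` avoids sorting the whole catalog when only the first k entries of the recommendation column are displayed; for the search-results case, the same scores are instead used to reorder the retrieved candidates.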

To describe the algorithm more clearly, the following simplifying assumptions are made:

The interest model of user 1 is exactly the one updated in S4; the vectors of the four attributes are:

Price = (RMB 80,000-100,000: 2-t),

Vehicle type = (SUV: 1-t, hatchback: 1-t),

Brand = (Great Wall: 1-t, Ford: 1-t),

Displacement = (1.1-1.6L: 2-t).

An entity U₁ happens to be a car of the kind queried in S4; its attributes are price: RMB 80,000-100,000; vehicle type: hatchback; brand: Ford; displacement: 1.1-1.6L.

The total number of car entities on the vertical website is N = 1000.

The number of cars priced at RMB 80,000-100,000 is 500,

The number of hatchbacks is 250,

The number of Ford cars is 25,

The number of cars with a displacement of 1.1-1.6L is 250,

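The period T is set to 1 for now, so AF(X/T) = X; the calculation below uses X = 2 for every attribute, i.e. the total number of times each attribute was attended to across S2 and S4.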

Since the attention time t falls on the same day as the query, there is no time difference, and the forgetting factor is 1.

Calculate $U_1 = k_1 + k_2 + k_3 + k_4 + k_1k_2 + k_1k_3 + k_1k_4 + k_2k_3 + k_2k_4 + k_3k_4$, which expresses the user's degree of interest in entity U₁. In this example a = 2.

$$k_1 = 2\log_2(1000/500) = 2$$

$$k_2 = 2\log_2(1000/250) = 4$$

$$k_3 = 2\log_2(1000/25) \approx 10.64$$

$$k_4 = 2\log_2(1000/250) = 4$$

$$U_1 = 2 + 4 + 10.64 + 4 + 2\cdot4 + 2\cdot10.64 + 2\cdot4 + 4\cdot10.64 + 4\cdot4 + 10.64\cdot4 = 159.04$$

The weight of entity U₁ in user 1's personalized model is 159.04.
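The arithmetic of this example can be checked with a few lines of Python. Following the calculation above, X = 2 is used for all four attributes, with T = 1, a = 2 and a forgetting factor of 1; the attribute keys are English labels chosen here for illustration.

```python
import math
from itertools import combinations

N = 1000  # total car entities
T = 1     # period
A = 2     # log base a
X = 2     # attention count used for every attribute in the example
DF = {"price": 500, "vehicle_type": 250, "brand": 25, "displacement": 250}

# Preference factors k_i = (X / T) * log_a(N / DF)
k = {attr: (X / T) * math.log(N / df, A) for attr, df in DF.items()}

f = 1.0  # same-day attention, so the forgetting factor is 1
kp = [value * f for value in k.values()]

# U_1 = sum of k'_i + sum of pairwise products k'_i * k'_l
U1 = sum(kp) + sum(p * q for p, q in combinations(kp, 2))
print({attr: round(value, 2) for attr, value in k.items()}, round(U1, 2))
```

Without rounding, k₃ = 2·log₂ 40 ≈ 10.6439 and U₁ ≈ 159.08; the 159.04 above follows from rounding k₃ to 10.64 before multiplying.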

This simple example illustrates how the weight of an entity in a user's personalized model is obtained. All data here are assumed for illustration; an implementation should use the data of the specific vertical website. Parameters such as the period T and the half-life f should be tuned experimentally.

S6. If the user browses, a browsing-time threshold can be set: browsing a resource for longer than the threshold indicates that the user cares about the attribute categories of that entity resource and reflects an interest preference, so the user's personal interest model is updated as in S4.
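The browsing rule of S6 can be sketched as a handler that forwards each attribute value of a long-viewed entity to the same update used for attribute queries. The `observe` callback, measuring dwell time in seconds, and reading "longer than the threshold" as strictly greater are assumptions of this sketch; the source gives no threshold value.

```python
from typing import Callable, Dict


def on_browse(dwell_seconds: float, threshold_seconds: float,
              entity_attrs: Dict[str, str], now: float,
              observe: Callable[[str, str, float], None]) -> bool:
    """If dwell time exceeds the threshold, record every attribute value of the entity.

    Returns True when the interest model was updated, False otherwise.
    """
    if dwell_seconds <= threshold_seconds:
        return False
    for attribute, value in entity_attrs.items():
        observe(attribute, value, now)  # same +1 / insert rule as an attribute query (S4)
    return True
```

Passing the update as a callback keeps the threshold logic independent of how the interest model is stored.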

The above embodiments only illustrate the technical idea of the invention and do not limit its scope of protection; any modification made to the technical solution in accordance with the technical idea proposed by the invention falls within the scope of protection of the invention.

Claims

1. A vertical search method based on user interest learning, characterized by comprising the following steps:

Step 1, in a vertical search engine website that provides an advanced search with attribute filtering, determining the user's interest preferences through interaction with the user during initialization;

Step 2, establishing an initial user interest model according to the user's interest preferences, the initial user interest model being expressed as: x_i = (attribute value: the number of times the attribute value has been attended to by the user - the time at which the attribute value was last attended to by the user), where x_i represents the i-th attribute;

Step 3, if the user queries by attribute, updating the user interest model in real time according to the attribute values selected by the user; if the user queries by keyword, not updating the user interest model;

Step 4, if the user browses, setting a browsing-time threshold; if the time spent browsing an entity resource exceeds the threshold, obtaining the attribute values of that entity resource and updating the user interest model in real time;

Step 5, using multi-attribute utility theory combined with the user's preference factors and forgetting factors, calculating the interest degree value of every entity resource and sorting the entity resources from high to low by interest degree value;

Step 6, performing a filtered query by the keywords or attributes selected by the user to generate an initial query result list;

Step 7, for the entity resources in the initial query result list, sorting them from high to low by the user's interest degree values for them to generate the final query result list.

2. The vertical search method based on user interest learning according to claim 1, characterized in that the specific process of step 1 is: when the user first enters the vertical search engine website that provides an advanced search with attribute filtering, interacting with the user for each attribute to obtain the user's interest preference for each attribute.

3. The vertical search method based on user interest learning according to claim 1, characterized in that, in step 3, when the user queries by attribute, the specific process of updating the user interest model in real time according to the attribute values selected by the user is: if an attribute value selected by the user is already in the user interest model, adding 1 to the number of times the attribute value has been attended to by the user and updating the time at which the attribute value was last attended to; otherwise, adding the attribute value, the number of times it has been attended to by the user, and the time at which it was last attended to to the user interest model.

4. The vertical search method based on user interest learning according to claim 1, characterized in that the specific process of step 5 is:

Step 51, using the AF-IDF algorithm to calculate the preference factor, the calculation formula is:

$$k_i = \mathrm{AF}(X/T) \cdot \log_a\!\left(\frac{N}{\mathrm{DF}(k_i)}\right)$$

where k_i represents the user's preference factor for the i-th attribute of the j-th entity resource, AF(X/T) is the average number of times the user attended to the i-th attribute of the j-th entity resource per period T, X is the total number of times this attribute was attended to, N is the total number of entity resources in the vertical search engine website, DF(k_i) is the number of entity resources whose i-th attribute has the same value as that of the j-th entity resource, and a is a hyperparameter;

Step 52, introducing the forgetting factor f_i into the preference factor; the preference factor k′_i after introducing the forgetting factor is:

$$k'_i = k_i \cdot f_i$$

Step 53, letting the vertical search engine website contain n attributes, i.e. i = 1, 2, …, n, the user's interest degree value for the j-th entity resource being represented by a multiplicative utility function:

$$U_j = k'_1 + k'_2 + \cdots + k'_n + k'_1k'_2 + k'_1k'_3 + \cdots + k'_1k'_n + k'_2k'_3 + \cdots + k'_2k'_n + \cdots + k'_{n-1}k'_n = \sum_{i=1}^{n} k'_i + \sum_{1 \le i < l \le n} k'_i k'_l$$

where U_j represents the user's interest degree value for the j-th entity resource;

Step 54, sort the interest degree values of all entity resources from high to low, and output the sorting result.

5. The vertical search method based on user interest learning according to claim 4, characterized in that the forgetting factor f_i is calculated as:

$$f_i = \exp\!\left(-\frac{\log_b t}{f}\right)$$

Among them, t is the difference between the last time the value of the i-th attribute of the j-th entity resource was paid attention to and the current time, f is the half-life, and b is the hyperparameter.