CN103400286B - Recommendation system and method for labeling item features based on user behavior - Google Patents


Info

Publication number
CN103400286B
Authority
CN
China
Application number
CN201310333575.0A
Other languages
Chinese (zh)
Other versions
CN103400286A
Inventor
卢青峰
王冬杰
朱勇勇
Original Assignee
世纪禾光科技发展(北京)有限公司
Application filed by 世纪禾光科技发展(北京)有限公司
Priority to CN201310333575.0A
Publication of CN103400286A
Application granted
Publication of CN103400286B


Abstract

The present invention relates to a recommendation system and method that labels item features based on user behavior. Implicit data sequences are constructed from users' search behavior and the behavior they subsequently apply to items, and item features are extracted from these sequences. The computed relevance between these features and items serves as base data, which is compared with a user's new behavior; the related products are then ranked and recommended to the user.

Description

Recommendation system and method for labeling item features based on user behavior

Technical Field

[0001] The present invention relates to the field of mobile Internet technology, and specifically to an online item recommendation system based on user behavior and a method for implementing it.

Background

[0002] Recommendation technology quickly retrieves, from a large volume of information, the items a user may be interested in and presents them to the user, reducing the difficulty of searching an ever-growing mass of information.

[0003] Current recommendation systems have two options for processing the behavior data that users generate. One is to use users' explicit feedback, such as ratings and reviews of items, as the base data of the recommendation system. In many cases, however, users do not provide reviews or other explicit feedback, so the data available for analysis is very sparse; the recommendation system then fails to achieve the intended effect, and such data is poorly suited as base data for analysis. The other option is to analyze implicit data: once implicit data such as clicks and purchases has been reorganized, patterns of user behavior can be obtained without requiring users to provide any additional feedback. For example, Chinese patent application No. 201210499328.3 discloses a method and apparatus by which a terminal automatically recommends similar products: information is extracted from captured product images, and products highly similar to the product of interest are searched for in a database for the user to choose from. That application cannot search the product database by keywords, and its information extraction is simplistic, so the products it retrieves from the database do not fully meet user needs.

Summary

[0004] The object of the present invention is to provide a method for processing the base data used to recommend items in online merchandise sales: using computers, servers, and corresponding software, the base data is segmented and associated, item feature vectors and feature item vectors are generated, and finally the few most similar products are retrieved from the product database, ranked, and recommended to the user. The present invention also provides a complete system matching this method.

[0005] A first aspect of the present invention provides a method for processing recommendation base data:

[0006] 1. Collect user behavior data from user clients over the network and transmit it to the terminal server;

[0007] 2. Sort the user behavior data and compute statistics on the time intervals between behaviors; the resulting interval is used to segment the user's behavior sequence;

[0008] 3. Associate the items and user behaviors that lie in the same sequence;

[0009] 4. Extract features from the user behavior;

[0010] 5. Generate the feature vector space of the items from the associated user behavior, and compute the score of each feature;

[0011] 6. Transpose the item feature vector space to obtain the feature item vector space;

[0012] 7. Compute the feature-item relevance from the feature item vector space;

[0013] 8. Compute the item-item relevance from the item feature vector space;

[0014] 9. Recommend items of interest to the user according to relevance.
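Step 6 is a plain transposition of a sparse matrix. A minimal Python sketch, assuming each vector space is held as a nested dictionary (the representation and the `item2` data are illustrative assumptions, not taken from the patent):

```python
from collections import defaultdict

def transpose(item_vectors: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Step 6: turn the item feature vector space (item -> {feature: score})
    into the feature item vector space (feature -> {item: score})."""
    feature_vectors: dict[str, dict[str, float]] = defaultdict(dict)
    for item, features in item_vectors.items():
        for feature, score in features.items():
            feature_vectors[feature][item] = score
    return dict(feature_vectors)

# item1 is the example vector from [0059]; item2 is hypothetical.
item_vectors = {
    "item1": {"mp3": 0.3, "bag": 0.6},
    "item2": {"mp3": 0.8},
}
print(transpose(item_vectors))
# {'mp3': {'item1': 0.3, 'item2': 0.8}, 'bag': {'item1': 0.6}}
```

Keeping both orientations lets step 7 read per-feature rows and step 8 read per-item rows without rescanning the data.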

[0015] A second aspect of the present invention provides a recommendation system that labels item features based on user behavior. The system has three data processing servers and three storage modules. The data processing servers are a preprocessing server, a computation server, and a recommendation server. The storage modules are a log storage module, an intermediate computation storage module, and a result storage module.

[0016] Preferably, when a user searches with a certain search condition and then acts on certain items (for example, clicks or purchases them), and the probability that the search condition and a given item occur together is high, the search condition is taken as an implicit user attribute of that item.
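The selection rule in [0016] can be sketched as a co-occurrence filter. The sketch estimates the probability as P(item | search condition), the share of a condition's co-occurrences that involve the item; this estimator and the `min_prob` and `min_count` thresholds are hypothetical, since the patent only says the probability should be "high":

```python
from collections import Counter

def implicit_attributes(pairs, min_prob=0.5, min_count=2):
    """pairs: (search_condition, item) tuples, one per behavior segment in which
    the user searched with the condition and then operated on the item.
    Returns {item: [conditions kept as implicit attributes of that item]}."""
    pair_counts = Counter(pairs)
    cond_counts = Counter()
    for (cond, _item), n in pair_counts.items():
        cond_counts[cond] += n
    attributes = {}
    for (cond, item), n in pair_counts.items():
        # Keep the condition only if it co-occurs often enough with this item.
        if n >= min_count and n / cond_counts[cond] >= min_prob:
            attributes.setdefault(item, []).append(cond)
    return attributes

pairs = [("mp3 bag", "item1")] * 3 + [("mp3 bag", "item2"), ("phone case", "item3")]
print(implicit_attributes(pairs))
# {'item1': ['mp3 bag']}
```

The `min_count` floor is there so that a single chance co-occurrence (item3 above) does not become an attribute merely because its conditional share is 1.0.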

[0017] In any of the above solutions, preferably, the implicit attributes are computed by the servers and stored in the storage modules to obtain a list of recommended items.

[0018] In any of the above solutions, preferably, the preprocessing server has four main modules: a user behavior data collection module, a behavior data organization module, an item-associated behavior extraction module, and an item feature value computation module.

[0019] In any of the above solutions, preferably, the user behavior data collection module extracts users' behavior data from the business log storage module and generates behavior sequences in the chronological order in which the behaviors occurred; each behavior record must be marked with the user who produced it.

[0020] In any of the above solutions, preferably, the user behavior data collection module transmits each sequence, together with its producer, to the intermediate computation storage module, specifically to the user behavior storage module within it.

[0021] In any of the above solutions, preferably, the intermediate computation storage module further has an item feature vector storage module and an item feature data storage module.

[0022] In any of the above solutions, preferably, the user behavior storage module stores the necessary user behavior data.

[0023] In any of the above solutions, preferably, the user behavior storage module transmits the user behavior data it holds to the behavior data organization module.

[0024] In any of the above solutions, preferably, the behavior data organization module computes statistics to determine the time interval for segmenting behaviors, and cuts the user behavior sequences into behavior segments according to that interval.

[0025] In any of the above solutions, preferably, the segments obtained by the behavior data organization module are passed to the item-associated behavior extraction module, which associates items with user behavior data and, after segmenting the user behavior data according to rules, uses it as item features.

[0026] In any of the above solutions, preferably, the item-associated behavior extraction module transmits the item features to the item feature vector storage module for storage.

[0027] In any of the above solutions, preferably, the item feature value computation module extracts information from the item feature vector storage module.

[0028] In any of the above solutions, preferably, the item feature value computation module computes statistics on how often each feature occurs to obtain the score of each feature of each item, and stores these features generated from implicit feedback as the item's feature vector.

[0029] In any of the above solutions, preferably, the item feature vectors are stored in the item feature data storage module.

[0030] In any of the above solutions, preferably, the computation server has a feature-item relevance computation module and an item-item relevance computation module.

[0031] In any of the above solutions, preferably, the computation server receives the information in the item feature data storage module and, depending on the case, invokes the appropriate computation module.

[0032] In any of the above solutions, preferably, the feature-item relevance computation module uses Bayes' formula to compute the conditional probability of an item given a feature, finds the items corresponding to a given feature, and sorts them by probability.

[0033] In any of the above solutions, preferably, the item-item relevance computation module uses the adjusted cosine similarity formula to compute the relevance between items over the users' implicit feature dimensions.

[0034] In any of the above solutions, preferably, the sequences produced by the feature-item relevance computation module are stored in the result storage module, specifically in the feature-item relevance storage module within it.

[0035] In any of the above solutions, preferably, the result storage module further has an item-item relevance storage module for storing the item relevance results produced by the item-item relevance computation module.

[0036] In any of the above solutions, preferably, the recommendation server receives a user's request, parses it, and returns the products recommended to the user.

[0037] In any of the above solutions, preferably, the recommendation server handles recommendations for two kinds of request, through a search recommendation module and a related-item recommendation module.

[0038] In any of the above solutions, preferably, when the user enters a search condition, the search recommendation module recommends products related to that search condition.

[0039] In any of the above solutions, preferably, when the user operates on an item, the related-item recommendation module recommends items related to that item.

Brief Description of the Drawings

[0040] FIG. 1 is a schematic structural diagram of a preferred embodiment of the recommendation system for labeling item features based on user behavior according to the present invention.

Detailed Description

[0041] The technical solution of the present invention is described in detail below with reference to the drawings and specific embodiments.

[0042] The object of the present invention is to provide a method for processing recommendation base data, and a system built on this method. Implicit data sequences are constructed from users' search behavior and the behavior they subsequently apply to items, and search conditions are used to redefine item features. The relevance between features and items and the relevance between items are then computed, and the two results are stored separately as the base data used for recommendation. When a user searches, the search condition is decomposed, and the resulting words are matched against the computed feature-item relevance data to obtain a list of related items, which is sorted by relevance and recommended to the user. When a user operates on an item, the computed item-item relevance data is queried with the current item to obtain a list of related items, which is sorted and recommended to the user.

[0043] The data processing method provided by the present invention is as follows:

[0044] 1. Collect user behavior data from user clients over the network and transmit it to the terminal server;

[0045] 2. On the processing server, a program sorts the user behavior data and computes statistics on the time intervals between behaviors; the resulting interval is used to segment the user behavior sequence;

[0046] 3. Associate the items and user behaviors that lie in the same sequence;

[0047] 4. Extract features from the user behavior;

[0048] 5. Generate the feature vector space of the items from the associated user behavior, and compute the score of each feature;

[0049] 6. Transpose the item feature vector space to obtain the feature item vector space;

[0050] 7. Compute the feature-item relevance from the feature item vector space;

[0051] 8. Compute the item-item relevance from the item feature vector space;

[0052] 9. Recommend items of interest to the user according to relevance.

[0053] Based on the method above, a specific embodiment of the invention is as follows:

[0054] A recommendation system that labels item features based on user behavior comprises three data processing servers and three storage modules. The data processing servers comprise a preprocessing server, a computation server, and a recommendation server. The storage modules comprise a log storage module, an intermediate computation storage module, and a result storage module. The preprocessing server comprises four main modules: a user behavior data collection module, a behavior data organization module, an item-associated behavior extraction module, and an item feature value computation module.

[0055] As shown in FIG. 1, the user behavior data collection module extracts users' behavior data from the business log storage module and generates behavior sequences in the chronological order in which the behaviors occurred; each behavior record must be marked with its producer. The behavior data collected includes users' search conditions and the items users operate on, such as products a user browses or purchases.

[0056] The user behavior data collection module stores the user behavior data it obtains in the user behavior storage module, which then passes the information to the behavior data organization module. The behavior data organization module computes statistics to determine the time interval for segmenting behaviors and cuts the user behavior sequences into behavior segments according to that interval. The system and method assume that the records within one behavior segment are causally related.
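The patent does not say which statistic fixes the segmentation interval. The Python sketch below assumes one for illustration: the threshold is `factor` times the median gap between consecutive actions of the same user, and a sequence is cut wherever a gap exceeds it. The `Event` record, `factor`, and `default` are hypothetical names and values:

```python
from dataclasses import dataclass
from statistics import median

@dataclass(frozen=True)
class Event:
    user: str    # producer of the behavior record
    ts: float    # time of the behavior, in seconds
    kind: str    # e.g. "search", "view", "buy"
    value: str   # query text for searches, item id otherwise

def estimate_gap(events, factor=3.0, default=1800.0):
    """Assumed statistic: factor x median gap between a user's consecutive events."""
    gaps, last_ts = [], {}
    for e in sorted(events, key=lambda e: (e.user, e.ts)):
        if e.user in last_ts:
            gaps.append(e.ts - last_ts[e.user])
        last_ts[e.user] = e.ts
    return factor * median(gaps) if gaps else default

def segment(events, gap):
    """Order events by producer and time; start a new segment on a user change
    or whenever two consecutive events are more than `gap` seconds apart."""
    segments, current = [], []
    for e in sorted(events, key=lambda e: (e.user, e.ts)):
        if current and (e.user != current[-1].user or e.ts - current[-1].ts > gap):
            segments.append(current)
            current = []
        current.append(e)
    if current:
        segments.append(current)
    return segments

events = [
    Event("u1", 0, "search", "mp3 bag"), Event("u1", 30, "view", "item1"),
    Event("u1", 60, "view", "item2"), Event("u1", 5000, "search", "phone case"),
    Event("u1", 5020, "view", "item3"),
]
gap = estimate_gap(events)  # median gap is 30 s, so the threshold is 90 s
print([[e.value for e in s] for s in segment(events, gap)])
# [['mp3 bag', 'item1', 'item2'], ['phone case', 'item3']]
```

A median-based threshold adapts to how quickly a population of users typically acts, which is one way to read "statistics on the time interval"; a fixed session timeout would be a simpler alternative.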

[0057] After the behavior data organization module has segmented a user behavior sequence, the associated parts of each segment are extracted. This is done in the item-associated behavior extraction module, which associates items with user behavior data and, after segmenting the user behavior data according to rules, uses it as item features. For example, if a user searches for the keyword "mp3 bag" and then browses item1 within the same behavior segment, item1 is associated with the search keyword "mp3 bag", and "mp3" and "bag" become features that the user implicitly assigns to item1, regardless of whether the item provider's description of item1 contains these keywords.
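The association step in [0057] can be sketched as follows: within each segment, the words of every search issued so far are attached to each item the user later operates on. Whitespace splitting stands in for the patent's unspecified segmentation rules (Chinese queries would need a word segmenter), and counting each word at most once per item per segment is an assumption:

```python
from collections import Counter, defaultdict

def tokenize(query):
    # Assumption: lowercase and split on whitespace.
    return query.lower().split()

def associate(segments):
    """segments: lists of (kind, value) pairs in time order, where kind is
    "search" (value = query) or an item operation (value = item id).
    Returns item -> Counter of implicit features (query words)."""
    features = defaultdict(Counter)
    for seg in segments:
        words = []                      # words of the searches seen so far
        per_item = defaultdict(set)     # item -> words attached in this segment
        for kind, value in seg:
            if kind == "search":
                words.extend(tokenize(value))
            elif words:                 # an item operation that follows a search
                per_item[value].update(words)
        for item, ws in per_item.items():
            features[item].update(ws)   # +1 per word per segment
    return features

segments = [
    [("search", "mp3 bag"), ("view", "item1")],
    [("search", "mp3"), ("view", "item1"), ("buy", "item1")],
    [("view", "item2"), ("search", "bag"), ("view", "item2")],
]
feats = associate(segments)
print(sorted(feats["item1"].items()), sorted(feats["item2"].items()))
# [('bag', 1), ('mp3', 2)] [('bag', 1)]
```

The first view of item2 in the third segment carries no features because no search precedes it, matching the patent's focus on behavior that follows a search.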

[0058] After the item-associated behavior extraction module associates keywords with items, the result is stored in the item feature vector storage module. The item feature value computation module then reads the information stored there, computes statistics on how often each feature occurs to obtain the score of each feature of each item, and stores these implicit-feedback features as the item's feature vector I <F1:S1, F2:S2, ...>, where I is the item, F1 is a feature of I, and S1 is the score of F1. For example, the scores of item1's two key features, "mp3" and "bag", are described as follows:

[0059] item1 <mp3:0.3, bag:0.6>: item1 has two features; the score of feature "mp3" is 0.3 and the score of feature "bag" is 0.6.
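The patent does not give the scoring formula. As one plausible choice, the sketch below scores a feature by its share of all implicit-feature occurrences recorded for the item. This is an assumption and does not reproduce the 0.3/0.6 example, whose scores do not sum to 1:

```python
def feature_scores(counts, digits=3):
    """counts: feature -> occurrence count for one item.
    Assumed score: relative frequency of the feature, rounded to `digits`."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {f: round(n / total, digits) for f, n in counts.items()}

def format_vector(item, scores):
    """Render the I <F1:S1, F2:S2, ...> notation used in [0058]."""
    return f"{item} <" + ", ".join(f"{f}:{s}" for f, s in scores.items()) + ">"

scores = feature_scores({"mp3": 1, "bag": 2})
print(format_vector("item1", scores))
# item1 <mp3:0.333, bag:0.667>
```

Relative frequency makes scores comparable across items with very different traffic; a TF-IDF style weighting would be another option if common words such as brand names dominate.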

[0060] The item feature value computation module stores the computed vectors in the item feature data storage module, from which the computation server then extracts data. The computation server has a feature-item relevance computation module and an item-item relevance computation module.

[0061] The feature-item relevance computation module computes the relevance between features and items, using Bayes' formula to compute the conditional probability of an item given a feature:

[0062] $$P(\mathit{Item} \mid F) = \frac{P(\mathit{Item})\,P(F \mid \mathit{Item})}{P(F)}$$

[0063] When recommending items, if this feature appears, the items corresponding to it are looked up and sorted by probability. For example, when a user enters the keyword "mp3", the feature-item relevance computation module computes the probabilities for item1, item2, ..., obtaining P(item1 | mp3), P(item2 | mp3), ..., and then compares these conditional probabilities to decide whether each item appears in the recommendation list.
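A Python sketch of the ranking in [0063], assuming the probabilities are maximum-likelihood estimates from feature-item co-occurrence counts (the patent does not state how they are estimated, and the counts below are hypothetical):

```python
def rank_items_for_feature(counts, feature):
    """counts: item -> {feature: co-occurrence count}.
    Returns [(item, P(item | feature))] in descending order of probability."""
    total = sum(sum(fs.values()) for fs in counts.values())
    if total == 0:
        return []
    p_f = sum(fs.get(feature, 0) for fs in counts.values()) / total
    if p_f == 0:
        return []
    ranked = []
    for item, fs in counts.items():
        item_total = sum(fs.values())
        if item_total == 0 or fs.get(feature, 0) == 0:
            continue  # the item never co-occurs with this feature
        p_item = item_total / total
        p_f_given_item = fs[feature] / item_total
        ranked.append((item, p_item * p_f_given_item / p_f))  # Bayes' rule, [0062]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

counts = {"item1": {"mp3": 2, "bag": 1}, "item2": {"mp3": 1}, "item3": {"bag": 3}}
for item, p in rank_items_for_feature(counts, "mp3"):
    print(item, round(p, 3))
# item1 0.667
# item2 0.333
```

With these estimates the formula reduces to the item's share of the feature's co-occurrences, n(item, F) / n(F); the intermediate probabilities are kept here only to mirror [0062].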

[0064] The item-item relevance computation module uses the adjusted cosine similarity formula to compute the relevance between items over the users' implicit feature dimensions. The adjusted cosine formula is:

[0065]

$$\mathrm{sim}(i,j) = \frac{\sum_{f \in F} (R_{i,f} - \bar{R}_f)(R_{j,f} - \bar{R}_f)}{\sqrt{\sum_{f \in F} (R_{i,f} - \bar{R}_f)^2}\,\sqrt{\sum_{f \in F} (R_{j,f} - \bar{R}_f)^2}}$$

This formula computes the relevance between the i-th item and the j-th item, denoted sim(i, j), where $R_{i,f}$ is the score of feature f of the i-th item, $\bar{R}_f$ is the average score of feature f, and $F$ is the set of all features.
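A Python sketch of the adjusted cosine computation, assuming that an item lacking a feature has score 0 for it and that each feature's mean score is taken over all items; the patent does not say how missing features are treated, and the item vectors below are hypothetical:

```python
from math import sqrt

def adjusted_cosine(vectors, i, j):
    """sim(i, j) over the implicit feature dimensions.
    vectors: item -> {feature: score}; an absent feature counts as 0 (assumption),
    and each feature's mean is taken over all items."""
    features = set().union(*(v.keys() for v in vectors.values()))
    n = len(vectors)
    mean = {f: sum(v.get(f, 0.0) for v in vectors.values()) / n for f in features}
    num = norm_i = norm_j = 0.0
    for f in features:
        a = vectors[i].get(f, 0.0) - mean[f]   # centered score of item i
        b = vectors[j].get(f, 0.0) - mean[f]   # centered score of item j
        num += a * b
        norm_i += a * a
        norm_j += b * b
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0
    return num / (sqrt(norm_i) * sqrt(norm_j))

vectors = {
    "item1": {"mp3": 0.3, "bag": 0.6},
    "item2": {"mp3": 0.4, "bag": 0.5},
    "item3": {"case": 0.9},
}
print(round(adjusted_cosine(vectors, "item1", "item2"), 2))  # 0.93
print(round(adjusted_cosine(vectors, "item1", "item3"), 2))  # -0.98
```

Centering each feature by its mean keeps features that almost every item has (and therefore carry little information) from inflating the similarity, which is the usual motivation for the adjusted form over plain cosine.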

[0066] The two results computed by the computation server are saved separately in the result storage module, which comprises two storage modules for the two types of result data: a feature-item relevance storage module and an item-item relevance storage module. The feature-item relevance storage module stores the sequences produced by the feature-item relevance computation module in the format Feature <Item1:score1, Item2:score2, ...>, where Feature is a feature, Item1, Item2, ... are the items associated with it, and each associated item has a corresponding relevance score, that is, a similarity. The item-item relevance storage module stores the result sequences produced by the item-item relevance computation module in the format Item <Item1:score1, Item2:score2, ...>, where Item is the requested item, Item1, Item2, ... are the items associated with it, and each is given a score.
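A short Python sketch of building the two result tables in [0066], assuming each table maps a key (a feature or a requested item) to its related items sorted by descending score; the `top_n` cutoff, the `drop_self` option, and the sample scores are hypothetical:

```python
def build_table(scores, top_n=10, drop_self=False):
    """scores: key -> {related item: score}.
    Returns key -> [(related item, score)] sorted by descending score."""
    table = {}
    for key, related in scores.items():
        ranked = sorted(related.items(), key=lambda kv: kv[1], reverse=True)
        if drop_self:  # an item should not be listed as related to itself
            ranked = [(k, s) for k, s in ranked if k != key]
        table[key] = ranked[:top_n]
    return table

def format_entry(key, ranked):
    """Render the Feature <Item1:score1, Item2:score2, ...> storage format."""
    return f"{key} <" + ", ".join(f"{k}:{s}" for k, s in ranked) + ">"

feature_item = build_table({"mp3": {"item2": 0.33, "item1": 0.67}})
item_item = build_table({"item1": {"item1": 1.0, "item3": -0.98, "item2": 0.93}},
                        drop_self=True)
print(format_entry("mp3", feature_item["mp3"]))
print(format_entry("item1", item_item["item1"]))
# mp3 <item1:0.67, item2:0.33>
# item1 <item2:0.93, item3:-0.98>
```

Storing each list already sorted means the recommendation server only has to read a prefix at request time instead of re-sorting.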

[0067] The recommendation server recommends products to users. It first receives a user's request, then parses the keyword or item the user supplied (user request parsing), and finally returns the products recommended to the user. Request parsing segments the keyword entered with the request, using the same segmentation rules as the feature segmentation in the item-associated behavior extraction module: the keyword or sentence is split into words or phrases.

[0068] The recommendation server handles two kinds of request. In the first, the user enters a search condition, such as a keyword or a sentence describing a product, and products related to the search condition are recommended. In the second, the user operates on an item, and items related to that item are recommended.
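The two request types can be sketched as below, reading from result tables keyed as in [0066]. The whitespace tokenizer, the rule of summing a product's relevance over all query words, and discarding non-positive scores are assumptions; the patent only requires that the query be segmented the same way as during feature extraction and that results be sorted by relevance:

```python
from collections import defaultdict

def tokenize(query):
    # Must mirror the feature-extraction segmentation; whitespace split is assumed.
    return query.lower().split()

def recommend_for_search(query, feature_item, top_n=5):
    """Search recommendation: match query words against feature -> item relevance."""
    totals = defaultdict(float)
    for word in tokenize(query):
        for item, score in feature_item.get(word, {}).items():
            totals[item] += score          # assumed aggregation: sum over words
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, score in ranked if score > 0][:top_n]

def recommend_related(item, item_item, top_n=5):
    """Related-item recommendation: look up item -> item relevance."""
    ranked = sorted(item_item.get(item, {}).items(), key=lambda kv: kv[1], reverse=True)
    return [other for other, score in ranked if other != item and score > 0][:top_n]

feature_item = {"mp3": {"item1": 0.67, "item2": 0.33}, "bag": {"item1": 0.25, "item3": 0.75}}
item_item = {"item1": {"item2": 0.93, "item3": -0.98}}
print(recommend_for_search("MP3 bag", feature_item))   # ['item1', 'item3', 'item2']
print(recommend_related("item1", item_item))           # ['item2']
```

Summing over words rewards products that match several query words (item1 matches both "mp3" and "bag"); taking the maximum instead would favor the single strongest match.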

[0069] It should be noted that the recommendation system and method for labeling item features based on user behavior according to the present invention include any one of the above embodiments and any combination thereof. The embodiments above merely describe preferred implementations of the invention and do not limit its scope; various modifications and improvements made to the technical solution of the invention by those of ordinary skill in the art without departing from the design spirit of the invention shall fall within the scope of protection defined by the claims.

Claims (24)

1. 一种网上推荐相似商品的方法,该方法通过网络将用户行为传送到终端服务器上, 所述终端服务器处理该用户行为得到数据单元,然后从数据库中依据数据单元检索相似物品再返回客户端,其特征在于:所述处理用户行为包括终端服务器对用户行为进行特征提取与关联,并生成特征向量空间,所述检索相似物品包括从数据库中取出相似物品作为推荐物品并对该推荐物品进行排序;所述对用户行为进行特征提取与关联并生成特征向量空间,包括计算物品与物品的相关度;所述计算物品与物品之间的相关度的具体步骤为: A、 对搜索条件按照规则切分之后作为物品的特征向量空间,并计算各特征的得分; B、 根据物品的特征向量空间计算得到物品与物品之间的相关度。 A recommendation method similar items online, the method transmits user behavior through the network to the terminal server, the terminal server processes the data to give the user behavior unit, and similar articles according to the data from the database and then returned to the client unit retrieves characterized in that: said process comprising the terminal server user behavior user behavior associated with the feature extraction, and generates a feature vector space, the retrieval of similar articles, including articles removed from the database as similar to the recommended items and sorts the items recommended ; the user behavior associated with the feature extraction and generates a feature vector space, comprising a correlation calculating goods and goods; in particular the step of calculating the degree of correlation between the article and the article is: a, cut according to the rules of the search condition after the division of the article as a feature vector space, and calculates the score of each characteristic; B, according to the feature vector space is calculated to give the article the degree of correlation between the article and the article.
2. 根据权利要求1所述的网上推荐相似商品的方法,其特征在于:所述对用户行为进行特征提取与关联的实现方式包括如下步骤: A、 收集用户行为数据; B、 对用户行为进行排序,并统计时间间隔,该时间间隔用于用户行为序列的切分; C、 将所述物品与所述用户行为关联; D、 对所述用户行为进行所述特征提取。 The recommended method similar product line according to claim 1, wherein: said feature extraction is associated with the implementation of user behavior comprises the steps of: A, collecting user behavior data; B, user behavior sorting, and counting the time interval, the time interval for slicing user behavior sequence; C, associating the article with the user behavior; D, the behavior of the user characteristic extraction.
3. 根据权利要求1所述的网上推荐相似商品的方法,其特征在于:所述对用户行为进行所述特征提取还包括计算特征与物品的相关度。 The web according to claim 1 recommendation method similar items, characterized in that: said further comprising performing the feature extraction and correlation degree calculating characteristics of the article of user behavior.
4. 根据权利要求3所述的网上推荐相似商品的方法,其特征在于:所述计算特征与物品的相关度的具体步骤为: A、 对所述物品的特征向量进行空间转置,得到特征的物品向量空间; B、 根据所述特征的物品向量空间进行计算,得到特征与物品的相关度。 The web according to claim 3 recommendation method similar items, characterized in that: said step of calculating correlation with a particular feature of the article is: A, of the feature vector space of the article will be transposed, characterized give articles vector space; B, is calculated based on the vector space of the article characteristics, correlation characteristics obtained with the article.
5. 根据权利要求4所述的网上推荐相似商品的方法,其特征在于:所述特征与物品的相关度计算方法采用贝叶斯的公式来计算。 The web of claim 4, wherein the recommended method similar items, characterized in that: the method of calculating features related to the article is calculated using Bayes' formula.
6. 根据权利要求1所述的网上推荐相似商品的方法,其特征在于:所述物品与物品之间相关度计算方法采用修正余弦相关度公式来计算。 The web according to claim 1 recommendation method similar items, characterized in that: the correlation between the article and the article using the method for calculating the correction formulas to calculate the cosine correlation.
7. 根据权利要求4所述的网上推荐相似商品的方法,其特征在于:所述对推荐物品进行排序的方法包括:根据所述特征与物品的相关度以及物品与物品之间相关度,从数据库中取出相似商品,按照相关度大小进行排序,推荐给用户。 The recommended method of claim 4 online similar items claim, wherein: the sorting method of the recommended items comprising: the relevance of the characteristic and the degree of correlation between the article and the article article from remove similar items in the database, sorted according to the size of the correlation, it recommended to the user.
8. -种网上推荐相似商品的系统,通过对用户行为进行获取,然后在商品数据库中进行检索,得到相似产品,并将其推荐给用户,其特征在于:所述网上推荐相似商品的系统具有数据处理服务器与数据存储模块,并通过服务器对用户行为进行切分,并对从数据库取到的信息进行排序; 所述数据处理服务器具有预处理服务器、计算服务器、推荐服务器; 所述预处理服务器具有用户行为数据收集模块、行为数据组织模块、物品关联行为提取模块、物品特征值计算模块; 所述行为数据收集模块用于提取用户的行为数据,并按行为的产生时间顺序生成行为序列,每条行为记录标示其产生者; 所述行为数据组织模块将所述用户行为数据收集模块得到的行为序列切分成片段; 所述物品关联行为提取模块用于将物品与用户行为数据进行关联,并将用户行为数据按 8. - Species similar items recommended online system, by obtaining user behavior, and search on a database product, with similar products, and recommend it to the user, wherein: said line recommended systems having similar items data processing and data storage module server, and the server through the segmentation of user behavior, and to extract information from the database sorting; server having the data pre-processing servers, compute servers, recommendation server; the pretreatment server data collection module having a user behavior, the behavior data organization module, an extraction module behavior associated article, article characteristic value calculation module; said behavior data collection module configured to extract user behavior data, the behavior of the press generate time sequence generation behavior sequences, each Article behavior record mark which producer; the behavior data organization module of the user behavior data collection module behavior sequence obtained cut into segments; behavior of the article associated with the article extracting means for associating user behavior data, and user behavior data according to 切分规则切分之后作为物品的特征。 As a feature of the article after segmentation segmentation rules.
9. The system for recommending similar products online according to claim 8, wherein the segmentation rules include reducing the granularity of the keywords entered by the user when searching.
10. The system for recommending similar products online according to claim 9, wherein the granularity of the keywords entered by the user when searching is reduced to words or phrases.
11. The system for recommending similar products online according to claim 8, wherein the item feature value calculation module counts and computes the frequency of occurrence of the features, obtains a score for each feature of each item, and stores these features generated from implicit feedback as the feature vector of the item.
12. The system for recommending similar products online according to claim 11, wherein the computing server comprises a feature-item correlation calculation module and an item-item correlation calculation module.
13. The system for recommending similar products online according to claim 12, wherein the feature-item correlation calculation module uses Bayes' formula to calculate the conditional probability that a feature corresponds to an item.
14. The system for recommending similar products online according to claim 13, wherein the item-item correlation calculation module computes the correlation using the adjusted (modified) cosine correlation formula.
15. The system for recommending similar products online according to claim 14, wherein the recommendation server handles two kinds of recommendation requests: recommending products related to the search criteria, and recommending other items related to a given item.
16. The system for recommending similar products online according to claim 15, wherein the data storage module comprises a log storage module, an intermediate calculation storage module, and a result storage module.
17. The system for recommending similar products online according to claim 16, wherein the log storage module comprises a business log storage module, which is a database for storing various log records.
18. The system for recommending similar products online according to claim 17, wherein the intermediate calculation storage module comprises a user behavior storage module, an item vector storage module, and an item feature data storage module.
19. The system for recommending similar products online according to claim 18, wherein the user behavior storage module receives the user behavior data extracted from the business log storage module.
20. The system for recommending similar products online according to claim 19, wherein the item vector storage module stores the item feature vectors produced by the segmentation performed by the item-associated behavior extraction module.
21. The system for recommending similar products online according to claim 20, wherein the item vector storage module passes its information to the item feature data storage module after the item feature value calculation module has calculated the feature scores.
22. The system for recommending similar products online according to claim 21, wherein the result storage module stores the result data computed by the computing server and comprises a feature-item correlation storage module and an item-item correlation storage module.
23. The system for recommending similar products online according to claim 22, wherein the feature-item correlation storage module stores the result data calculated by the feature-item correlation calculation module.
24. The system for recommending similar products online according to claim 23, wherein the item-item correlation storage module stores the result data calculated by the item-item correlation calculation module.
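Claim 7 orders the retrieved items by correlation, but it does not say how the feature-item and item-item scores are combined. The Python sketch below is illustrative only. It assumes that an item appearing in both tables keeps its larger score. The function name `rank_recommendations` and the tie-breaking rule are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_recommendations(
    feature_item_scores: Dict[str, float],
    item_item_scores: Dict[str, float],
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Merge two correlation tables and return the top_n items, highest first.

    Assumption (not from the patent): an item present in both tables keeps
    the larger of its two correlation values.
    """
    merged: Dict[str, float] = dict(feature_item_scores)
    for item, score in item_item_scores.items():
        merged[item] = max(score, merged.get(item, float("-inf")))
    # Sort by descending correlation; ties are broken by item id for determinism.
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

For example, `rank_recommendations({"a": 0.2, "b": 0.9}, {"a": 0.5, "c": 0.1}, top_n=2)` keeps item `a` at 0.5 and ranks `b` above it.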
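Claim 8 collects each user's behavior records in time order and cuts the resulting sequence into segments, but it does not state the cutting rule. The abstract links features to a search and the behaviors that follow it on items. Building on that, this sketch assumes a new segment starts at every search, or after an idle gap longer than `max_gap` seconds. The `Behavior` record, its `kind` labels, and the 1800-second default are all hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass(frozen=True)
class Behavior:
    user: str       # producer of the record (claim 8)
    timestamp: int  # time the behavior occurred, e.g. Unix seconds
    kind: str       # hypothetical labels: "search", "view", "cart", "buy"
    value: str      # query text for a search, item id otherwise


def segment_behaviors(records: List[Behavior], max_gap: int = 1800) -> List[List[Behavior]]:
    """Split each user's time-ordered behavior sequence into segments.

    Assumed rule (not specified by the patent): a new segment starts at every
    search, or when more than max_gap seconds pass between two behaviors.
    """
    segments: List[List[Behavior]] = []
    # Order by user, then by time, so each user's sequence is contiguous.
    ordered = sorted(records, key=lambda r: (r.user, r.timestamp))
    for _, user_records in groupby(ordered, key=lambda r: r.user):
        current: List[Behavior] = []
        prev_ts = None
        for rec in user_records:
            gap_exceeded = prev_ts is not None and rec.timestamp - prev_ts > max_gap
            if (rec.kind == "search" or gap_exceeded) and current:
                segments.append(current)
                current = []
            current.append(rec)
            prev_ts = rec.timestamp
        if current:
            segments.append(current)
    return segments
```

Each resulting segment that starts with a search pairs one query with the item behaviors that followed it. That is the raw material the item-associated behavior extraction module turns into item features.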
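Claims 9 and 10 reduce the granularity of a search keyword to words or phrases. Chinese queries would need a proper word segmenter, which this sketch does not include. The sketch assumes space-delimited text, splits it into words, and adds contiguous word n-grams as phrases. The function name and the `max_phrase_len` parameter are hypothetical.

```python
import re
from typing import List


def reduce_granularity(query: str, max_phrase_len: int = 2) -> List[str]:
    """Break a search query into word and short-phrase features.

    Assumes space-delimited text; a Chinese query would first need a word
    segmenter, which this sketch does not provide.
    """
    words = re.findall(r"\w+", query.lower())
    features: List[str] = []
    # n = 1 yields single words; n >= 2 yields contiguous phrases.
    for n in range(1, max_phrase_len + 1):
        for i in range(len(words) - n + 1):
            features.append(" ".join(words[i:i + n]))
    return features
```

For example, `reduce_granularity("Red Leather Boots")` returns the three words followed by the phrases `"red leather"` and `"leather boots"`.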
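Claim 11 counts how often each feature occurs, gives every feature of every item a score, and stores the result as the item's feature vector. The patent does not specify the scoring formula. This sketch assumes the score is the feature's relative frequency among all features seen with that item. The input shape, a list of `(features, items)` pairs per segment, is also an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def item_feature_vectors(
    segments: Iterable[Tuple[List[str], List[str]]],
) -> Dict[str, Dict[str, float]]:
    """Count how often each query feature co-occurs with each item.

    Each segment is (features, items): the features come from the search
    query, and the items are those the user acted on afterwards. The score is
    the feature's relative frequency for that item (an assumed scoring rule).
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for features, items in segments:
        for item in set(items):  # count an item once per segment
            counts[item].update(features)
    vectors: Dict[str, Dict[str, float]] = {}
    for item, counter in counts.items():
        total = sum(counter.values())
        if total:
            vectors[item] = {f: c / total for f, c in counter.items()}
    return vectors
```

Relative frequency keeps the scores comparable across items with very different traffic. Raw counts would work equally well as input to the Bayesian step in claim 13.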
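Claim 13 applies Bayes' formula to get the conditional probability linking a feature to an item. This sketch reads that as the posterior P(item | feature), which is the quantity a search-driven recommender needs. That reading of the direction is an interpretation. The probabilities are plain maximum-likelihood estimates from co-occurrence counts, with no smoothing. The function name and input layout are hypothetical.

```python
from typing import Dict


def feature_item_posterior(counts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Return {feature: {item: P(item | feature)}} via Bayes' rule.

    counts[item][feature] is the number of times the feature was observed
    with the item. Probabilities are maximum-likelihood estimates (no smoothing).
    """
    item_totals = {item: sum(fc.values()) for item, fc in counts.items()}
    grand_total = sum(item_totals.values())
    feature_totals: Dict[str, int] = {}
    for fc in counts.values():
        for feature, c in fc.items():
            feature_totals[feature] = feature_totals.get(feature, 0) + c

    posterior: Dict[str, Dict[str, float]] = {}
    for item, fc in counts.items():
        for feature, c in fc.items():
            if c == 0:
                continue
            p_feature_given_item = c / item_totals[item]
            p_item = item_totals[item] / grand_total
            p_feature = feature_totals[feature] / grand_total
            # Bayes' rule: P(item | feature) = P(feature | item) * P(item) / P(feature)
            posterior.setdefault(feature, {})[item] = p_feature_given_item * p_item / p_feature
    return posterior
```

With unsmoothed counts, the result simplifies to count(feature, item) / count(feature). A production system would likely add smoothing so that rare features do not produce extreme probabilities.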
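Claim 14's "modified cosine" is rendered here as the adjusted cosine similarity used in item-based collaborative filtering. Before the cosine is taken, each row's mean score is subtracted from its entries, so rows that score everything high or low do not dominate. The patent does not say whether the rows are users or features. The sketch therefore accepts any `row -> {item: score}` mapping and only uses rows that score both items.

```python
import math
from typing import Dict


def adjusted_cosine(ratings: Dict[str, Dict[str, float]], i: str, j: str) -> float:
    """Adjusted cosine similarity between items i and j.

    Each row's mean (over all items in that row) is subtracted before the
    cosine is computed; only rows containing both items contribute.
    """
    num = 0.0
    den_i = 0.0
    den_j = 0.0
    for row_scores in ratings.values():
        if i in row_scores and j in row_scores:
            mean = sum(row_scores.values()) / len(row_scores)
            di = row_scores[i] - mean
            dj = row_scores[j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0.0 or den_j == 0.0:
        return 0.0  # no overlap or no variance: treat as uncorrelated
    return num / (math.sqrt(den_i) * math.sqrt(den_j))
```

The result lies in [-1, 1]. For rows `{"a": 5, "b": 4, "c": 1}` and `{"a": 1, "b": 2, "c": 5}`, items `a` and `b` move together (similarity 1), while `a` and `c` move in opposite directions (similarity -1).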
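Claim 15 has the recommendation server answer two kinds of request: items related to a search, and items related to a given item. This sketch serves both from precomputed correlation tables. It assumes a search is scored by summing each query word's feature-item correlation, and it uses plain whitespace splitting rather than the granularity reduction of claims 9 and 10. The `Recommender` class and its method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class Recommender:
    """Serves the two request types of claim 15 from precomputed tables."""

    def __init__(
        self,
        feature_item: Dict[str, Dict[str, float]],
        item_item: Dict[str, Dict[str, float]],
    ) -> None:
        self.feature_item = feature_item  # {feature: {item: correlation}}
        self.item_item = item_item        # {item: {other_item: correlation}}

    def for_search(self, query: str, top_n: int = 5) -> List[Tuple[str, float]]:
        # Assumed scoring: sum the feature-item correlation of every query word.
        scores: Dict[str, float] = defaultdict(float)
        for feature in query.lower().split():
            for item, score in self.feature_item.get(feature, {}).items():
                scores[item] += score
        return self._top(scores, top_n)

    def for_item(self, item_id: str, top_n: int = 5) -> List[Tuple[str, float]]:
        # Exclude the item itself in case the table stores self-similarity.
        neighbours = {k: v for k, v in self.item_item.get(item_id, {}).items() if k != item_id}
        return self._top(neighbours, top_n)

    @staticmethod
    def _top(scores: Dict[str, float], top_n: int) -> List[Tuple[str, float]]:
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Keeping both tables precomputed means each request is a dictionary lookup plus a sort, which matches the split in claims 12 and 16 between offline calculation and stored results.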
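Claims 16 to 24 describe the storage layout: logs, intermediate calculation data, and results. The data flows from business logs to user behaviors, and from item vectors to scored item feature data. The in-memory sketch below mirrors that layout with dataclasses. Every class, field, and function name is hypothetical. The relative-frequency scoring rule is the same assumption as in the claim 11 example, and a real system would back each module with a database.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LogStorage:
    business_logs: List[dict] = field(default_factory=list)  # raw log records (claim 17)


@dataclass
class IntermediateStorage:
    user_behaviors: List[dict] = field(default_factory=list)  # extracted behaviors (claim 19)
    item_vectors: Dict[str, List[str]] = field(default_factory=dict)  # segmented features (claim 20)
    item_feature_data: Dict[str, Dict[str, float]] = field(default_factory=dict)  # scored (claim 21)


@dataclass
class ResultStorage:
    feature_item: Dict[str, Dict[str, float]] = field(default_factory=dict)  # claim 23
    item_item: Dict[str, Dict[str, float]] = field(default_factory=dict)  # claim 24


@dataclass
class DataStorage:
    logs: LogStorage = field(default_factory=LogStorage)
    intermediate: IntermediateStorage = field(default_factory=IntermediateStorage)
    results: ResultStorage = field(default_factory=ResultStorage)


def extract_behaviors(store: DataStorage, is_behavior: Callable[[dict], bool]) -> None:
    """Copy behavior records from the business logs into user behavior storage."""
    store.intermediate.user_behaviors.extend(
        r for r in store.logs.business_logs if is_behavior(r)
    )


def score_item_vectors(store: DataStorage) -> None:
    """Turn raw item feature lists into relative-frequency scores (assumed rule)."""
    for item, features in store.intermediate.item_vectors.items():
        counts = Counter(features)
        total = len(features)
        store.intermediate.item_feature_data[item] = {f: c / total for f, c in counts.items()}
```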
CN201310333575.0A 2013-08-02 2013-08-02 Recommendation system and method for tagging item features based on user behavior CN103400286B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310333575.0A CN103400286B (en) 2013-08-02 2013-08-02 Recommendation system and method for tagging item features based on user behavior

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310333575.0A CN103400286B (en) 2013-08-02 2013-08-02 Recommendation system and method for tagging item features based on user behavior

Publications (2)

Publication Number Publication Date
CN103400286A CN103400286A (en) 2013-11-20
CN103400286B true CN103400286B (en) 2016-12-28

Family

ID=49563901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310333575.0A CN103400286B (en) 2013-08-02 2013-08-02 Recommendation system and method for tagging item features based on user behavior

Country Status (1)

Country Link
CN (1) CN103400286B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104750760B (en) * 2013-12-31 2018-11-23 中国移动通信集团上海有限公司 A preferred implementation method and device application software
CN103700011B (en) * 2014-01-13 2016-11-23 重庆大学 One feature extraction method and apparatus
CN103995831B (en) * 2014-04-18 2017-04-12 新浪网技术(中国)有限公司 Item processing method based on similarity between items, systems and apparatus
CN103942712A (en) * 2014-05-09 2014-07-23 北京联时空网络通信设备有限公司 Product similarity based e-commerce recommendation system and method thereof
CN104063589B (en) * 2014-06-16 2018-01-16 百度移信网络技术(北京)有限公司 A preferred method and system
CN104156450B (en) * 2014-08-15 2017-11-07 同济大学 Item information recommendation method based on network user data
CN106708821A (en) * 2015-07-21 2017-05-24 广州市本真网络科技有限公司 User personalized shopping behavior-based commodity recommendation method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101206752A (en) * 2007-12-25 2008-06-25 北京科文书业信息技术有限公司 Electric commerce website related products recommendation system and method
US9916611B2 (en) * 2008-04-01 2018-03-13 Certona Corporation System and method for collecting and targeting visitor behavior
CN102509233A (en) * 2011-11-29 2012-06-20 广东领域集团有限公司 User online action information-based recommendation method

Also Published As

Publication number Publication date
CN103400286A (en) 2013-11-20

Similar Documents

Publication Publication Date Title
US8745067B2 (en) Presenting comments from various sources
US20080005105A1 (en) Visual and multi-dimensional search
JP5662961B2 (en) Review processing method and system
US8612435B2 (en) Activity based users' interests modeling for determining content relevance
US20080183699A1 (en) Blending mobile search results
US7895235B2 (en) Extracting semantic relations from query logs
US8626768B2 (en) Automated discovery aggregation and organization of subject area discussions
US8589418B1 (en) System for facilitating discovery and management of feeds
US20120317088A1 (en) Associating Search Queries and Entities
JP5575902B2 (en) Information retrieval based on semantic pattern of the query
US20070244867A1 (en) Knowledge management tool
US20060004732A1 (en) Search engine methods and systems for generating relevant search results and advertisements
US20110196737A1 (en) Semantic advertising selection from lateral concepts and topics
US8346791B1 (en) Search augmentation
JP5778255B2 (en) The method of queries based on vertical search, system, and device
US20100274753A1 (en) Methods for filtering data and filling in missing data using nonlinear inference
JP5736469B2 (en) Recommendation of the search keyword based on the presence or absence of user intent
US20090030899A1 (en) Processing a content item with regard to an event and a location
US20090287676A1 (en) Search results with word or phrase index
CN101923545B (en) Method for recommending personalized information
US7283997B1 (en) System and method for ranking the relevance of documents retrieved by a query
US9171088B2 (en) Mining for product classification structures for internet-based product searching
JP5860456B2 (en) Determination of the search term weighting and use
US20110072001A1 (en) Systems and methods for providing advanced search result page content
US20110087647A1 (en) System and method for providing web search results to a particular computer user based on the popularity of the search results with other computer users

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C14 Grant of patent or utility model
TR01