CN101751437A - Web active retrieval system based on reinforcement learning - Google Patents

Web active retrieval system based on reinforcement learning

Info

Publication number
CN101751437A
Authority
CN
China
Prior art keywords
web
user
learning
module
agent
Prior art date
Application number
CN 200810240358
Other languages
Chinese (zh)
Inventor
刘琰琼
张文生
李益群
杨彦武
梁玉旋
肖宪
Original Assignee
中国科学院自动化研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院自动化研究所
Priority to CN 200810240358
Publication of CN101751437A

Abstract

The invention discloses a Web active retrieval system based on reinforcement learning. The system comprises a Web search Agent module, a Web filter Agent module, a Web interface Agent module, and a user information learning Agent module. The Web search Agent module searches for topics related to the user's interests, analyzes Web content, and downloads Web pages. The Web filter Agent module performs Web page content analysis, page filtering, and classified indexing. The Web interface Agent module recommends the Web pages that best represent the learned user interests, receives user feedback, records user browsing behavior, and performs statistical analysis. The user information learning Agent module updates the user's interests by reinforcement learning, continuously updating the user information until it arrives at the model that best represents the user's interests. The system is highly adaptive, accurate, and convenient to use.

Description

Web page active retrieval system based on reinforcement learning

Technical Field

[0001] The present invention relates to the technical field of active retrieval for Web users, and in particular to a web page active retrieval system based on reinforcement learning, which recommends to Web users the Web pages that best reflect the user's interest model.

Background

[0002] A Markov decision process comprises a set of environment states S, a set of actions A, a reward function R, and a state transition function P. The reward function R(s, a, s') is the instantaneous reward obtained when action a is taken in state s and the environment moves to state s'; P(s, a, s') denotes the probability that taking action a in state s moves the environment to state s'. The essence of a Markov decision process is that the probability of moving from the current state to the next state, and the reward received, depend only on the current state and the action chosen in it, and not on earlier states or actions. Therefore, when the environment model is known, that is, when both the state transition probability function P and the reward function R have been determined, dynamic programming can be used to solve for the optimal policy. In most real-world cases, however, the environment model given by P and R is difficult to determine. Reinforcement learning focuses on how to learn an optimal behavior policy when the reward function and the state transition function are unknown.
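The dynamic programming case described above, where P and R are both known, can be illustrated with a minimal value iteration sketch. The two-state example, the discount factor `gamma`, the tolerance `tol`, and all function and variable names are assumptions made for illustration; none of them come from the patent.

```python
def backup(s, a, P, R, V, gamma):
    """Expected one-step return of taking action a in state s."""
    return sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
               for s2, p in P[(s, a)].items())


def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Return (V, policy) for a known MDP by repeated Bellman backups.

    P[(s, a)] maps each successor s' to the probability P(s, a, s').
    R[(s, a, s')] is the instantaneous reward; missing entries count as 0.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(backup(s, a, P, R, V, gamma) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(actions, key=lambda a: backup(s, a, P, R, V, gamma))
              for s in states}
    return V, policy


# Hypothetical example: "go" moves from "start" to "goal" and earns reward 1.
states = ["start", "goal"]
actions = ["stay", "go"]
P = {
    ("start", "stay"): {"start": 1.0},
    ("start", "go"): {"goal": 1.0},
    ("goal", "stay"): {"goal": 1.0},
    ("goal", "go"): {"goal": 1.0},
}
R = {("start", "go", "goal"): 1.0}
V, policy = value_iteration(states, actions, P, R)
print(V, policy)
```

In this toy model the greedy policy chooses "go" in "start", because that is the only transition with a nonzero reward.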

[0003] Reinforcement learning (also known as reward learning or evaluative learning) is an important branch of machine learning, with many applications in intelligent control, robotics, and analysis and prediction. Reinforcement learning is the learning, by an intelligent system, of a mapping from environment to behavior that maximizes the cumulative reward (the reinforcement signal). It differs from supervised learning in traditional machine learning mainly in the teacher signal: the reinforcement signal provided by the environment is an evaluation of the action taken, given as a reward value, rather than an instruction telling the reinforcement learning system how to produce the correct action. Because the external environment provides little information, the reinforcement learning system must learn from its own experience. In this way it acquires knowledge in an action-evaluation environment and improves its action plan to adapt to the environment. Current reinforcement learning techniques fall roughly into two categories. The first searches the behavior space of the intelligent system to find the optimal behavior; typical techniques are search methods such as genetic algorithms. The second uses statistical techniques and the ideas of dynamic programming to estimate and predict the value function in a given environment state, and then determines the optimal behavior from the value function obtained.

[0004] In the problems that reinforcement learning has to solve, the environment is uncertain, so the return Rt obtained in each learning episode under a given policy may differ. The value function of state s must therefore take the mathematical expectation over all possible returns across different episodes. In practice the value function is usually estimated by approximation, and one of the main methods is Monte Carlo sampling. Monte Carlo sampling is combined with dynamic programming techniques: over many trials, the rewards and penalties actually obtained are used to approximate the true state value function. Monte Carlo sampling typically uses the return obtained in one complete learning episode to approximate the actual value function, whereas the reinforcement learning method uses the value function of the next state (that is, bootstrapping) together with the instantaneous reward just obtained to approximate the value function of the current state. The reinforcement learning method needs many learning episodes before it finally approaches the actual value function.
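The contrast drawn above between Monte Carlo sampling and bootstrapping can be sketched as two update rules applied to the same episode. This is a minimal illustration under stated assumptions: the two-step episode, the step size `alpha`, the discount `gamma`, and the function names are hypothetical and do not appear in the patent.

```python
def mc_update(V, episode, alpha=0.1, gamma=1.0):
    """Monte Carlo: move each visited state's value toward the full return G."""
    G = 0.0
    for state, reward in reversed(episode):
        G = reward + gamma * G
        V[state] += alpha * (G - V[state])


def td0_update(V, episode, alpha=0.1, gamma=1.0):
    """TD(0): bootstrap from the current estimate of the next state's value."""
    for i, (state, reward) in enumerate(episode):
        # The value of the terminal state is taken to be 0.
        next_value = V[episode[i + 1][0]] if i + 1 < len(episode) else 0.0
        V[state] += alpha * (reward + gamma * next_value - V[state])


# Hypothetical episode: A -> B -> terminal, with reward 1 received on leaving B.
episode = [("A", 0.0), ("B", 1.0)]

# After a single episode the two methods already disagree about state A.
first_mc = {"A": 0.0, "B": 0.0}
first_td = {"A": 0.0, "B": 0.0}
mc_update(first_mc, episode)
td0_update(first_td, episode)

# After many episodes both estimates approach the true value of 1.
V_mc = {"A": 0.0, "B": 0.0}
V_td = {"A": 0.0, "B": 0.0}
for _ in range(500):
    mc_update(V_mc, episode)
    td0_update(V_td, episode)
```

After one episode the Monte Carlo estimate of A has already moved toward the final reward, while the TD(0) estimate of A has not, because the estimate for B was still zero when A was updated. Both need repeated episodes to converge.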

[0005] Information retrieval usually refers to text-based information retrieval, covering the storage, organization, representation, querying, and access of information; its core is the indexing and retrieval of text. Historically, information retrieval has passed through several stages: manual retrieval, automated computer retrieval, and intelligent networked retrieval. It has now reached the networked, intelligent stage. The objects of retrieval have expanded from closed, relatively stable and consistent content managed centrally in independent databases to Web page content that is open, dynamic, fast-changing, widely distributed, and loosely and intricately managed. The users of information retrieval were originally intelligence professionals; today they include the general public, such as business people, managers, teachers, students, and professionals of all kinds, who place higher and more diverse demands on both the results and the manner of retrieval. Meeting the needs of networking, intelligence, and Web personalization is the current trend in information retrieval technology. Many existing approaches to Web personalization are based on statistical methods, but these approaches adapt poorly and have no learning ability. The characteristics of reinforcement learning can improve on such statistics-based Web personalization methods.

Summary of the Invention

[0006] (1) Technical problem to be solved

[0007] In view of this, the main object of the present invention is to provide a Web active retrieval system based on reinforcement learning that helps users browse the Web more conveniently and find the target pages they need more accurately.

[0008] (2) Technical solution

[0009] To achieve the above object, the present invention provides a Web active retrieval system based on reinforcement learning, the system comprising:

[0010] 1) A Web search Agent module, composed of information search, Web page analysis, and Web page download function blocks, which mainly performs searches related to the user's topics of interest, web content analysis, and page download. First, the user enters the original request, and the initial pages are obtained through a search engine. The relevant links on those pages are extracted and stored in a buffer. The page download module then visits the corresponding web pages by their links (URL addresses) and saves them classified by topic keyword.

[0011] 2) A Web filter Agent module, composed of page analysis and page filtering function blocks, which mainly analyzes the content of the pages obtained by the search Agent. It uses the Q-learning method of reinforcement learning to compute the Q-learning value function for each Web page. Here the TFIDF value of a Web page with respect to the keywords serves as the immediate reward in Q-learning, from which the Q value of each Web page can be computed. The Web pages are then sorted and filtered by Q value, and the pages with the highest Q values are retained. The functions of the Web filter Agent module mainly include Web page analysis, Web page filtering, and classified indexing of Web pages, and it provides its results to the Web interface Agent module.

[0012] 3) A Web interface Agent module, composed of a page recommendation module, a page display module, and a browsing behavior statistics module, which recommends for browsing the web pages most relevant to the user model produced by the learning module, receives feedback values, and analyzes and compiles statistics on browsing behavior.

[0013] 4) A user information learning Agent module, composed mainly of initialization, interest degree correction calculation, and interest degree update function blocks. The user must first register, and an initial interest file (profile) is generated automatically for the user from the information in the knowledge base. During searching and browsing, the TD learning algorithm of reinforcement learning is used to update and improve the user's interest model. User feedback values are generated from the records of the user's page browsing behavior kept by the Web interface Agent module. The interest degree calculation and update module continually updates the user's interests, and the TD learning algorithm is used to compute the user's interest model until the user information model reaches its optimal weight distribution.

[0014] (3) Beneficial effects

[0015] As can be seen from the above technical solution, the present invention has the following beneficial effects:

[0016] 1. The Web active retrieval system provided by the present invention is accurate and convenient to use, and can recommend personalized Web pages to Web users.

[0017] 2. The Web active retrieval system provided by the present invention uses the Q-learning method of reinforcement learning to filter the pages returned by the Web search Agent. It therefore takes into account not only the currently recommended page but also the pages reached through the hyperlinks of the pages recommended by the Web search Agent, making full use of Web page structure to optimize the filtering system.

[0018] 3. In learning user information, the present invention uses the TD learning model of reinforcement learning for learning and updating, so that the user model comes ever closer to the model that best represents the user. Applying this model in the Web filter Agent makes it possible to extract and recommend Web pages that better match the user's interest model.

Brief Description of the Drawings

[0019] FIG. 1 is an overall logic block diagram of the Web active retrieval system provided by the present invention;

[0020] FIG. 2 is a schematic structural diagram of Web page filtering in the Web active retrieval system provided by the present invention;

[0021] FIG. 3 is a schematic structural diagram of user information learning and updating in the Web active retrieval system provided by the present invention.

Detailed Description

[0022] To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

[0023] As shown in FIG. 1, FIG. 1 is an overall logic block diagram of the Web active retrieval system provided by the present invention, which includes an original input and a page recommendation output. The original input sends the user's original request to the Web search Agent module. On receiving the request, the Web search Agent module searches with a search engine. When the search is complete, the Web filter Agent module learns from and ranks the retrieved pages and extracts the pages to be recommended to the user through the Web interface Agent module. After the user has browsed them, the Web interface Agent module records the user's browsing information, analyzes it and compiles statistics, and submits a feedback value to the user information learning Agent module, which performs the corresponding update and optimization.

[0024] As shown in FIG. 2, FIG. 2 is a schematic structural diagram of the Web filter Agent module in the Web active retrieval system provided by the present invention. The Web filter Agent module comprises a relevance value estimation module, a Q-learning module, and a Q-value sorting module. After the Web search Agent module has finished searching, the stored pages are retrieved and their content is analyzed. Using the TFIDF technique, each Web page can be represented as a vector, and the similarity between the user and a Web page is computed with the cosine distance measure. For a given i (the i-th page file), each of its keywords must be learned. The TFIDF value of each page serves as the immediate reward obtained on reaching that page, and the rewards of the next-level pages reached through the current page's hyperlinks serve as the rewards of the corresponding next states. When the rewards are summed, each is multiplied by a discount factor. Here, a graph-based depth-first search is used to compute the discounted reward sum along the best path, which is assigned to Q[i]; Q[i] is recorded as the Q value of the current i-th page. Once the Q values of all pages have been computed, the pages are sorted by Q value, and the top K pages are recommended to the user.
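The filtering step described in [0024] can be sketched as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the patent does not give the discount factor, the search depth, or any data structures, so `gamma`, `depth`, the sparse-dict vector representation, the link graph, and all function names are hypothetical. The TFIDF weights are assumed to be precomputed and non-negative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def q_value(page, links, reward, gamma=0.5, depth=2, path=()):
    """Depth-first search for the largest discounted reward sum from `page`."""
    if depth == 0:
        return reward[page]
    visited = path + (page,)  # avoid following link cycles
    best_next = max(
        (q_value(nxt, links, reward, gamma, depth - 1, visited)
         for nxt in links.get(page, []) if nxt not in visited),
        default=0.0,  # no outgoing links: no future reward
    )
    return reward[page] + gamma * best_next


def top_k(user, pages, links, k, gamma=0.5, depth=2):
    """Rank pages by Q value and return the best k together with all Q values."""
    # Immediate reward: similarity of each page's TFIDF vector to the user vector.
    reward = {p: cosine(user, vec) for p, vec in pages.items()}
    q = {p: q_value(p, links, reward, gamma, depth) for p in pages}
    return sorted(q, key=q.get, reverse=True)[:k], q


# Hypothetical data: three pages with precomputed TFIDF weights and a link graph.
user = {"rl": 1.0}
pages = {"a": {"rl": 1.0}, "b": {"sports": 1.0}, "c": {"rl": 1.0, "web": 1.0}}
links = {"c": ["b"], "b": ["a"]}
ranked, q = top_k(user, pages, links, k=2)
print(ranked, q)
```

Page "b" has no immediate reward but still receives a nonzero Q value because it links to the relevant page "a", which is the effect of hyperlink structure that the description attributes to the Q-learning filter.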

[0025] As shown in FIG. 3, FIG. 3 is a schematic structural diagram of the user information learning Agent module in the Web active retrieval system provided by the present invention. The user information learning Agent module mainly comprises a module that computes the user model update weights and a TD learning update module. After the Web interface Agent has obtained the user's Web browsing information, it provides a user feedback value to the user information learning Agent module for learning and updating. The feedback value consists of explicit and implicit feedback. Explicit feedback is obtained mainly from the user's ratings, while implicit feedback is determined mainly by four factors: bookmarking (bm), reading time (rt), scrolling (sc), and following up the hyperlinks in the filtered documents (fl). The feedback value is used to update the user's weights:

[0026] $W_{p,k} \leftarrow W_{p,k} + \beta r_{i,k}$

[0027] Here $\beta$ is the user learning rate, and $W_{p,k}$ is the value of the k-th dimension of the user vector.
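The weight update in [0026] can be sketched in a few lines. This is only an illustration under stated assumptions: the patent does not say how the per-term feedback r is derived from the explicit and implicit signals, so the `feedback` dictionary, the sample values, and the function name are hypothetical.

```python
def update_profile(profile, feedback, beta=0.1):
    """Return a new profile with W[k] <- W[k] + beta * r[k] for each term k.

    `profile` maps keywords to weights W[p,k]; `feedback` maps keywords to
    feedback values r[i,k] (assumed to be given; their derivation is not
    specified in the source).
    """
    updated = dict(profile)
    for term, r in feedback.items():
        updated[term] = updated.get(term, 0.0) + beta * r
    return updated


# Hypothetical profile and feedback values.
profile = {"reinforcement": 0.5, "retrieval": 0.5}
feedback = {"reinforcement": 1.0, "football": -0.5}
new_profile = update_profile(profile, feedback, beta=0.2)
print(new_profile)
```

Terms with positive feedback gain weight in proportion to the learning rate, terms without feedback are left unchanged, and terms not yet in the profile start from zero.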

[0028] The change in the vector's term weights (after normalization) is used as an approximate measure of the future reward. When the change in the user model weights falls below a threshold obtained by prior testing, the selected keywords have come close to optimally representing the user's interests. When the change is positive, the current user model vector will obtain better feedback values.

[0029] Wpk,t —Wpk,t—,[Rt+Y Avp,j

<formula>formula see original document page 6</formula>

[0031] If Wpk appears in the recommended pages, the weights of the keywords that appear are increased; if the change is negative, the weights of these keywords change only slightly. After the content of the K pages has been processed, the Web active retrieval system proceeds to the next retrieval point, and learning continues until the model approximately represents the optimal user model.

[0032] The Web active retrieval system provided by the present invention has strong learning ability, is convenient to use, and is highly reliable. It can readily be used to assist users in actively retrieving and recommending Web pages.

[0033] The specific embodiments described above further explain the objects, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

  1. A web page active retrieval system based on reinforcement learning, characterized in that the system comprises: a Web search Agent module, configured to receive the user's initial request, analyze the request, download Web pages using related-topic analysis, and submit the results to a Web filter Agent module; the Web filter Agent module, configured to analyze the content of the pages obtained by the search Agent, compute the Q-learning value function for each Web page using the Q-learning method of reinforcement learning, and provide the results to a Web interface Agent module; the Web interface Agent module, configured to recommend Web pages to the user, record the user's browsing behavior, and submit the results to a user information learning Agent module; and the user information learning Agent module, configured to update and improve the user's interest model using the TD learning algorithm of reinforcement learning, wherein user feedback values are generated from the records of the user's page browsing behavior kept by the Web interface Agent module, the interest degree calculation and update module continually updates the user's interests, and the TD learning algorithm is used to compute the user's interest model until the user information model reaches its optimal weight distribution.
  2. The web page active retrieval system based on reinforcement learning according to claim 1, characterized in that the Web search Agent module is composed of information search, Web page analysis, and Web page download function modules, and performs searches related to the user's topics of interest, web content analysis, and page download; the user first enters the original request, the initial pages are obtained through a search engine, the relevant links on those pages are extracted and stored in a buffer, and the page download module visits the corresponding web pages by their link URL addresses and saves them classified by topic keyword.
  3. The web page active retrieval system based on reinforcement learning according to claim 1, characterized in that the Web filter Agent module, after receiving the results of the Web search Agent module, uses the Q-learning algorithm together with the user feature model to analyze and rank the Web page content and retains the Web pages with the highest Q values; the functions of the Web filter Agent module mainly include Web page analysis, Web page filtering, and classified indexing of Web pages, and it provides its results to the Web interface Agent module.
  4. The web page active retrieval system based on reinforcement learning according to claim 1, characterized in that the Web interface Agent module recommends for browsing the web pages most relevant to the user model produced by the learning module, receives feedback values, and analyzes and compiles statistics on browsing behavior.
  5. The web page active retrieval system based on reinforcement learning according to claim 1, characterized in that the user information learning Agent module initially generates the user's basic information file from the knowledge base, and continually updates the user's interests during subsequent searching and browsing so as to determine the user information model.
  6. The web page active retrieval system based on reinforcement learning according to claim 5, characterized in that the user information learning Agent module further updates the user information model according to the user feedback values, iteratively computes and updates the user model based on the TD algorithm, and determines whether the user model has converged.
  7. The web page active retrieval system based on reinforcement learning according to claim 5 or 6, characterized in that the user information learning Agent module submits its results to the Web filter Agent and proceeds to the next learning step.
  8. The web page active retrieval system based on reinforcement learning according to claim 5 or 6, characterized in that when the iteration converges, the user information model is optimal, and the user information learning Agent module obtains the recommended Web pages according to the user information model.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200810240358 CN101751437A (en) 2008-12-17 2008-12-17 Web active retrieval system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200810240358 CN101751437A (en) 2008-12-17 2008-12-17 Web active retrieval system based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN101751437A (en) 2010-06-23

Family

ID=42478428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200810240358 CN101751437A (en) 2008-12-17 2008-12-17 Web active retrieval system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN101751437A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567408A (en) * 2010-12-31 2012-07-11 阿里巴巴集团控股有限公司 Method and device for recommending search keyword
CN102722543A (en) * 2012-05-24 2012-10-10 潘勇刚 Method for storing files
US8898180B2 (en) 2009-01-12 2014-11-25 Alibaba Group Holding Limited Method and system for querying information
CN105095279A (en) * 2014-05-13 2015-11-25 深圳市腾讯计算机系统有限公司 File recommendation method and apparatus
CN105447376A (en) * 2015-10-30 2016-03-30 广州市汇助惠电子商务有限公司 User management system
CN105631052A (en) * 2016-03-01 2016-06-01 北京百度网讯科技有限公司 Artificial intelligence based retrieval method and artificial intelligence based retrieval device

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9430568B2 (en) 2009-01-12 2016-08-30 Alibaba Group Holding Limited Method and system for querying information
US8898180B2 (en) 2009-01-12 2014-11-25 Alibaba Group Holding Limited Method and system for querying information
CN102567408A (en) * 2010-12-31 2012-07-11 阿里巴巴集团控股有限公司 Method and device for recommending search keyword
CN102567408B (en) 2010-12-31 2014-06-04 阿里巴巴集团控股有限公司 Method and device for recommending search keyword
US8799306B2 (en) 2010-12-31 2014-08-05 Alibaba Group Holding Limited Recommendation of search keywords based on indication of user intention
CN102722543A (en) * 2012-05-24 2012-10-10 潘勇刚 Method for storing files
CN102722543B (en) * 2012-05-24 2014-12-24 潘勇刚 Method for storing files
CN105095279A (en) * 2014-05-13 2015-11-25 深圳市腾讯计算机系统有限公司 File recommendation method and apparatus
CN105447376A (en) * 2015-10-30 2016-03-30 广州市汇助惠电子商务有限公司 User management system
CN105631052A (en) * 2016-03-01 2016-06-01 北京百度网讯科技有限公司 Artificial intelligence based retrieval method and artificial intelligence based retrieval device

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C12 Rejection of an application for a patent