WO2014023121A1 - Method and device for launching individual content - Google Patents

Method and device for launching individual content

Info

Publication number
WO2014023121A1
Authority
WO
WIPO (PCT)
Prior art keywords
web page
user
server
data
terminal
Application number
PCT/CN2013/076308
Other languages
French (fr)
Chinese (zh)
Inventor
游源
钟杰萍
尹攀
杜家春
Original Assignee
华为技术有限公司
Priority date
2012-08-10
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2014023121A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

Provided is a method for delivering personalized content, comprising: a behavioral targeting (BT) server receiving user information sent by a terminal, analyzing the web page accessed by the terminal, and obtaining the type of the web page (S101); obtaining user feature data according to the web page type and the user information (S102); querying a correspondence table, preconfigured on the BT server, between user feature data and delivery strategy data, and determining the delivery strategy data corresponding to the user feature data (S103); and writing the user feature data and the corresponding delivery strategy data into a cookie of the terminal, so that, after receiving a connection request from the terminal, a content providing server can deliver personalized content to the terminal according to the delivery strategy data in the terminal's cookie (S104). A corresponding device for delivering personalized content is also provided. The solution reduces bandwidth consumption and improves the efficiency of server interaction.

Description

Method and Apparatus for Delivering Personalized Content
This application claims priority to Chinese Patent Application No. 201210284928.8, filed with the Chinese Patent Office on August 10, 2012 and entitled "Method and Apparatus for Delivering Personalized Content", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present invention relate to the field of communications, and in particular, to a method and apparatus for delivering personalized content.
Background
As society becomes increasingly information-driven, new technologies and new media keep emerging. The supply of Internet content now exceeds demand, and products are becoming increasingly homogeneous. Competition is therefore shifting to quality of service, that is, to providing services that meet customers' needs. Personalized service is a concrete expression of the customer-satisfaction philosophy and of a people-oriented business philosophy, and is an important way for modern enterprises to strengthen their core competitiveness. How to collect and analyze the behavior of specific users, and how to provide end users with personalized information and content at low cost and high efficiency so as to improve service quality, is a current focus of industry research.
In the prior art, when a user accesses a content provider, the provider sends the current user's identification information to an analysis platform in real time; the analysis platform retrieves the current user's features from its database in real time and returns them to that content provider or to a third-party content provider, which then provides personalized content. For example, when a user logs in to a website, the website must communicate in real time with a user behavior analysis system; the system queries the relevant user data in real time and returns the current user type to the website (or to a third-party advertisement provider), which then assembles personalized content matching the user and finally returns it to the user's terminal. These steps involve several real-time server-to-server exchanges and many intermediate links: response speed and delivery reliability depend heavily on the user's network environment, bandwidth consumption is high, and server interaction is inefficient.
Summary
In view of this, embodiments of the present invention provide a method and apparatus for delivering personalized content, which solve the problems of high bandwidth consumption and low server interaction efficiency that arise when personalized content is customized in the prior art.
An embodiment of the present invention provides a method for delivering personalized content, including:
a behavioral targeting (BT) server obtains user information sent by a terminal while the terminal accesses a web page, analyzes the web page accessed by the terminal, and obtains the web page type of the web page; obtains user feature data according to the web page type and the user information; and queries a correspondence table, preconfigured on the BT server, between user feature data and delivery strategy data to determine the delivery strategy data corresponding to the user feature data;
and writes the user feature data and the corresponding delivery strategy data into a cookie of the terminal, so that, after receiving the access request sent by the terminal the next time it accesses a web page, a content providing server delivers personalized content to the terminal according to the delivery strategy data in the terminal's cookie.
An embodiment of the present invention further provides a BT server for delivering personalized content, including:
an obtaining unit, configured to obtain user information sent by a terminal while the terminal accesses a web page, and to analyze the web page accessed by the terminal to obtain the web page type of the web page;
the obtaining unit is further configured to obtain user feature data according to the web page type and the user information;
a determining unit, configured to query a correspondence table, preconfigured on the BT server, between user feature data and delivery strategy data, and to determine the delivery strategy data corresponding to the user feature data; and
a writing unit, configured to write the user feature data and the corresponding delivery strategy data into a cookie of the terminal, so that, after receiving the access request sent by the terminal the next time it accesses a web page, a content providing server delivers personalized content to the terminal according to the delivery strategy data in the terminal's cookie.
In the embodiments of the present invention, after receiving the user information sent by the terminal, the BT server analyzes the type of the web page accessed by the terminal, obtains user feature data and the corresponding delivery strategy according to the user information and the web page type, and writes them into the terminal's cookie. The next time the content providing server receives an access request from the terminal, it reads the terminal's cookie and delivers personalized content to the terminal directly. This reduces bandwidth consumption and improves server interaction efficiency.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a system architecture diagram according to an embodiment of the present invention;
FIG. 2 is a flowchart of Embodiment 1 of the present invention;
FIG. 3 is a flowchart of Embodiment 2 of the present invention; and
FIG. 4 is an architecture diagram of Embodiment 3 of the present invention.
Detailed Description
FIG. 1 shows the architecture of the system. As shown in FIG. 1, the system includes a terminal, a content providing server, and a behavioral targeting (BT) server, which communicate with each other over wired or wireless communication networks. These networks include, but are not limited to, a mobile telephone network, a wireless local area network (WLAN), a Bluetooth personal area network, an Ethernet LAN, a token ring LAN, a wide area network, and the Internet. The terminal may include, but is not limited to, a mobile device, a combination PDA and mobile telephone, a PDA, an integrated messaging device (IMD), a personal computer (PC), and a notebook computer. The terminal may be mobile, or may be located on a moving object such as, but not limited to, a car, truck, taxi, bus, ship, airplane, bicycle, or motorcycle.
These communication devices may communicate using various transmission technologies, including but not limited to Code Division Multiple Access (CDMA), Global System for Mobile Communications (GSM), Universal Mobile Telecommunications System (UMTS), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Transmission Control Protocol/Internet Protocol (TCP/IP), Short Messaging Service (SMS), Multimedia Messaging Service (MMS), e-mail, Instant Messaging Service (IMS), Bluetooth, and IEEE 802.11. The communication devices may use different media, including but not limited to radio, infrared, laser, and cable connections.
The content providing server registers its service with the BT server in advance and embeds, in the web pages it operates, a specific script pointing to the BT server; these pages form the monitoring domain of the BT server. When a user accesses a page in the monitoring domain through the terminal, the page automatically loads and executes the script. The script automatically collects and stores information about the user's browsing on the page and sends it to the BT server. The BT server analyzes the user's information across its entire monitoring domain according to the relevant model algorithms to produce user feature data for the current user, and forms a corresponding strategy according to the personalized content delivery rules that the content providing server has configured in advance on the BT server for that user feature data. This information is written, in an agreed format, into the user's local cookie through the script. Therefore, when the user visits the website again, the content providing server can directly obtain and apply the BT analysis result and the personalized delivery strategy data stored locally on the user's side to deliver targeted content. That is, the BT server needs to analyze the user only once: when the user visits the website again, the content providing server reads the cookie stored locally on the terminal and delivers personalized content to the user directly, without the BT server having to analyze the click-stream data again. This reduces bandwidth consumption and improves server interaction efficiency.
FIG. 2 is a flowchart of delivering personalized content. As shown in FIG. 2, the method includes the following steps:
S101. The behavioral targeting (BT) server obtains the user information sent by the terminal while the terminal accesses a web page, analyzes the web page accessed by the terminal, and obtains the web page type of the web page.
After the terminal accesses a web page provided by the content providing server, the terminal's browser executes the script on the page and sends user information to the BT server. The user information includes HTTP request information, such as the requested URL and the referring (source) URL; information about the requested page, such as the page title, keywords, and abstract; and user behavior information, such as clicks, submissions, input, jumps, and refreshes.
The BT server may verify and reconstruct the received user information. User information that passes verification is reconstructed into click-stream data; user information that fails verification is discarded, and new user information is received and verified.
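A minimal sketch of the verify-and-reconstruct step, assuming each raw report from the page script arrives as a dictionary. The field names (`user_id`, `url`, `referrer`, `title`, `keywords`, `event`, `timestamp`), the set of required fields, and the verification rule are all hypothetical; the text does not say what the check consists of.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical schema for one raw report sent by the embedded page script.
REQUIRED_FIELDS = ("user_id", "url", "event", "timestamp")
KNOWN_EVENTS = {"click", "submit", "input", "jump", "refresh"}


@dataclass
class ClickStreamRecord:
    user_id: str
    url: str
    event: str
    timestamp: float
    referrer: Optional[str] = None
    title: str = ""
    keywords: List[str] = field(default_factory=list)


def verify(report: dict) -> bool:
    """Pass only if every required field is present and non-empty
    and the behavior event is one of the known types."""
    if any(report.get(name) in (None, "") for name in REQUIRED_FIELDS):
        return False
    return report["event"] in KNOWN_EVENTS


def reconstruct(report: dict) -> ClickStreamRecord:
    """Rebuild a verified report as a typed click-stream record."""
    return ClickStreamRecord(
        user_id=str(report["user_id"]),
        url=report["url"],
        event=report["event"],
        timestamp=float(report["timestamp"]),
        referrer=report.get("referrer"),
        title=report.get("title", ""),
        keywords=list(report.get("keywords", [])),
    )


def ingest(reports: List[dict]) -> List[ClickStreamRecord]:
    """Keep verified reports as click-stream data; drop the ones that fail."""
    return [reconstruct(r) for r in reports if verify(r)]
```

Dropping a failed report here mirrors the text's "discard and wait for new user information"; a real deployment might also log the rejection.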
After obtaining the user information, the BT server also analyzes the web page accessed by the terminal to obtain its web page type. There is no strict ordering between obtaining the user information and analyzing the accessed web page: the two may be performed simultaneously, or the user information may be obtained first and the web page analyzed afterwards.
The BT server analyzes the web page accessed by the terminal in the following steps.
Set a set of web page types and a set of frequencies corresponding to those types.
The BT server divides web page content into N categories in advance, such as sports, finance, and technology, denoted $\{C_1, C_2, \ldots, C_N\}$, and sets the collection $\{M_1, M_2, \ldots, M_N\}$, where $M_i$ is the frequency of type $C_i$, that is, the number of web pages of type $C_i$ among all the web pages accessed by the terminal. Note that $\{C_1, C_2, \ldots, C_N\}$ and $\{M_1, M_2, \ldots, M_N\}$ are both preset by the BT server. They are distinct from the type of the web page actually accessed by the terminal, which is what the BT server computes from $\{C_1, \ldots, C_N\}$, $\{M_1, \ldots, M_N\}$, and the feature data of the accessed web page.
Obtain the feature data of the web page. The feature data includes the key terms, character spacing, and text length corresponding to the web page types in the set of web page types.
Calculate the probabilities of the feature data according to the set of frequencies corresponding to the web page types, select the largest one or more of the calculated probability values, and obtain the web page types corresponding to the selected probability values.
The calculation is as follows. For each category $C_i \in \{C_1, C_2, \ldots, C_N\}$, the BT server calculates the prior probabilities
$$P(C_i) = \frac{M_i}{M}, \qquad P(\bar{C}_i) = \frac{\bar{M}_i}{M},$$
where $M = \sum_{i=1}^{N} M_i$ is the total number of texts in the training set and $\bar{M}_i$ is the number of input texts that do not belong to category $C_i$. It then calculates the conditional probability that feature $F_j$ takes the value $x_k$ in category $C_i$:
$$P(F_j = x_k \mid C_i) = \frac{\mathrm{count}(F_j = x_k, C_i) + 1}{\mathrm{count}(C_i) + |F_j|},$$
where $\mathrm{count}(F_j = x_k, C_i)$ is the number of times $F_j$ takes the value $x_k$ in category $C_i$, $\mathrm{count}(C_i)$ is the number of training texts in $C_i$, and $|F_j|$ is the number of values that feature $F_j$ can take.
For an information text $d$, by Bayes' theorem and the naive Bayes independence assumption, the probability that $d$ belongs to category $C_i$ is proportional to
$$a_i = P(C_i) \prod_{j} P\bigl(F_j = F_j(d) \mid C_i\bigr),$$
where $F_j(d)$ is the value of feature $F_j$ in $d$. The BT server computes $a_1, \ldots, a_N$ and collects the results into the set $P = \{a_1, \ldots, a_N\}$. It may sort $P$ by element value, for example in descending order, to obtain the ordered set $Q$, in which larger values are ranked earlier. The BT server may then assign text $d$ to the category of the first element of $Q$, or take the first $f$ elements as the probability distribution over their categories; the selection rule is preconfigured on the BT server. For example, let $C_1$ and $C_2$ denote sports and travel respectively, and let the web page text $d$ contain feature data such as "basketball", "soccer", and "travel". If, after the above calculation, the first element of $Q$ corresponds to $C_1$ and the second to $C_2$, then selecting only the first element makes the BT server classify $d$ as type $C_1$ (sports), while selecting the first two elements classifies $d$ as both sports and travel.
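The classification step can be sketched as a small naive Bayes classifier. This is an illustrative sketch under stated assumptions, not the patent's implementation: each vocabulary term is treated as a binary feature $F_t$ (present or absent in the page), so $|F_t| = 2$ and the add-one smoothed conditional probability uses $M_i$ as the class count; scores are kept as logarithms to avoid underflow, which does not change the ranking. The class name, training pages, and category labels are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


class BernoulliPageClassifier:
    """Naive Bayes page-type classifier with add-one (Laplace) smoothing.

    Assumption: every vocabulary term t is a binary feature F_t that is 1
    if t occurs in the page and 0 otherwise, so |F_t| = 2.
    """

    def __init__(self) -> None:
        self.M: Counter = Counter()  # M_i: training pages per category
        self.present: Dict[str, Counter] = defaultdict(Counter)  # pages of C_i containing t
        self.vocab: set = set()

    def fit(self, labeled_pages: Iterable[Tuple[List[str], str]]) -> "BernoulliPageClassifier":
        for terms, category in labeled_pages:
            self.M[category] += 1
            self.present[category].update(set(terms))
            self.vocab.update(terms)
        return self

    def cond_prob(self, term: str, value: int, category: str) -> float:
        """P(F_t = value | C_i) = (count(F_t = value, C_i) + 1) / (M_i + |F_t|)."""
        hits = self.present[category][term]
        count = hits if value == 1 else self.M[category] - hits
        return (count + 1) / (self.M[category] + 2)

    def log_scores(self, terms: Iterable[str]) -> Dict[str, float]:
        """log a_i = log P(C_i) + sum over t of log P(F_t = F_t(d) | C_i)."""
        total = sum(self.M.values())
        words = set(terms)
        scores = {}
        for category, m_i in self.M.items():
            score = math.log(m_i / total)
            for term in self.vocab:
                score += math.log(self.cond_prob(term, int(term in words), category))
            scores[category] = score
        return scores

    def top_types(self, terms: Iterable[str], f: int = 1) -> List[str]:
        """Sort categories by score (the ordered set Q) and keep the first f."""
        scores = self.log_scores(terms)
        return sorted(scores, key=scores.get, reverse=True)[:f]


training = [
    (["basketball", "soccer", "score"], "sports"),
    (["soccer", "league"], "sports"),
    (["travel", "hotel", "flight"], "travel"),
    (["travel", "beach"], "travel"),
    (["stock", "market"], "finance"),
]
clf = BernoulliPageClassifier().fit(training)
print(clf.top_types(["basketball", "soccer", "travel"], f=2))  # ['sports', 'travel']
```

Because the logarithm is monotonic, ranking the log scores gives the same order as ranking the $a_i$ values directly, so `top_types(..., f)` corresponds to taking the first f elements of the ordered set Q.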
S102. Obtain user feature data according to the web page type and the user information.
User feature data is obtained as follows:
According to the click-stream data, calculate the term frequency (TF) and the inverse document frequency (IDF) of the metadata corresponding to the web page type, and the product of TF and IDF (TF-IDF).
The BT server may calculate the term frequency using a Markov-model formula:
[Formula image not reproduced: definition of the term frequency $tf_{a_i,p}$]
Here, $s$ is the preset number of valid history steps, and $a_i \in \{a_1, \ldots, a_M\}$ is an item in the metadata set. Metadata are data representing the key terms of a web page type; for example, if the web page type is sports, the metadata may be "sports", "soccer", or "basketball". The metadata are preset by the BT server. The web page $p$ accessed by the terminal is the target web page in the Markov process, that is, a web page the user accessed within a period of time. $f(k)$ is a decay factor that discounts earlier steps; in this embodiment, $f(k) = e^{-\beta k t}$. Its value decreases over time, so older steps carry less weight: in terms of user behavior, the older a behavior is, the less it is worth as a reference. $P(x_k = a_i \mid x_0 = p)$ is the frequency with which $a_i$ appears in a target web page, that is, the ratio of the number of occurrences of $a_i$ in target web page $p$ to the total number of occurrences of all words in $p$. $tf_{a_i,p}$ is the term frequency (TF), i.e., how often a term appears in a document or target web page. The formula thus expresses how often a particular term has appeared on the web pages the user has accessed up to now. For example, to calculate the term frequency of "basketball" over the 100 web pages the user has accessed, the BT server takes the input parameters $a_i$ = "basketball" and $s = 100$, counts $P(x_k = a_i \mid x_0 = p)$, and then computes the term frequency of "basketball" with the formula above.
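Because the formula image is missing from the source, the sketch below assumes the common decay-weighted form $tf_{a_i,p} = \sum_{k=1}^{s} f(k)\,P(x_k = a_i \mid x_0 = p)$ with $f(k) = e^{-\beta k t}$. Both the summation form and the parameter names `beta` and `dt` are assumptions. Each step's probability is estimated as the share of that page's words equal to the term, and `history[0]` is taken to be the most recent page.

```python
import math
from collections import Counter
from typing import List, Optional


def decayed_term_frequency(term: str, history: List[List[str]],
                           beta: float = 0.1, dt: float = 1.0,
                           s: Optional[int] = None) -> float:
    """Decay-weighted term frequency of `term` over the user's recent pages.

    Assumed form: tf = sum_{k=1..s} f(k) * P(x_k = term | x_0 = p), with
    f(k) = exp(-beta * k * dt). P(...) is estimated as the share of the
    k-th page's words equal to `term`; history[0] is the most recent page.
    """
    pages = history if s is None else history[:s]
    tf = 0.0
    for k, words in enumerate(pages, start=1):
        if not words:
            continue  # an empty page contributes nothing
        share = Counter(words)[term] / len(words)
        tf += math.exp(-beta * k * dt) * share
    return tf
```

With `s=100` this corresponds to the "basketball over the last 100 pages" example; the same occurrence contributes less the further back in the history it lies.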
After calculating the term frequency, the BT server substitutes $tf_{a_i,p}$ into the formula
$$\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i, \qquad \text{where } \mathrm{tf}_{i,j} = tf_{a_i,p}.$$
This formula calls for a brief explanation of TF-IDF (term frequency-inverse document frequency). TF-IDF is a weighting technique commonly used in information retrieval and text mining to evaluate how important a word is to one document in a collection. If a word or phrase appears frequently in one article but rarely in others, it is considered to have good class-discriminating power and to be suitable for classification.
Term frequency (TF) is the frequency with which a given word appears in a document or target web page; in the embodiments of the present invention, the target web page is a web page accessed by the terminal. Inverse document frequency (IDF) measures the general importance of a word: the fewer documents or target web pages contain a term, the larger its IDF, and the better the term discriminates between classes. The IDF of a particular term is obtained by dividing the total number of documents by the number of documents containing the term and taking the logarithm of the quotient:
$$\mathrm{idf}_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|},$$
where $|D|$ is the total number of documents in the corpus (or the total number of target web pages), and $|\{j : t_i \in d_j\}|$ is the number of documents (or target web pages) containing term $t_i$, that is, the number of documents $d_j$ with $n_{i,j} \neq 0$, where $n_{i,j}$ is the number of occurrences of $t_i$ in $d_j$. If the term appears in none of the documents, the denominator is zero, so $1 + |\{j : t_i \in d_j\}|$ is generally used instead. Hence, in $\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$, a high term frequency within a particular document combined with a low document frequency of the term across the whole collection yields a high TF-IDF weight. TF-IDF thus filters out common words and retains important ones: the higher the value, the more important the word.
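A short sketch of the smoothed IDF and the TF-IDF product described above, treating each document (or target web page) as a set of terms. The function names are hypothetical. With the $1 +$ smoothing in the denominator, a term that appears in every document gets a slightly negative IDF.

```python
import math
from typing import Iterable, Set


def idf(term: str, documents: Iterable[Set[str]]) -> float:
    """Smoothed IDF: log(|D| / (1 + number of documents containing term))."""
    docs = list(documents)
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / (1 + containing))


def tfidf(tf_value: float, term: str, documents: Iterable[Set[str]]) -> float:
    """TF-IDF weight: the term frequency multiplied by the IDF."""
    return tf_value * idf(term, documents)
```

For four pages where "basketball" occurs once and "soccer" twice, `idf` returns log(4/2) and log(4/3) respectively, so "basketball" is weighted as the more distinctive term.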
Using the above formulas, the BT server calculates $tf_{a_i,p}$ and $\mathrm{idf}$ and multiplies them to obtain $\mathrm{tfidf}$.
Select the one or more largest TF-IDF values, determine the metadata corresponding to them, query the correspondence table between metadata and user feature data, and obtain the user feature data corresponding to that metadata.
For example, the BT server first obtains all the types of the web pages accessed by the terminal, say four types: sports, music, finance, and IT. It then takes the four terms "sports", "music", "finance", and "IT", calculates the TF-IDF of each, and sorts them in descending order. The BT server selects the largest value, or the first few. If the order is "sports", "music", "finance", "IT", the BT server determines the user's feature to be "sports fan", or "sports fan" and "music lover"; the specific mapping is obtained by querying the preconfigured correspondence table between metadata and user features. User features may be represented as strings or in any other data format commonly used by computers; this embodiment imposes no restriction.
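The ranking-and-lookup step in this example can be sketched as follows. The metadata-to-feature table is hypothetical: only the "sports" to "sports fan" and "music" to "music lover" pairs come from the example, and the identifier strings are illustrative.

```python
from typing import Dict, List

# Hypothetical metadata -> user-feature table; only these two pairs come
# from the example in the text.
METADATA_TO_FEATURE: Dict[str, str] = {
    "sports": "sports_fan",
    "music": "music_lover",
}


def user_features(tfidf_by_metadata: Dict[str, float], top_n: int = 1) -> List[str]:
    """Rank metadata by TF-IDF (descending), keep the top_n, and map each one
    to a user feature; metadata without a table entry are skipped."""
    ranked = sorted(tfidf_by_metadata, key=tfidf_by_metadata.get, reverse=True)
    return [METADATA_TO_FEATURE[m] for m in ranked[:top_n] if m in METADATA_TO_FEATURE]
```

With scores ordered sports > music > finance > IT, `top_n=1` yields only the sports feature and `top_n=2` adds the music feature, matching the two alternatives in the example.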
S103. Query the correspondence table, preconfigured on the BT server, between user feature data and delivery strategy data, and determine the delivery strategy data corresponding to the user feature data.
Note that this correspondence table is preconfigured on the BT server. For example, if a user's features are "ordinary white-collar worker / trend follower / food lover", the corresponding delivery strategy is "consumer electronics / food group-buying". Both may be represented in data formats commonly used by computers, such as strings, escape characters, or integers; strings are used here: "white_collar/fashion_follower/food_lover" and "electronic_consumption/group_purchasing_of_food".
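The text defines the S103 table over the user feature data as a whole. The sketch below assumes, purely for illustration, a per-feature table whose results are merged in order without duplicates; the table entries and the function name are hypothetical. With these entries it reproduces the example mapping.

```python
from typing import Dict

# Hypothetical per-feature entries; the text only gives the combined example.
STRATEGY_TABLE: Dict[str, str] = {
    "white_collar": "electronic_consumption",
    "fashion_follower": "electronic_consumption",
    "food_lover": "group_purchasing_of_food",
}


def delivery_strategy(user_types: str, sep: str = "/") -> str:
    """Look up each user feature, keep first-seen order, drop duplicates."""
    strategies = []
    for feature in user_types.split(sep):
        strategy = STRATEGY_TABLE.get(feature)
        if strategy is not None and strategy not in strategies:
            strategies.append(strategy)
    return sep.join(strategies)


print(delivery_strategy("white_collar/fashion_follower/food_lover"))
# electronic_consumption/group_purchasing_of_food
```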
S104. Write the user feature data and the corresponding delivery strategy data into the terminal's cookie, so that, after receiving the access request sent by the terminal the next time it accesses a web page, the content providing server delivers personalized content to the terminal according to the delivery strategy data in the terminal's cookie.
Note that the web page the terminal accesses next time may be the web page accessed in the preceding steps or a different one.
The written cookie is shown in Table 1:

Table 1
| Field | Value |
|---|---|
| Variable 1 | UserTypes |
| Variable 1 value | white_collar/fashion_follower/food_lover |
| Variable 2 | Strategies |
| Variable 2 value | electronic_consumption/group_purchasing_of_food |
| Domain | www.example.com |
| Optional flag | 1536 |
| Expiration time | 2011-10-01 00:00:00 |
| Creation time | 2011-08-25 08:02:21 |
Here, variable 1 and variable 2 hold, respectively, the user feature data and the delivery policy formulated for that feature. The remaining parameters, such as the domain, optional flag, expiration time and creation time, are standard parameters of the cookie file format. Thus, within the validity period of this cookie, the website can push personalized content based on the information in the cookie.
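For illustration, the two variables of Table 1 could be emitted as `Set-Cookie` header values with Python's standard `http.cookies` module, as in the sketch below. The function name `build_bt_cookie` is hypothetical, and the sketch only models the name/value pairs, the domain and the expiration time; the creation time and the optional flag (1536) from Table 1 are not represented. Because the values contain "/", `SimpleCookie` emits them as quoted strings.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from http.cookies import SimpleCookie


def build_bt_cookie(user_types, strategies, domain, expires):
    """Return Set-Cookie header values carrying the two variables of Table 1."""
    cookie = SimpleCookie()
    cookie["UserTypes"] = user_types    # variable 1: user feature data
    cookie["Strategies"] = strategies   # variable 2: delivery policy data
    for morsel in cookie.values():
        morsel["domain"] = domain
        # HTTP dates are expressed in GMT; usegmt=True requires a UTC datetime.
        morsel["expires"] = format_datetime(expires, usegmt=True)
    return [morsel.OutputString() for morsel in cookie.values()]


headers = build_bt_cookie(
    "white_collar/fashion_follower/food_lover",
    "electronic_consumption/group_purchasing_of_food",
    domain="www.example.com",
    expires=datetime(2011, 10, 1, tzinfo=timezone.utc),
)
for value in headers:
    print("Set-Cookie:", value)
```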
After the cookie is written, when the terminal again requests a web page provided by the content providing server, the content providing server directly reads the terminal's cookie and obtains the current user's feature information and the delivery policy data specified for it. The content providing server then executes this personalized delivery policy data directly, either generating a personalized content jump link for the current user itself or requesting one from a third-party content server. When the terminal accesses the link, the content providing server delivers the personalized content to the user, completing targeted delivery.
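On the content providing server side, reading the cookie and turning the delivery policy data into jump links might look like the following sketch. The `POLICY_LINKS` table and its URLs are hypothetical; in the described method the link may instead be requested from a third-party content server.

```python
from http.cookies import SimpleCookie

# Hypothetical mapping from delivery policy to a personalized jump link.
POLICY_LINKS = {
    "electronic_consumption": "https://www.example.com/promo/electronics",
    "group_purchasing_of_food": "https://www.example.com/promo/food-deals",
}


def personalized_links(cookie_header, links=POLICY_LINKS):
    """Read the Strategies cookie from a request's Cookie header and pick links."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "Strategies" not in cookie:
        return []  # no BT cookie yet: serve default, non-personalized content
    strategies = cookie["Strategies"].value.split("/")
    return [links[s] for s in strategies if s in links]


header = ('UserTypes="white_collar/fashion_follower/food_lover"; '
          'Strategies="electronic_consumption/group_purchasing_of_food"')
print(personalized_links(header))
```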
In this embodiment of the present invention, after receiving the user information sent by the terminal, the BT server analyzes the type of the web page accessed by the terminal, obtains the user feature data and the corresponding delivery policy from the user information and the web page type, and writes them into the terminal's cookie. The next time the content providing server receives an access request from the terminal, it reads the terminal's cookie and delivers personalized content to the terminal directly. This reduces bandwidth consumption and improves server interaction efficiency.
Figure 3 shows a method flow for delivering personalized content. As shown in Figure 3, the method flow includes:
S201. The BT server and the content providing server complete a registration process.
The content providing server requests the BT service, carrying basic information about its website in the request message, such as the domain name and basic service type; for example, the domain name is www.example.com and the basic service type is search. After receiving the request, the BT server completes the registration process with the content providing server. After registration, within the defined set of user features, the BT server specifies several personalized delivery policies targeting one or more user features according to a correspondence table between user features and personalized delivery policies pre-configured by the content providing server, and periodically presents to the content providing server, in the form of a report, statistics on the number of users in the database and their corresponding user features and delivery policies. For example, the user feature set provided by the BT server contains: ordinary white-collar worker, fashion follower, food lover, backpacker, celebrity fan, and so on. The content providing server can combine these features arbitrarily to form custom user groups. For example, it may define group 1 = {ordinary white-collar worker AND backpacker} (users who have both the "ordinary white-collar worker" and the "backpacker" features) and group 2 = {fashion follower OR celebrity fan} (users who have either the "fashion follower" or the "celebrity fan" feature). It can then assign personalized content on "outdoor activity safety" to group 1 and personalized content on "pop music and celebrity news" to group 2.
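The AND/OR user groups in this example can be expressed as a small rule table, as sketched below. The rule format (`all_of` / `any_of` sets) and all identifiers are assumptions made for illustration, not a format defined by the BT server.

```python
# Hypothetical encoding of the custom groups: a group matches when the user has
# every feature in "all_of" (AND) and at least one feature in "any_of" (OR).
GROUPS = {
    "group_1": {"all_of": {"white_collar", "backpacker"}, "any_of": set(),
                "content": "outdoor_activity_safety"},
    "group_2": {"all_of": set(), "any_of": {"fashion_follower", "celebrity_fan"},
                "content": "pop_music_and_celebrity_news"},
}


def contents_for(user_features, groups=GROUPS):
    """Return the personalized content assigned to every group the user belongs to."""
    features = set(user_features)
    matched = []
    for group in groups.values():
        has_all = group["all_of"] <= features                              # AND clause
        has_any = not group["any_of"] or bool(group["any_of"] & features)  # OR clause
        if has_all and has_any:
            matched.append(group["content"])
    return matched


print(contents_for(["white_collar", "backpacker", "fashion_follower"]))
```

A group whose `all_of` and `any_of` sets are both empty matches every user, which could serve as a default group.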
S202. The BT server collects user information.
When a user accesses, through the terminal, a web page containing the BT server's specific script, the BT server automatically starts collecting the current user's information, including but not limited to: HTTP request information, such as the requested URL and the referrer (jump source) URL; information about the requested page, such as the page title, keywords and abstract; and user behavior information, such as clicks, submissions, inputs, jumps and refreshes. The collection process may be as follows: when the user accesses the web page through the terminal, the terminal executes the command statements in the specific script and sends the user information to the remote BT server.
S203. Verify and reconstruct the user information.
Common erroneous or abnormal records include garbled returned data, duplicate records and abnormally terminated records. The BT server discards such erroneous and abnormal data.
The BT server receives the user information and verifies it. Many verification methods are possible, and this embodiment of the present invention does not restrict them. For example, user information may be stored with its fields separated by a feature separator, in the form "xxxxx^Byyyyy", where "^B" is the feature separator and "x" and "y" are raw data, which may be characters or any other data form commonly used in computing. If the received data is not in this format, for example the separator is missing or in the wrong position, the BT server judges the data to be abnormal and culls it: it discards the abnormal user information and waits to receive the next set of user information. If the user information is normal, the BT server reconstructs it into click stream data; reconstruction means reprocessing and integrating the user information data.
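The check-and-discard logic can be sketched as follows. `SEPARATOR = "^B"` is an assumed value, the two-field layout follows the "xxxxx^Byyyyy" example, and dropping exact duplicates is an added assumption based on the list of common errors above; the function names are hypothetical.

```python
# Assumed feature separator (two characters: caret and "B").
SEPARATOR = "^B"


def parse_record(raw, sep=SEPARATOR, fields=2):
    """Split one raw user-information record; return None if it is malformed."""
    parts = raw.split(sep)
    if len(parts) != fields or any(not part for part in parts):
        return None  # separator missing or in the wrong position
    return tuple(parts)


def clean_records(raw_records, sep=SEPARATOR, fields=2):
    """Keep well-formed, non-duplicate records and discard everything else."""
    seen = set()
    kept = []
    for raw in raw_records:
        record = parse_record(raw, sep, fields)
        if record is None or record in seen:
            continue  # discard and wait for the next record
        seen.add(record)
        kept.append(record)
    return kept


print(clean_records(["a^Bb", "a^Bb", "broken", "c^Bd"]))  # [('a', 'b'), ('c', 'd')]
```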
Specifically, the BT server converts the verified data into click stream data. Click stream data types include, but are not limited to: the user's browsing time period, browsing duration, frequency of visiting the website, and type of website browsed. The browsing time period can be divided into morning (05:01-12:00), afternoon (12:01-18:00), evening (18:01-22:00) and late night (22:01-05:00). The browsing duration is the time from when a page request is initiated to when the page is closed, as recorded in the raw data, and is divided into short (less than 10 seconds), normal (10-30 seconds), longer (30-100 seconds) and long (more than 100 seconds). These time periods and durations are not strictly limited, and other divisions also fall within the protection scope of the present invention. The visit frequency is the number of visits to the website per unit time, and website types can be classified as sports, finance, search and so on. For example, if in the raw data the page request was initiated at 23:09:23 and the page was closed at 23:11:01, the data can be reconstructed as: access time period: late night (22:01-05:00); access duration: longer (30-100 seconds); visits: 1 per day; website type: sports.
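The reconstruction into click stream data can be sketched as below, using the period and duration buckets listed above. The boundary handling (minute resolution for periods, boundary durations assigned to the lower bucket) and the function names are assumptions; the calendar date in the usage lines is hypothetical, since the example only gives clock times.

```python
from datetime import datetime


def time_bucket(moment):
    """Map a clock time to a browsing time period (minute resolution)."""
    minutes = moment.hour * 60 + moment.minute
    if 5 * 60 + 1 <= minutes <= 12 * 60:
        return "morning"      # 05:01-12:00
    if 12 * 60 + 1 <= minutes <= 18 * 60:
        return "afternoon"    # 12:01-18:00
    if 18 * 60 + 1 <= minutes <= 22 * 60:
        return "evening"      # 18:01-22:00
    return "late night"       # 22:01-05:00


def duration_bucket(seconds):
    """Bucket a browsing duration; boundary values go to the lower bucket (assumption)."""
    if seconds < 10:
        return "short"
    if seconds <= 30:
        return "normal"
    if seconds <= 100:
        return "longer"
    return "long"


def to_clickstream(request_time, close_time, visits_per_day, site_type):
    """Rebuild one verified record into a click stream entry."""
    seconds = (close_time - request_time).total_seconds()
    return {
        "period": time_bucket(request_time),
        "duration": duration_bucket(seconds),
        "visits_per_day": visits_per_day,
        "site_type": site_type,
    }


entry = to_clickstream(datetime(2011, 8, 25, 23, 9, 23),
                       datetime(2011, 8, 25, 23, 11, 1), 1, "sports")
print(entry)
# {'period': 'late night', 'duration': 'longer', 'visits_per_day': 1, 'site_type': 'sports'}
```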
S204. Analyze web pages in real time or periodically.
The BT server analyzes the content of web pages in real time or periodically and stores the analysis result, which is the type of the web page (for example sports, finance or science), in the database.
The analysis may use common training algorithms such as support vector machines (SVM), decision trees, neural networks or naive Bayes. First, website content is divided into categories such as sports, finance, science and technology, education and military; one of these algorithms is then used to train a classifier on the categorized websites. Specific web pages are crawled, their text is fed into the classifier, and the classifier outputs the corresponding content category. Taking the naive Bayes method as an example, for each class $C_i \in \{C_1, C_2, \ldots, C_N\}$ the BT server computes the prior probabilities

$$P(C_i) = \frac{M_i}{M}, \qquad P(\bar{C}_i) = \frac{\bar{M}_i}{M},$$

where $M = \sum_{i=1}^{N} M_i$ is the total number of training texts, $M_i$ is the number of training texts in class $C_i$, and $\bar{M}_i$ is the number of input data that do not belong to class $C_i$. It then computes the conditional probability that feature $F_j$ takes the value $x_k$ in class $C_i$:

$$P(F_j = x_k \mid C_i) = \frac{\mathrm{count}(F_j = x_k, C_i) + 1}{\sum_{m=1}^{|F_j|} \mathrm{count}(F_j = x_m, C_i) + |F_j|},$$

where $\mathrm{count}(F_j = x_k, C_i)$ is the number of times $F_j$ takes the value $x_k$ in class $C_i$, and $|F_j|$ is the number of values that feature $F_j$ can take. For an information text $d$, by Bayes' theorem and the naive Bayes assumption, the probability that $d$ belongs to class $C_i$ is proportional to

$$\alpha_i = P(C_i) \prod_j P\bigl(F_j = F_j(d) \mid C_i\bigr),$$

where $F_j(d)$ is the value of feature $F_j$ in text $d$. The BT server therefore computes $\alpha_i$ for every class and collects the results in the set $P = \{\alpha_1, \alpha_2, \ldots, \alpha_N\}$. It can sort $P$ by element value, for example in descending order, to obtain the ordered set $Q$, in which a larger value means an earlier position. The BT server can then determine that text $d$ belongs to the class of the first element of $Q$, or take the first $f$ elements as a probability distribution over their classes; the selection rule is pre-configured on the BT server. For example, let $C_1$ and $C_2$ denote the sports and travel classes, and suppose the web page text $d$ to be analyzed contains feature data such as "basketball", "soccer" and "travel". After the computation above, the first element of $Q$ corresponds to $C_1$ and the second to $C_2$. If only the first element is selected, the BT server determines that $d$ is of type $C_1$, i.e. sports; if the first two elements are selected, $d$ is classified as both sports and travel.

S205. Obtain the user's feature data.
Under specific conditions, for example when the amount of information collected for a particular user reaches a preset value or a predetermined cookie update period elapses, the BT server extracts that user's click stream data and visited web page types from the database and performs modeling analysis to obtain the user's feature data; for example, user 1 is both an "ordinary white-collar worker" and a "backpacker".
For the obtained user behavior, a Markov model or a probability model can be used to fit the relationship between successive behaviors, with an attenuation factor added to express how the influence of earlier steps decays: the older the record, the smaller its weight. Alternatively, Bayesian estimation can be used to model the types of the user's click records.
Take the Markov model as an example:
The main idea of the Markov model is to infer the user's current behavior on web pages from the user's past behavior on web pages, that is, to measure how strongly past behavior is associated with current behavior. The formula is as follows:
$$\mathrm{tf}_{a_i,p} = \sum_{k=1}^{s} f(k)\, P(x_k = a_i \mid x_0 = p)$$
where $s$ is the configured number of valid history steps, and $a_i \in \{a_1, a_2, \ldots, a_M\}$ is one item of metadata in the metadata set; metadata are data representing the key terms of a web page type. The web page $p$ accessed by the terminal is the target web page in the Markov process, i.e. a web page the user visited within a period of time. $f(k)$ is an attenuation factor expressing how earlier steps decay, for example $f(k) = e^{-\beta k t}$. Its value decreases as time increases, so the weight becomes smaller: the older a user behavior, the lower its reference value. $P(x_k = a_i \mid x_0 = p)$ is the frequency with which $a_i$ appears in the target web page, i.e. the number of occurrences of $a_i$ in target page $p$ divided by the total number of word occurrences in $p$. $\mathrm{tf}_{a_i,p}$ is the term frequency (TF), which expresses how often a term appears in a document or target web page. The formula thus gives the frequency with which a particular term appears on the web pages the user has visited from the past up to now. For example, to compute the term frequency of the word "basketball" over the 100 web pages the user has visited, the BT server takes the inputs $a_1$ = "basketball" and $s = 100$, counts $P(x_k = a_i \mid x_0 = p)$, and finally obtains the term frequency of "basketball".
After the BT server has computed the term frequency, it substitutes $\mathrm{tf}_{a_i,p}$ into the formula:
$$\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i, \qquad \text{where } \mathrm{tf}_{i,j} = \mathrm{tf}_{a_i,p}.$$
To explain this formula, TF-IDF (term frequency-inverse document frequency) must first be introduced. TF-IDF is a weighting technique commonly used in information retrieval and text mining to evaluate how important a word is to one document within a document collection. If a word or phrase appears frequently in one document and rarely in others, it is considered to have good class-discriminating power and to be suitable for classification.
Term frequency (TF) is the frequency with which a given word appears in a document. Inverse document frequency (IDF) is a measure of a word's general importance: the fewer documents or target web pages contain a term, the larger its IDF and the better the term discriminates between classes. The IDF of a particular term is obtained by dividing the total number of documents by the number of documents containing the term and taking the logarithm of the quotient:

$$\mathrm{idf}_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|},$$

where $|D|$ is the total number of documents in the corpus (or the total number of target web pages), and $|\{j : t_i \in d_j\}|$ is the number of documents (or target web pages) containing the term $t_i$, i.e. the number of documents with $\mathrm{tf}_{i,j} \neq 0$. If the term does not appear in the corpus or the target web pages, the denominator would be zero, so $1 + |\{j : t_i \in d_j\}|$ is generally used instead.
Therefore, in $\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$, a high term frequency within a particular document combined with a low document frequency of the term across the whole collection yields a high TF-IDF weight. TF-IDF thus filters out common words and retains important ones: the higher the value, the more important the word.
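The smoothed IDF and the TF-IDF product can be sketched in Python as below. The natural logarithm is an assumption (the text does not fix the base), documents are modelled as sets of words, and the tiny corpus is illustrative only. With the `1 +` smoothing, a term that appears in every document gets a slightly negative IDF.

```python
import math


def idf(term, documents):
    """Smoothed inverse document frequency: log(|D| / (1 + #documents containing term))."""
    containing = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / (1 + containing))


def tfidf(tf_value, term, documents):
    """tfidf_{i,j} = tf_{i,j} * idf_i; tf_value may be the decayed tf_{a_i,p}."""
    return tf_value * idf(term, documents)


corpus = [
    {"basketball", "game"},
    {"stock", "market"},
    {"concert", "tickets"},
    {"stock", "bonds"},
]
print(idf("basketball", corpus))  # log(4 / 2) = log 2
print(idf("stock", corpus))       # log(4 / 3): smaller, the term is more common
```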
Using the formulas above, the BT server computes $\mathrm{tf}_{a_i,p}$ and $\mathrm{idf}$ and multiplies them to obtain $\mathrm{tfidf}$. For example, the BT server models a particular user as follows. It first obtains all the types of web pages the user has browsed, namely four specific types: "sports", "music", "finance" and "IT". It then selects four specific terms, "sports", "music", "finance" and "IT", computes the tf-idf value of each, and sorts them in descending order. The BT server selects the term with the largest value, or the first few terms. Assuming the order is "sports", "music", "finance", "IT", the BT server determines that the user's feature is "sports fan", or that the user's features are "sports fan" and "music enthusiast"; the specific mapping can be obtained by querying a configured lookup table between terms and user features. A user feature may be represented as a string or in any other data format commonly used by computers; this embodiment imposes no restriction.
S206. Write the user feature data and the corresponding delivery policy data into the cookie.
The BT server matches the obtained user features against the personalized delivery policies set by the content providing server that provides the web page, obtains the personalized delivery policy data for those user features, and converts it into the specified cookie format. It then writes the user feature data and the personalized delivery policy data into the terminal's cookie and sets the cookie's lifetime, i.e. the validity period of the user feature data and the corresponding delivery policy data.
This embodiment of the present invention does not restrict how the cookie is written. Take as an example writing a single cookie record under the current website's domain name. The user's features are "ordinary white-collar worker / fashion follower / food lover" and the corresponding delivery policy is "consumer electronics / group purchasing of food". The written user feature data and delivery policy data can be represented in any data format commonly used by computers, such as strings, escape characters or integers; here they are represented as the strings "white_collar/fashion_follower/food_lover" and "electronic_consumption/group_purchasing_of_food". If the website www.example.com is within the BT server's monitoring domain, the written result is as shown in Table 1.
S207. The terminal initiates an access request to the content providing server.
It should be noted that, after the BT server writes the user feature data and the corresponding delivery policy data into the cookie, the user accesses the web page again through the terminal.
S208. The content providing server obtains the current user feature data and the corresponding delivery policy data.

The content providing server reads the terminal's cookie information and obtains the current user's feature information and the delivery policy data specified for it.
S209. The content providing server delivers personalized content to the user.
The content providing server directly executes the personalized delivery policy data, either generating a personalized content jump link for the current user itself or requesting one from a third-party content server. When the terminal accesses the link, the content providing server delivers the personalized content to the user, completing targeted delivery.
In this embodiment of the present invention, after receiving the user information sent by the terminal, the BT server analyzes the type of the web page accessed by the terminal, obtains the user feature data and the corresponding delivery policy from the user information and the web page type, and writes them into the terminal's cookie. The next time the content providing server receives an access request from the terminal, it reads the terminal's cookie and delivers personalized content to the terminal directly. This reduces bandwidth consumption and improves server interaction efficiency.
Figure 4 is a structural diagram of the BT server apparatus. As shown in Figure 4, it includes:
An obtaining unit 301, configured to obtain the user information sent by the terminal while it accesses a web page, analyze the web page accessed by the terminal, and obtain the web page type of that web page. After the terminal accesses a web page provided by the content providing server, the terminal executes the script program on the web page through its browser and sends the user information to the obtaining unit 301. The user information includes HTTP request information, such as the requested URL and the referrer URL; information about the requested page, such as the page title, keywords and abstract; and user behavior information, such as clicks, submissions, inputs, jumps and refreshes.
The obtaining unit 301 may verify and reconstruct the received user information. User information that passes verification is reconstructed into click stream data; user information that fails verification is deleted, and new user information is received and verified.
After obtaining the user information, the obtaining unit 301 also analyzes the web page accessed by the terminal to obtain its web page type. The obtaining of user information and the analysis of the accessed web page by the obtaining unit 301 are not strictly ordered in time: they may be performed simultaneously, or the user information may be obtained first and the accessed web page analyzed afterwards. Through a setting unit, the BT server divides web page content in advance into $N$ classes, such as sports, finance and science and technology, denoted $\{C_1, C_2, \ldots, C_N\}$, and sets the set $\{M_1, M_2, \ldots, M_N\}$, where $M_i$ is the frequency of type $C_i$, i.e. the number of web pages of type $C_i$ among all web pages accessed by the terminal. Note that $\{C_1, C_2, \ldots, C_N\}$ and $\{M_1, M_2, \ldots, M_N\}$ are both set in advance by the BT server and differ from the types of the web pages the terminal actually accesses; the latter are exactly what the BT server computes from $\{C_1, C_2, \ldots, C_N\}$, $\{M_1, M_2, \ldots, M_N\}$ and the feature data of the accessed web pages.
The obtaining unit 301 obtains the feature data of the web page; the feature data include the key terms, character spacing and text length corresponding to the web page types in the web page's type set.
A calculating unit computes the probabilities of the feature data according to the set of frequencies corresponding to the web page types, selects the largest one or more of the computed probability values, and obtains the web page type(s) corresponding to the selected values.
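The naive Bayes classification used here (class priors, add-one-smoothed conditional probabilities, the product $\alpha_i$, a descending sort into $Q$, and selection of the first $f$ classes) can be sketched as a small categorical classifier. The function names, the dict-of-features document representation and the toy training data are assumptions for illustration. Only the prior $P(C_i)$ enters $\alpha_i$, so the complementary prior is not computed in the sketch.

```python
from collections import Counter, defaultdict


def train(docs, labels):
    """Fit a categorical naive Bayes model; each doc is a dict of feature -> value."""
    class_counts = Counter(labels)           # M_i per class
    total = len(labels)                      # M
    value_counts = defaultdict(Counter)      # (class, feature) -> value counts
    feature_values = defaultdict(set)        # feature -> observed values, |F_j|
    for doc, label in zip(docs, labels):
        for feature, value in doc.items():
            value_counts[(label, feature)][value] += 1
            feature_values[feature].add(value)
    priors = {c: n / total for c, n in class_counts.items()}
    return priors, value_counts, feature_values


def conditional(model, feature, value, cls):
    """Add-one smoothed P(F_j = x_k | C_i)."""
    _, value_counts, feature_values = model
    counts = value_counts.get((cls, feature), Counter())
    return (counts[value] + 1) / (sum(counts.values()) + len(feature_values[feature]))


def rank_classes(model, doc):
    """Return (class, alpha_i) pairs in descending order, i.e. the ordered set Q."""
    priors, _, feature_values = model
    scores = {}
    for cls, prior in priors.items():
        alpha = prior
        for feature, value in doc.items():
            if feature in feature_values:  # ignore features never seen in training
                alpha *= conditional(model, feature, value, cls)
        scores[cls] = alpha
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def top_classes(model, doc, f=1):
    """Keep the classes of the first f elements of Q."""
    return [cls for cls, _ in rank_classes(model, doc)[:f]]


docs = [
    {"basketball": 1, "soccer": 1, "travel": 0},
    {"basketball": 1, "soccer": 1, "travel": 0},
    {"basketball": 0, "soccer": 1, "travel": 1},
    {"basketball": 0, "soccer": 0, "travel": 1},
    {"basketball": 0, "soccer": 0, "travel": 1},
]
labels = ["sports", "sports", "sports", "tourism", "tourism"]
model = train(docs, labels)
page = {"basketball": 1, "soccer": 1, "travel": 1}
print(top_classes(model, page, f=2))  # ['sports', 'tourism']
```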
The calculation method is as follows. For each class $C_i \in \{C_1, C_2, \ldots, C_N\}$, the calculating unit of the BT server computes the prior probabilities

$$P(C_i) = \frac{M_i}{M}, \qquad P(\bar{C}_i) = \frac{\bar{M}_i}{M},$$

where $M = \sum_{i=1}^{N} M_i$ is the total number of training texts and $\bar{M}_i$ is the number of input data that do not belong to class $C_i$. It then computes the conditional probability that feature $F_j$ takes the value $x_k$ in class $C_i$:

$$P(F_j = x_k \mid C_i) = \frac{\mathrm{count}(F_j = x_k, C_i) + 1}{\sum_{m=1}^{|F_j|} \mathrm{count}(F_j = x_m, C_i) + |F_j|},$$

where $\mathrm{count}(F_j = x_k, C_i)$ is the number of times $F_j$ takes the value $x_k$ in class $C_i$, and $|F_j|$ is the number of values that feature $F_j$ can take. For an information text $d$, by Bayes' theorem and the naive Bayes assumption, the probability that $d$ belongs to class $C_i$ is proportional to

$$\alpha_i = P(C_i) \prod_j P\bigl(F_j = F_j(d) \mid C_i\bigr),$$

where $F_j(d)$ is the value of feature $F_j$ in text $d$. The calculating unit of the BT server therefore computes $\alpha_i$ for every class and collects the results in the set $P = \{\alpha_1, \alpha_2, \ldots, \alpha_N\}$. It can sort $P$ by element value, for example in descending order, to obtain the ordered set $Q$, in which a larger value means an earlier position. A determining unit can then determine that text $d$ belongs to the class of the first element of $Q$, or take the first $f$ elements as a probability distribution over their classes; the selection rule is pre-configured in the determining unit. For example, let $C_1$ and $C_2$ denote the sports and travel classes, and suppose the web page text $d$ to be analyzed contains feature data such as "basketball", "soccer" and "travel". After the computation above, the first element of $Q$ corresponds to $C_1$ and the second to $C_2$. If only the first element is selected, the determining unit of the BT server determines that $d$ is of type $C_1$, i.e. sports; if the first two elements are selected, $d$ is classified as both sports and travel.

The obtaining unit is further configured to obtain user feature data according to the web page type of the web page and the user information.
The obtaining unit 301 obtains the user characteristic data as follows:
The computing unit calculates, from the click stream data, the term frequency (TF) and inverse document frequency (IDF) of the metadata corresponding to the web page type, as well as their product, TF-IDF.
The computing unit of the BT server can calculate the term frequency with the following Markov model formula:
$$tf_{a_i,p} = \sum_{k=1}^{s} f(k)\, P(x_k = a_i \mid x_0 = p)$$
Here $s$ is the configured number of effective historical steps, and $a_i \in \{a_1, \ldots, a_M\}$ is one item in the metadata set. Metadata are key terms that represent a web page type. For a sports page, for example, the metadata could be "sports", "soccer", or "basketball". The metadata are preset by the computing unit of the BT server. The web page type $p$ accessed by the terminal is the target web page in the Markov process, that is, a web page the user visited within a given time period.

$f(k)$ is an attenuation factor that expresses how the weight of earlier steps decays. In this embodiment, $f(k) = e^{-\beta k t}$. Its value, and therefore its weight, decreases over time: the older a user behavior is, the lower its reference value.

$P(x_k = a_i \mid x_0 = p)$ is the frequency with which $a_i$ appears in a target web page. $tf_{a_i,p}$ is the term frequency (TF), that is, how often the term appears in the document or target web page. The formula therefore gives how often a particular term has appeared on the web pages the user has visited up to now.

For example, to calculate the term frequency of "basketball" over the 100 web pages the user has visited, the computing unit of the BT server takes the input parameters $a_1$ = "basketball" and $s = 100$. It then counts $P(x_k = a_1 \mid x_0 = p)$ and evaluates the formula to obtain the term frequency of "basketball".
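The decayed term frequency can be sketched as follows. This is a minimal sketch under several assumptions:
- `history[k - 1]` holds the list of terms on the page visited $k$ steps back.
- $P(x_k = a_i \mid x_0 = p)$ is estimated as the relative frequency of the term on that page.
- The attenuation factor is $f(k) = e^{-\beta k t}$, with hypothetical parameters `beta` (decay coefficient) and `t` (time per step).

The function name `decayed_tf` is illustrative, not from the source.

```python
import math
from collections import Counter

def decayed_tf(term, history, beta=0.1, t=1.0, s=None):
    """Approximate tf_{a_i,p} = sum_{k=1}^{s} f(k) * P(x_k = a_i | x_0 = p).

    history[k - 1] is the list of terms on the page visited k steps back (assumption).
    """
    steps = len(history) if s is None else min(s, len(history))
    total = 0.0
    for k in range(1, steps + 1):
        words = history[k - 1]
        if not words:
            continue  # an empty page contributes nothing
        p_k = Counter(words)[term] / len(words)  # relative frequency of the term at step k
        total += math.exp(-beta * k * t) * p_k  # f(k) = e^{-beta k t}
    return total

HISTORY = [["basketball", "soccer"], ["basketball"], ["music"]]
print(decayed_tf("basketball", HISTORY, beta=0.0))  # 0.5 + 1.0 + 0.0 = 1.5
```

With `beta=0.0` every step has weight 1. Any positive `beta` makes older pages count less, which matches the stated intent that older behavior has lower reference value.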
After the computing unit of the BT server has calculated the term frequency, $tf_{a_i,p}$ is substituted into $tfidf_i = tf_i \times idf_i$, where $tf_i = tf_{a_i,p}$.
To explain this formula, first consider TF-IDF (term frequency-inverse document frequency). TF-IDF is a weighting technique commonly used in information retrieval and text mining. It is mainly used to evaluate how important a word is to one document in a collection. If a word or phrase appears frequently in one article and rarely in others, it is considered to discriminate well between categories and to be suitable for classification.
Term frequency (TF) is the frequency with which a given word appears in a document or target web page. In this embodiment, the target web page is a web page accessed by the terminal.

Inverse document frequency (IDF) measures the general importance of a word. The fewer documents or target web pages contain a term, the larger its IDF, and the better the term distinguishes categories. The IDF of a particular word is obtained by dividing the total number of documents by the number of documents containing the word, and taking the logarithm of the quotient:
$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$
Here $|D|$ is the total number of documents in the corpus (or of target web pages). $|\{j : t_i \in d_j\}|$ is the number of documents (or target web pages) that contain the term $t_i$, that is, the documents in which the term's occurrence count is nonzero. If the term does not occur in the corpus or the target web pages, the denominator would be zero, so $1 + |\{j : t_i \in d_j\}|$ is generally used instead.

Thus, in $tfidf_i = tf_i \times idf_i$, a high term frequency within a particular document combined with a low document frequency across the whole collection yields a high TF-IDF weight. TF-IDF therefore filters out common words and keeps important ones: the higher the value, the more important the word.
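The smoothed IDF and the TF-IDF product can be sketched directly from the formulas above. This minimal sketch assumes each document (or target web page) is represented as a set of terms. The function names `idf` and `tfidf` are illustrative.

```python
import math

def idf(term, documents):
    """idf_i = log(|D| / (1 + |{j : t_i in d_j}|)); each document is a set of terms."""
    containing = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / (1 + containing))

def tfidf(tf_value, term, documents):
    """tfidf_i = tf_i * idf_i, where tf_i may be the decayed tf_{a_i,p}."""
    return tf_value * idf(term, documents)

DOCS = [{"basketball", "soccer"}, {"music"}, {"finance"}, {"IT"}]
print(idf("basketball", DOCS))  # log(4 / 2) = log 2
```

One side effect of the "+1" smoothing is worth noting. A term that occurs in every document gets a slightly negative IDF (here $\log(4/5)$), so it ranks below terms that never occur.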
The computing unit of the BT server uses the formulas above to calculate $tf_{a_i,p}$ and $idf$, and multiplies them to obtain $tfidf$.
The obtaining unit 301 selects the one or more largest TF-IDF values calculated by the computing unit and determines the metadata corresponding to those values. It then queries the correspondence table between metadata and user characteristic data to obtain the user characteristic data corresponding to the metadata.
For example, the obtaining unit of the BT server first obtains all types of web pages accessed by the terminal: say, the four specific types "sports", "music", "finance", and "IT". It then takes the four corresponding terms "sports", "music", "finance", and "IT", calculates the tfidf value of each, and sorts them in descending order. The obtaining unit 301 selects the largest value or the first few.

If the resulting order is "sports", "music", "finance", "IT", the user characteristic obtained by the obtaining unit 301 is "sports fan", or both "sports fan" and "music expert". The mapping is obtained by querying the configured correspondence table between metadata and user characteristics. User characteristics can be represented as character strings or in other data formats commonly used by computers; this embodiment imposes no restriction.
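The selection and lookup step can be sketched as a sort followed by a dictionary lookup. This minimal sketch assumes that the TF-IDF scores arrive as a `dict` from metadata term to score and that the correspondence table is an in-memory `dict`. The scores and the labels for "finance" and "IT" are invented for illustration; only "sports fan" and "music expert" come from the example.

```python
def user_features(tfidf_scores, metadata_to_feature, top_n=1):
    """Map the top_n metadata terms (by TF-IDF, descending) to user characteristic labels."""
    ranked = sorted(tfidf_scores, key=tfidf_scores.get, reverse=True)
    # Skip metadata that has no entry in the correspondence table.
    return [metadata_to_feature[m] for m in ranked[:top_n] if m in metadata_to_feature]

# Hypothetical TF-IDF scores and correspondence table.
SCORES = {"sports": 0.92, "music": 0.61, "finance": 0.30, "IT": 0.12}
METADATA_TO_FEATURE = {
    "sports": "sports_fan",
    "music": "music_expert",
    "finance": "finance_watcher",   # invented label
    "IT": "tech_enthusiast",        # invented label
}
print(user_features(SCORES, METADATA_TO_FEATURE, top_n=2))  # ['sports_fan', 'music_expert']
```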
The determining unit 302 is configured to query the correspondence table, preconfigured by the BT server, between user characteristic data and delivery strategy data, and to determine the delivery strategy data corresponding to the user characteristic data.
Note that this correspondence table is preconfigured by the BT server. For example, if a user's characteristics are "ordinary white-collar worker / trendsetter / foodie", the corresponding delivery strategy is "consumer electronics / food group-buying". Both can be represented by escape characters, integers, or other data formats commonly used by computers. Character strings are used here, namely "white_collar/fashion_follower/food_lover" and "electronic_consumption/group_purchasing_of_food".

The writing unit 303 is configured to write the user characteristic data and the corresponding delivery strategy data into the cookie of the terminal. The next time the terminal accesses a web page, the content providing server receives its access request and delivers personalized content to the terminal according to the delivery strategy data in the terminal's cookie.
Note that the web page the terminal accesses next time may be the web page accessed in the preceding steps, or a different one.
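The determining unit's lookup can be sketched using the string representation from the example. This minimal sketch assumes the preconfigured table is an in-memory `dict` keyed by the "/"-joined characteristic string. The table name `STRATEGY_TABLE` and the helper functions are hypothetical.

```python
# Hypothetical correspondence table preconfigured on the BT server:
# user characteristic string -> delivery strategy string ("/"-separated tags).
STRATEGY_TABLE = {
    "white_collar/fashion_follower/food_lover": "electronic_consumption/group_purchasing_of_food",
}

def encode_tags(tags):
    """Join individual characteristic tags into the string form used as the table key."""
    return "/".join(tags)

def delivery_strategy(user_feature_tags, table=STRATEGY_TABLE):
    """Return the delivery strategy for the given tags, or None if the table has no entry."""
    return table.get(encode_tags(user_feature_tags))

print(delivery_strategy(["white_collar", "fashion_follower", "food_lover"]))
```

Because the key is a plain joined string, tag order matters in this sketch. A real table might instead sort the tags, or key on a frozenset, to make the lookup order-independent.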
The cookie written by the writing unit 303 is shown in Table 1.
After the cookie has been written, the terminal may again request a web page provided by the content providing server. The content providing server then reads the terminal's cookie directly and obtains the current user's characteristic information and the delivery strategy data specified for it. The server executes this personalized delivery strategy directly: it either generates a personalized content jump link for the current user itself, or requests one from a third-party content server. When the terminal accesses the link, the content providing server delivers the personalized content to the user, completing precise delivery.
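The cookie write and read can be sketched with Python's standard `http.cookies.SimpleCookie`. The sketch sets the cookie's domain and expiration time, as claims 2 and 9 describe. HTTP cookies have no standard creation-time attribute, so this sketch stores the creation time as a separate cookie value. The cookie names `bt_user_feature`, `bt_strategy`, and `bt_created`, the 30-day lifetime, and the helper functions are all hypothetical. The actual layout in Table 1 is not reproduced here.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from http.cookies import SimpleCookie

def build_set_cookie_headers(user_feature, strategy, domain, lifetime_days=30, now=None):
    """BT server side: return Set-Cookie header values carrying the feature and strategy."""
    now = now or datetime.now(timezone.utc)
    expires = format_datetime(now + timedelta(days=lifetime_days), usegmt=True)
    cookie = SimpleCookie()
    cookie["bt_user_feature"] = user_feature
    cookie["bt_strategy"] = strategy
    cookie["bt_created"] = str(int(now.timestamp()))  # creation time as epoch seconds
    for name in ("bt_user_feature", "bt_strategy", "bt_created"):
        cookie[name]["domain"] = domain    # domain the cookie belongs to
        cookie[name]["path"] = "/"
        cookie[name]["expires"] = expires  # expiration time
    return [morsel.OutputString() for morsel in cookie.values()]

def read_strategy(cookie_header):
    """Content providing server side: extract the delivery strategy from a Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("bt_strategy")
    return morsel.value if morsel else None

headers = build_set_cookie_headers(
    "white_collar/fashion_follower/food_lover",
    "electronic_consumption/group_purchasing_of_food",
    "example.com",
    now=datetime(2012, 8, 10, tzinfo=timezone.utc),
)
```

`SimpleCookie` quotes values that contain "/" and unquotes them again on `load`. The strategy string therefore round-trips unchanged between the BT server and the content providing server.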
In this embodiment of the present invention, the obtaining unit receives the user information sent by the terminal and analyzes the type of the web page the terminal accessed. Based on the user information and the web page type, it obtains the user characteristic data and the corresponding delivery strategy. The writing unit then writes these into the terminal's cookie. The next time the content providing server receives an access request from the terminal, it reads the cookie and delivers personalized content to the terminal directly. This reduces bandwidth consumption and improves the interaction efficiency of the server.
From the description of the above embodiments, a person skilled in the art can clearly understand that the present invention may be implemented in hardware, in firmware, or in a combination thereof. When implemented in software, the functions described above may be stored on a computer-readable medium, or transmitted as one or more instructions or code on such a medium.

Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that a computer can access. By way of example and not limitation, computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage media or other magnetic storage devices. They may also include any other medium that can carry or store desired program code in the form of instructions or data structures and that a computer can access.

In addition, any connection may properly be termed a computer-readable medium. For example, software may be transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave. In that case, the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technology is included in the definition of medium.

As used in the present invention, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc. Disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
In summary, the above is only a preferred embodiment of the technical solution of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for delivering personalized content, characterized by comprising:
obtaining, by a user behavior analysis (BT) server, user information sent by a terminal while accessing a web page, and analyzing the web page accessed by the terminal to obtain the web page type of the web page;
obtaining user characteristic data according to the web page type of the web page and the user information;
querying a correspondence table, preconfigured by the BT server, between user characteristic data and delivery strategy data, and determining the delivery strategy data corresponding to the user characteristic data;
writing the user characteristic data and the corresponding delivery strategy data into a cookie of the terminal, so that after a content providing server receives an access request sent by the terminal the next time it accesses a web page, the content providing server delivers personalized content to the terminal according to the delivery strategy data in the cookie of the terminal.
2. The method according to claim 1, characterized in that writing the user characteristic data and the corresponding delivery strategy data into the cookie of the terminal further comprises: setting the domain to which the cookie belongs, the creation time, and the expiration time of the cookie.
3. The method according to claim 1 or 2, characterized in that, before obtaining the user characteristic data according to the web page type of the web page and the user information, the method further comprises:
verifying the user information, and reconstructing the successfully verified user information into click stream data.
4. The method according to claim 3, characterized in that, if verification of the user information fails, the user information is deleted.
5. The method according to claim 3 or 4, characterized in that obtaining the user characteristic data comprises:
calculating, according to the click stream data, the term frequency TF and the inverse document frequency IDF corresponding to the web page type, and the product of TF and IDF, term frequency-inverse document frequency TF-IDF;
selecting one or more largest TF-IDF values, determining the web page type corresponding to the one or more TF-IDF values, querying a correspondence table between web page types and user characteristic data, and obtaining the user characteristic data corresponding to the web page type.
6. The method according to any one of claims 1 to 5, characterized in that obtaining the web page type of the web page comprises:
setting a set of web page types and a set of frequencies corresponding to the web page types;
obtaining feature data of the web page, the feature data comprising keywords corresponding to the set of web page types, character spacing, and text length;
calculating the probability of the feature data according to the set of frequencies corresponding to the web page types, selecting one or more largest of the calculated probability values, and obtaining the web page type corresponding to the selected probability values.
7. The method according to any one of claims 1 to 6, wherein the user information comprises Hypertext Transfer Protocol (HTTP) request information, information about the requested web page, and user behavior information.
8. A user behavior analysis (BT) server, characterized by comprising:
an obtaining unit, configured to obtain user information sent by a terminal while accessing a web page, and to analyze the web page accessed by the terminal to obtain the web page type of the web page;
the obtaining unit being further configured to obtain user characteristic data according to the web page type of the web page and the user information;
a determining unit, configured to query a correspondence table, preconfigured by the BT server, between user characteristic data and delivery strategy data, and to determine the delivery strategy data corresponding to the user characteristic data;
a writing unit, configured to write the user characteristic data and the corresponding delivery strategy data into a cookie of the terminal, so that after a content providing server receives an access request sent by the terminal the next time it accesses a web page, the content providing server delivers personalized content to the terminal according to the delivery strategy data in the cookie of the terminal.
9. The BT server according to claim 8, characterized in that the writing unit writing the user characteristic data and the corresponding delivery strategy data into the cookie of the terminal further comprises: the writing unit adding the user characteristic data and the corresponding delivery strategy data to the cookie of the terminal, and setting the domain to which the cookie belongs, the creation time, and the expiration time of the cookie.
10. The BT server according to claim 8 or 9, characterized in that the BT server further comprises:
a verification unit, configured to verify the user information;
a reconstruction unit, configured to reconstruct the successfully verified user information into click stream data.
11. The BT server according to claim 10, characterized in that the server further comprises:
a computing unit, configured to calculate, according to the click stream data, the term frequency TF and the inverse document frequency IDF corresponding to the web page type, and the product TF-IDF of TF and IDF;
the obtaining unit being further configured to select one or more largest TF-IDF values, determine the web page type corresponding to the one or more TF-IDF values, query a correspondence table between web page types and user characteristic data, and obtain the user characteristic data corresponding to the web page type.
12. The BT server according to any one of claims 8 to 11, characterized in that the BT server further comprises:
a setting unit, configured to set a set of web page types and a set of frequencies corresponding to the web page types;
the obtaining unit being further configured to obtain feature data of the web page, the feature data comprising keywords corresponding to the set of web page types, character spacing, and text length;
the obtaining unit being further configured to calculate the probability of the feature data according to the set of frequencies corresponding to the web page types, select one or more largest of the calculated probability values, and obtain the web page type corresponding to the selected probability values.
13. The BT server according to any one of claims 8 to 12, wherein the user information comprises Hypertext Transfer Protocol (HTTP) request information, information about the requested web page, and user behavior information.
PCT/CN2013/076308 2012-08-10 2013-05-28 Method and device for launching individual content WO2014023121A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210284928.8 2012-08-10
CN201210284928.8A CN103577504A (en) 2012-08-10 2012-08-10 Method and device for putting personalized contents

Publications (1)

Publication Number Publication Date
WO2014023121A1 true WO2014023121A1 (en) 2014-02-13

Family

ID=50049300

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/076308 WO2014023121A1 (en) 2012-08-10 2013-05-28 Method and device for launching individual content

Country Status (2)

Country Link
CN (1) CN103577504A (en)
WO (1) WO2014023121A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104836781A (en) * 2014-02-20 2015-08-12 腾讯科技(北京)有限公司 Method distinguishing identities of access users, and device

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488038B (en) * 2014-09-15 2021-03-05 创新先进技术有限公司 Personalized information matching method and device for communication application
CN108322355A (en) * 2017-01-18 2018-07-24 北京京东尚科信息技术有限公司 User traffic data processing method, processing unit, electronic equipment and storage medium
CN108459890B (en) * 2017-02-20 2021-10-26 百度在线网络技术(北京)有限公司 Interface display method and device for application
CN108573750B (en) * 2017-03-07 2021-01-15 京东方科技集团股份有限公司 Method and system for automatically discovering medical knowledge
CN109933389B (en) * 2017-12-19 2022-08-23 阿里巴巴集团控股有限公司 Data object information processing and page display method and device
CN111274516B (en) * 2018-12-04 2024-04-05 阿里巴巴新加坡控股有限公司 Page display method, page configuration method and device
CN111861564B (en) * 2020-07-20 2021-07-13 深圳我买家网络科技有限公司 Digital advertisement transaction system
CN113726900A (en) * 2021-09-02 2021-11-30 四川启睿克科技有限公司 System for judging age bracket of user child
CN117473200B (en) * 2023-12-26 2024-03-08 天津戎行集团有限公司 Comprehensive acquisition and analysis method for website information data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030014304A1 (en) * 2001-07-10 2003-01-16 Avenue A, Inc. Method of analyzing internet advertising effects
CN101034997A (en) * 2006-03-09 2007-09-12 新数通兴业科技(北京)有限公司 Method and system for accurately publishing the data information
CN101079063A (en) * 2007-06-25 2007-11-28 腾讯科技(深圳)有限公司 Method, system and apparatus for transmitting advertisement based on scene information
CN101431524A (en) * 2007-11-07 2009-05-13 阿里巴巴集团控股有限公司 Method and device for implementing oriented network advertisement delivery
CN102301658A (en) * 2009-09-11 2011-12-28 华为技术有限公司 Advertisement Delivery Method, Advertisement Server And Advertisement System

Also Published As

Publication number Publication date
CN103577504A (en) 2014-02-12

Similar Documents

Publication Publication Date Title
WO2014023121A1 (en) Method and device for launching individual content
US9876751B2 (en) System and method for analyzing messages in a network or across networks
JP6506401B2 (en) Suggested keywords for searching news related content on online social networks
US9460458B1 (en) Methods and system of associating reviewable attributes with items
US11681750B2 (en) System and method for providing content to users based on interactions by similar other users
US9251500B2 (en) Searching topics by highest ranked page in a social networking system
US8909569B2 (en) System and method for revealing correlations between data streams
US20150256499A1 (en) Ranking, collection, organization, and management of non-subscription electronic messages
US20140129331A1 (en) System and method for predicting momentum of activities of a targeted audience for automatically optimizing placement of promotional items or content in a network environment
US10445753B1 (en) Determining popular and trending content characteristics
US20160021037A1 (en) Recommendation of a location resource based on recipient access
US20150100591A1 (en) Determining a Community Page for a Concept in a Social Networking System
US20200359210A1 (en) Secure communication in mobile digital pages
US9946794B2 (en) Accessing special purpose search systems
US10078656B1 (en) Unmodifiable data in a storage service
US20160196267A1 (en) Configuring a web feed
JP6200894B2 (en) Giving universal social context to concepts in social networking systems
US10210465B2 (en) Enabling preference portability for users of a social networking system
US11580476B1 (en) Detecting a landing page that violates an online system policy based on a structural similarity between the landing page and a web page violating the policy
US20160285885A1 (en) Contextual contacts for html5
US20230409743A1 (en) Methods And Systems For Obtaining, Controlling And Viewing User Data
US11086948B2 (en) Method and system for determining abnormal crowd-sourced label
WO2016055832A1 (en) A computer-based system, computer-implemented methods, and a computer program product for providing ranked recommendation data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13828658

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13828658

Country of ref document: EP

Kind code of ref document: A1