CN100458797C - Process for ordering network advertisement - Google Patents

Process for ordering network advertisement

Info

Publication number
CN100458797C
Authority
CN
China
Prior art keywords
text
vector
information
advertising
advertisement
Application number
CN 200710117607
Other languages
Chinese (zh)
Other versions
CN101097580A
Inventor
Zheng Feng (郑峰)
Original Assignee
精实万维软件(北京)有限公司 (Jingshi Wanwei Software (Beijing) Co., Ltd.)
Priority date: 2007-06-20 (the priority date is an assumption and is not a legal conclusion)
Filing date: 2007-06-20
Publication date: 2009-02-04
Application filed by 精实万维软件(北京)有限公司 (Jingshi Wanwei Software (Beijing) Co., Ltd.)
Priority to CN 200710117607
Publication of CN101097580A
Application granted
Publication of CN100458797C


Abstract

The present invention relates to the field of intelligent Chinese-language processing for the Internet and discloses a method for sorting online advertisements. The method comprises: using an advertisement monitoring program to acquire advertisement data from websites, and extracting advertiser information and advertisement description information from the acquired data; segmenting the advertiser information and the advertisement description information into words to obtain keywords, and building an index of those keywords; and calculating the relevance of each indexed keyword and sorting the online advertisements in descending order of the calculated relevance. The invention makes it possible to sort online advertisements and to quickly determine a relevance ranking of the many advertisements that correspond to one keyword, so that advertising designers can find suitable advertising material in the shortest possible time. Advertisers can also use the system to view competitors' ad placements, which supports the design of their own advertising plans.

Description

A method for sorting online advertisements

TECHNICAL FIELD

The present invention relates to the field of intelligent Chinese-language processing for the Internet, and in particular to a method for sorting online advertisements.

BACKGROUND

With the spread of the Internet, online advertising has shown strong growth, and more and more customers choose to place online advertisements. When studying the advertisements of a particular industry or product category, one therefore faces a large number of advertisements, which raises the ranking problem: determining which advertisements are most relevant to the user's search.

例如用户输入"汽车"这个关键词,和汽车相关的广告有成千上万, 如何将这些广告呈现给用户,排序就显得比较重要了。 For example, the user enters "automobile" the key words, and there are thousands of car-related advertising, and how these ads will be presented to the user, sorting becomes more important. 本发明就是为了解决上述问题而产生的。 The present invention is made to solve the above problems arising.

SUMMARY

(A) Technical problem to be solved

In view of the above, the main object of the present invention is to provide a method for sorting online advertisements.

(B) Technical solution

To achieve the above object, the present invention provides a method for sorting online advertisements, the method comprising:

acquiring advertisement data from websites with an advertisement monitoring program, and extracting advertiser information and advertisement description information from the acquired advertisement data, which specifically comprises:

using an advertisement monitoring program (a spider) to monitor ad placements on each website, and saving the original web page content as web page snapshots in a web page snapshot library;

segmenting the text in the advertisement data stored in the web page snapshot library into words to obtain a set of text vectors;

weighting the text vectors up or down according to their features;

calculating the weights of the up- or down-weighted text vectors with a vector space model;

sorting the calculated text-vector weights and, according to the context of the web page in which each text vector appears, extracting the advertiser information and the advertisement description information from the page;

segmenting the advertiser information and the advertisement description information into words to obtain keywords, and building an index of the keywords;

calculating the relevance of each indexed keyword, and sorting the online advertisements in descending order of relevance.

In the above solution, segmenting the text in the advertisement data stored in the web page snapshot library comprises decomposing ordinary modern Chinese character sequences into word sequences.

In the above solution, the step of weighting the text vectors up or down comprises: for text vectors appearing in the title, increasing the vector weight to 5 to 10 times the original;

for text vectors appearing in the introduction of the content in the web page structure, increasing the vector weight to 2 to 3 times the original;

for text vectors appearing in copyright information in the page content, increasing the vector weight to 3 to 5 times the original;

for text vectors appearing in the page content and related to the advertiser information, increasing the vector weight to 3 to 5 times the original;

for text vectors contained in the stop-word list, reducing the vector weight to 1/5 to 1/10 of the original.
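The multipliers above are given only as ranges. The following is a minimal Python sketch of how such position-based adjustment could be applied. The position labels (`title`, `content_intro`, `copyright`, `advertiser`) and the single value picked inside each range are illustrative assumptions, not taken from the patent, which also does not say how overlapping conditions combine (here they simply multiply).

```python
# Illustrative multipliers: one value picked inside each range stated in the text.
POSITION_MULTIPLIERS = {
    "title": 8.0,          # stated range: 5 to 10 times
    "content_intro": 2.5,  # stated range: 2 to 3 times
    "copyright": 4.0,      # stated range: 3 to 5 times
    "advertiser": 4.0,     # stated range: 3 to 5 times
}
STOP_WORD_FACTOR = 0.15    # stated range: 1/10 to 1/5 of the original


def adjust_weight(term, base_weight, position, stop_words):
    """Scale a term's weight by where it appears and whether it is a stop word.

    `position` is a hypothetical label; unknown positions keep the weight unchanged.
    """
    weight = base_weight * POSITION_MULTIPLIERS.get(position, 1.0)
    if term in stop_words:
        weight *= STOP_WORD_FACTOR
    return weight


stop_words = {"的", "了"}
print(adjust_weight("速腾", 1.0, "title", stop_words))  # 8.0
print(adjust_weight("的", 1.0, "body", stop_words))     # 0.15
```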

In the above solution, the vector space model is expressed by the following formula:

$$W(t,d) = \frac{tf(t,d) \times \log\left(N/n_t + 0.01\right)}{\sqrt{\sum_{t' \in d} \left[ tf(t',d) \times \log\left(N/n_{t'} + 0.01\right) \right]^2}}$$

where $W(t,d)$ is the weight of word $t$ in text $d$, $tf(t,d)$ is the frequency of word $t$ in text $d$, $N$ is the total number of training texts, $n_t$ is the number of training texts in which $t$ appears, and the denominator is a normalization factor.
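A sketch of this weight formula in Python, assuming the term frequencies of one text and the document frequencies of the training set are already available as dictionaries (the function and argument names are illustrative). Changing the logarithm base multiplies every raw weight by the same constant, so the normalized result does not depend on the base; the natural logarithm is used here. The `+ 0.01` keeps a word that occurs in every training text from receiving a zero weight.

```python
import math


def vsm_weights(term_freqs, doc_freqs, num_docs):
    """Normalized W(t, d) for every word t of one text d.

    term_freqs: {t: tf(t, d)}; doc_freqs: {t: n_t} over the training set;
    num_docs: N, the total number of training texts.
    """
    raw = {
        t: tf * math.log(num_docs / doc_freqs[t] + 0.01)
        for t, tf in term_freqs.items()
    }
    # Denominator: Euclidean norm of the raw weights of all words in d.
    norm = math.sqrt(sum(w * w for w in raw.values()))
    if norm == 0.0:
        return {t: 0.0 for t in raw}
    return {t: w / norm for t, w in raw.items()}


weights = vsm_weights({"汽车": 3, "广告": 1}, {"汽车": 10, "广告": 100}, num_docs=100)
print(max(weights, key=weights.get))  # 汽车
```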

In the above solution, when sorting the calculated text-vector weights, a threshold is first set, and the text vectors whose weight exceeds the threshold are selected to form a set; the required advertiser information and advertisement description information are then extracted from this set according to the context of the web page in which they appear.
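The threshold step can be sketched as a filter followed by a descending sort. The threshold value and the sample terms below are hypothetical, and the subsequent context-based extraction of advertiser and description information is not modeled.

```python
def select_terms(weights, threshold):
    """Return (term, weight) pairs whose weight exceeds threshold, heaviest first."""
    kept = [(t, w) for t, w in weights.items() if w > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept


print(select_terms({"一汽大众": 0.72, "速腾": 0.55, "版权": 0.08}, threshold=0.3))
# [('一汽大众', 0.72), ('速腾', 0.55)]
```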

In the above solution, in the step of segmenting the text in the advertisement data stored in the web page snapshot library, segmentation comprises decomposing ordinary modern Chinese character sequences into word sequences.

In the above solution, the step of calculating keyword relevance uses the formula

$$P = a_1 m + a_2 c + a_3 h$$

where $a_1$, $a_2$ and $a_3$ are constant coefficients with $a_1 + a_2 + a_3 = 1$ whose relative weights can be adjusted in actual computation, $m$ is the site/channel information on which each advertisement is placed, $c$ is the advertisement content description information, and $h$ is the advertiser information. The calculation comprises: determining the values of $a_1$, $a_2$ and $a_3$ according to the actual situation, calculating $m$, $c$ and $h$ respectively, and substituting $a_1$, $a_2$, $a_3$, $m$, $c$ and $h$ into $P = a_1 m + a_2 c + a_3 h$ to obtain the keyword relevance.
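A minimal Python sketch of the relevance formula. The default coefficients are the values used in the embodiment ($a_1 = 0.4$, $a_2 = 0.2$, $a_3 = 0.4$); rejecting coefficients that do not sum to 1 is a design choice of this sketch, not something the patent prescribes.

```python
import math


def relevance(m, c, h, a1=0.4, a2=0.2, a3=0.4):
    """P = a1*m + a2*c + a3*h; the defaults are the embodiment's coefficients."""
    # Compare with a tolerance so floating-point rounding in the sum is accepted.
    if not math.isclose(a1 + a2 + a3, 1.0):
        raise ValueError("a1 + a2 + a3 must equal 1")
    return a1 * m + a2 * c + a3 * h
```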

In the above solution, the value of the advertisement content description information $c$ and the value of the advertiser information $h$ are calculated with the following vector space model:

$$C(t,d) = \frac{\left(1 + \log_2 tf(t,d)\right) \times \log_2\left(N/n_t\right)}{\sqrt{\sum_{t' \in d} \left[ \left(1 + \log_2 tf(t',d)\right) \times \log_2\left(N/n_{t'}\right) \right]^2}}$$

where $C(t,d)$ is the weight of word $t$ in text $d$, $tf(t,d)$ is the frequency of word $t$ in text $d$, $N$ is the total number of training texts, $n_t$ is the number of training texts in which $t$ appears, and the denominator is a normalization factor.
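A Python sketch of this second weighting, under the same dictionary-input assumptions as before (names are illustrative). It stops at the per-word weights: the text does not say how these weights are combined into the single values $c$ and $h$. Unlike the first formula, a word that appears in every training text gets weight 0 here, because $\log_2(N/N) = 0$.

```python
import math


def log_tf_idf_weights(term_freqs, doc_freqs, num_docs):
    """Normalized C(t, d) = (1 + log2 tf) * log2(N / n_t), divided by the L2 norm."""
    raw = {
        t: (1 + math.log2(tf)) * math.log2(num_docs / doc_freqs[t])
        for t, tf in term_freqs.items()
        if tf > 0  # log2 is undefined for a zero frequency
    }
    norm = math.sqrt(sum(w * w for w in raw.values()))
    if norm == 0.0:
        return {t: 0.0 for t in raw}
    return {t: w / norm for t, w in raw.items()}


w = log_tf_idf_weights({"汽车": 4, "速腾": 1}, {"汽车": 8, "速腾": 2}, 16)
print(round(w["汽车"], 4), round(w["速腾"], 4))  # 0.7071 0.7071
```

In this example a word seen four times but present in half of the training texts gets the same weight as a word seen once but present in only one eighth of them.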

(C) Beneficial effects

As can be seen from the above technical solution, the present invention has the following beneficial effects:

1. The invention acquires advertisement data from websites, extracts advertiser information and advertisement description information from it, segments both into keywords, indexes the keywords, calculates the relevance of each indexed keyword, and sorts the online advertisements in descending order of relevance, thereby making it possible to sort online advertisements.

2. The invention can quickly determine a relevance ranking of the many advertisements that correspond to one keyword, so that advertising designers can find suitable advertising material in the shortest possible time. Advertisers can also use the system to view competitors' ad placements, which supports the design of their own advertising plans.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart of the method for sorting online advertisements provided by the present invention;

FIG. 2 is a schematic diagram of building a keyword index according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of sorting online advertisements according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the inverted index built according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the result of sorting online advertisements according to an embodiment of the present invention.

DETAILED DESCRIPTION

To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

As shown in FIG. 1, which is a flowchart of the method for sorting online advertisements provided by the present invention, the method comprises the following steps:

Step 101: acquiring advertisement data from websites with an advertisement monitoring program, and extracting advertiser information and advertisement description information from the acquired data;

Step 102: segmenting the advertiser information and the advertisement description information into words to obtain keywords, and building an index of the keywords;

Step 103: calculating the relevance of each indexed keyword, and sorting the online advertisements in descending order of relevance.

In step 101, acquiring advertisement data from websites with an advertisement monitoring program comprises: using an advertisement monitoring program (a spider) to monitor ad placements on each website, and saving the original web page content as web page snapshots in the web page snapshot library.

In step 101, extracting advertiser information and advertisement description information from the acquired advertisement data comprises: segmenting the text in the advertisement data stored in the web page snapshot library to obtain a set of text vectors; weighting the text vectors up or down according to their features; calculating the weights of the adjusted text vectors with a vector space model; and sorting the calculated weights and, according to the context of the web page in which each text vector appears, extracting the advertiser information and the advertisement description information from the page.

The above segmentation of the text in the advertisement data stored in the web page snapshot library comprises decomposing ordinary modern Chinese character sequences into word sequences.
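The patent does not name a segmentation algorithm. As one common dictionary-based option, forward maximum matching is sketched below; the toy dictionary and the maximum word length of 4 are assumptions for illustration only.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation using the longest dictionary match."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


print(forward_max_match("我们的祖国多美好", {"我们", "祖国", "美好"}))
# ['我们', '的', '祖国', '多', '美好']
```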

The above step of weighting text vectors up or down comprises: increasing the weight of text vectors that appear in the title to 5 to 10 times the original; increasing the weight of text vectors in the introduction of the content in the web page structure to 2 to 3 times the original; increasing the weight of text vectors in copyright information in the page content to 3 to 5 times the original; increasing the weight of text vectors in the page content that relate to the advertiser information to 3 to 5 times the original; and reducing the weight of text vectors contained in the stop-word list to 1/5 to 1/10 of the original.

The above vector space model is expressed by the following formula:

$$W(t,d) = \frac{tf(t,d) \times \log\left(N/n_t + 0.01\right)}{\sqrt{\sum_{t' \in d} \left[ tf(t',d) \times \log\left(N/n_{t'} + 0.01\right) \right]^2}}$$

where $W(t,d)$ is the weight of word $t$ in text $d$, $tf(t,d)$ is the frequency of word $t$ in text $d$, $N$ is the total number of training texts, $n_t$ is the number of training texts in which $t$ appears, and the denominator is a normalization factor.

When sorting the calculated text-vector weights, a threshold is first set, and the text vectors whose weight exceeds the threshold are selected to form a set; the required advertiser information and advertisement description information are then extracted from this set according to the context of the web page in which they appear.

In addition, in step 101 an advertisement monitoring spider is generally prepared first to monitor ad placements on each website, and the data are saved as snapshots (original web page content). The spider used in the present invention was developed independently by the inventor and is mainly used to monitor changes across more than one hundred media outlets, more than two thousand channels and tens of thousands of web pages. Advertiser information extraction techniques are then used to extract the advertisement information, including advertiser information and advertisement description information. The advertiser information and advertisement descriptions are then segmented into words and indexed, which makes it easy to look them up by keyword. Word segmentation here means decomposing a sequence of (ordinary) modern Chinese characters into a sequence of words; for example, 我们的祖国多美好 ("our motherland is so beautiful") becomes 我们 的 祖国 多 美好 after segmentation. The relevance of each indexed keyword is then calculated, which yields a "keyword → advertisement set" inverted list (as shown in FIG. 2, a schematic diagram of building a keyword index according to an embodiment of the present invention). Each advertisement set is already sorted by relevance, so results can be returned quickly at search time.
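A minimal sketch of such a pre-sorted "keyword → advertisement set" inverted list in Python. The input shapes (keywords per ad, one relevance score per keyword/ad pair) and all names are assumptions; the patent shows the index only as a figure.

```python
from collections import defaultdict


def build_inverted_index(ad_keywords, scores):
    """Map each keyword to its ads, sorted by descending relevance.

    ad_keywords: {ad_id: iterable of keywords from segmentation};
    scores: {(keyword, ad_id): relevance P}; missing pairs count as 0.
    """
    postings = defaultdict(list)
    for ad_id, keywords in ad_keywords.items():
        for kw in set(keywords):  # one posting per (keyword, ad) pair
            postings[kw].append(ad_id)
    # Sort once at build time so lookups can return results directly.
    for kw, ads in postings.items():
        ads.sort(key=lambda ad: scores.get((kw, ad), 0.0), reverse=True)
    return dict(postings)


index = build_inverted_index(
    {"A1": ["汽车", "速腾"], "A2": ["汽车", "别克"]},
    {("汽车", "A1"): 0.5369, ("汽车", "A2"): 0.4796},
)
print(index["汽车"])  # ['A1', 'A2']
```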

In step 102, in the step of segmenting the text in the advertisement data stored in the web page snapshot library, segmentation comprises decomposing ordinary modern Chinese character sequences into word sequences.

In step 103, the step of calculating keyword relevance uses the formula $P = a_1 m + a_2 c + a_3 h$, where $a_1$, $a_2$ and $a_3$ are constant coefficients with $a_1 + a_2 + a_3 = 1$ whose relative weights can be adjusted in actual computation, $m$ is the site/channel information on which each advertisement is placed, $c$ is the advertisement content description information, and $h$ is the advertiser information. The calculation comprises: determining the values of $a_1$, $a_2$ and $a_3$ according to the actual situation, calculating $m$, $c$ and $h$ respectively, and substituting $a_1$, $a_2$, $a_3$, $m$, $c$ and $h$ into $P = a_1 m + a_2 c + a_3 h$ to obtain the keyword relevance.

The value of the site/channel information $m$ of each advertisement is calculated as follows. Let

$$Tr(k) = \sum_{i=1}^{n} Tr_i(k)$$

where $Tr(k)$ is the website Traffic Rank of the $k$-th hit advertisement, obtained as the sum of the Traffic Ranks $Tr_i(k)$ of the $n$ sites/channels on which it is placed (Traffic Rank being the number of visits per million people). The normalized page-level (PageRank) value of the $k$-th hit advertisement is then

$$m(k) = \frac{Pr(k)}{\max\big(Pr(1), Pr(2), \ldots, Pr(K)\big)}$$

where $Pr(k)$ is the page-level value of the $k$-th hit advertisement and $K$ is the number of hit advertisements.

The value of the advertisement content description information $c$ is calculated with the following vector space model:

$$C(t,d) = \frac{\left(1 + \log_2 tf(t,d)\right) \times \log_2\left(N/n_t\right)}{\sqrt{\sum_{t' \in d} \left[ \left(1 + \log_2 tf(t',d)\right) \times \log_2\left(N/n_{t'}\right) \right]^2}}$$

where $C(t,d)$ is the weight of word $t$ in text $d$, $tf(t,d)$ is the frequency of word $t$ in text $d$, $N$ is the total number of training texts, $n_t$ is the number of training texts in which $t$ appears, and the denominator is a normalization factor.

The value of the advertiser information $h$ is calculated with the same vector space model as $c$, so the details are not repeated here.

Based on the flowchart of the method for sorting online advertisements shown in FIG. 1, the method provided by the present invention is described in further detail below with reference to a specific embodiment.

Embodiment

In this embodiment, taking the keyword "汽车" (automobile) entered by the user as an example, the whole process of sorting the retrieved car-related online advertisements is described in detail.

As shown in FIG. 3, which is a schematic diagram of sorting online advertisements according to an embodiment of the present invention, the method comprises the following steps:

Step 301: use the advertisement monitoring spider to monitor ad placements on each website, periodically crawl advertisement data from sites such as Sohu Auto and Sina Auto, and save the original web page content as web page snapshots in the web page snapshot library.

Step 302: extract the advertiser information and the advertisement description information from the acquired advertisement data, and format the acquired advertisement data.

In this step, the formatted advertisement data are:

i. Ad content: (image/flash/text)
Advertiser: 一汽大众 (FAW-Volkswagen)
Ad name: 速腾汽车 (Sagitar car)
Ad target URL: http://www.sagitar.com.cn/olympic/
Placement media: 新浪汽车频道 (Sina Auto channel), 爱卡汽车网资讯频道 (XCAR news channel), ......

ii. Ad content: (image/flash/text)
Advertiser: 上海通用汽车有限公司 (Shanghai General Motors Co., Ltd.)
Ad name: 别克林荫大道汽车 (Buick Park Avenue car)
Ad target URL: http://topic.xcar.com.cn/buickhistory/
Placement media: 搜狐汽车频道 (Sohu Auto channel), ......
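One way to hold such a formatted record in code is a small dataclass. The field names and the `text_for_segmentation` helper are hypothetical; in particular, treating the ad name as the "description information" passed to step 102 is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AdRecord:
    """One formatted advertisement as listed in step 302 (field names are illustrative)."""
    content_type: str   # "image", "flash" or "text"
    advertiser: str
    ad_name: str
    target_url: str
    media: List[str] = field(default_factory=list)  # sites/channels carrying the ad

    def text_for_segmentation(self):
        # Assumption: the advertiser and the ad name stand in for the
        # advertiser information and description information of step 102.
        return f"{self.advertiser} {self.ad_name}"


ad = AdRecord("image", "一汽大众", "速腾汽车",
              "http://www.sagitar.com.cn/olympic/", ["新浪汽车频道", "爱卡汽车网资讯频道"])
print(ad.text_for_segmentation())  # 一汽大众 速腾汽车
```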

Step 303: build an inverted index from keywords to advertisements.

In this step, the inverted index built is shown in FIG. 4, a schematic diagram of the inverted index built according to an embodiment of the present invention.

Step 304: calculate the relevance of each keyword in the inverted index table, which specifically comprises:

First, the formula $P = a_1 m + a_2 c + a_3 h$ is chosen to calculate keyword relevance, where $a_1$, $a_2$ and $a_3$ are constant coefficients with $a_1 + a_2 + a_3 = 1$ whose relative weights can be adjusted in actual computation, $m$ is the site/channel information on which each advertisement is placed, $c$ is the advertisement content description information, and $h$ is the advertiser information;

Next, the constant coefficients are set to $a_1 = 0.4$, $a_2 = 0.2$ and $a_3 = 0.4$ (in practice they can be adjusted according to the sorting results);

Then the value of $m$ is calculated. First, the Traffic Rank values of the channels are looked up in data provided by the Internet Society of China, giving Tr(sina) = 148664, Tr(sohu) = 100175 and Tr(xcar) = 841, so that Tr(A1) = 148664 + 841 = 149505 and Tr(A2) = 100175, and further

m(A1) = 149505 / (149505 + 100175) = 0.5988;

m(A2) = 100175 / (149505 + 100175) = 0.4012;
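The arithmetic of this step can be reproduced in a few lines of Python. The sketch follows the worked example, which divides each advertisement's summed channel Traffic Rank by the total over all hit advertisements (the general description instead shows a maximum in the denominator); the function name and dictionary layout are illustrative.

```python
def channel_share(traffic_rank, channels_by_ad):
    """m for each hit ad: its summed channel Traffic Rank over the total for all hit ads."""
    totals = {ad: sum(traffic_rank[ch] for ch in chans)
              for ad, chans in channels_by_ad.items()}
    grand_total = sum(totals.values())
    return {ad: total / grand_total for ad, total in totals.items()}


m = channel_share(
    {"sina": 148664, "sohu": 100175, "xcar": 841},
    {"A1": ["sina", "xcar"], "A2": ["sohu"]},
)
print({ad: round(v, 4) for ad, v in m.items()})  # {'A1': 0.5988, 'A2': 0.4012}
```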

Then the value of $c$ is calculated with the vector space model

$$C(t,d) = \frac{\left(1 + \log_2 tf(t,d)\right) \times \log_2\left(N/n_t\right)}{\sqrt{\sum_{t' \in d} \left[ \left(1 + \log_2 tf(t',d)\right) \times \log_2\left(N/n_{t'}\right) \right]^2}}$$

where $C(t,d)$ is the weight of word $t$ in text $d$, $tf(t,d)$ is the frequency of word $t$ in text $d$, $N$ is the total number of training texts, $n_t$ is the number of training texts in which $t$ appears, and the denominator is a normalization factor. The vector space model gives c(A1) = 0.5233; c(A2) = 0.5732; ......

Then the value of $h$ is calculated with the same vector space model

$$C(t,d) = \frac{\left(1 + \log_2 tf(t,d)\right) \times \log_2\left(N/n_t\right)}{\sqrt{\sum_{t' \in d} \left[ \left(1 + \log_2 tf(t',d)\right) \times \log_2\left(N/n_{t'}\right) \right]^2}}$$

with the symbols defined as above, giving h(A1) = 0.4817; h(A2) = 0.5112; ......

Finally, the keyword relevance is calculated with the formula $P = a_1 m + a_2 c + a_3 h$:

P(A1) = 0.4 × 0.5988 + 0.2 × 0.5233 + 0.4 × 0.4817 = 0.5369;

P(A2) = 0.4 × 0.4012 + 0.2 × 0.5732 + 0.4 × 0.5112 = 0.4796;

Step 305: sort the calculated relevance values in descending order. The result is:

P(A1) > P(A2) > ......
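Steps 304 and 305 together amount to scoring each hit advertisement and sorting by score. A minimal Python sketch using the embodiment's coefficients and its $(m, c, h)$ values; the function name and the tuple layout are illustrative.

```python
def rank_ads(features, a1=0.4, a2=0.2, a3=0.4):
    """Score each ad with P = a1*m + a2*c + a3*h and sort by descending P."""
    scores = {ad: a1 * m + a2 * c + a3 * h for ad, (m, c, h) in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_ads({
    "A1": (0.5988, 0.5233, 0.4817),  # (m, c, h) from the embodiment
    "A2": (0.4012, 0.5732, 0.5112),
})
print([(ad, round(p, 4)) for ad, p in ranking])  # [('A1', 0.5369), ('A2', 0.4796)]
```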

即最后利用"汽车"这个关键词,得到的广告排序如图5所示,图5为依照本发明实施例对网络广告进行排序的结果示意图,这是选取了头两条的结果。 Finally, i.e., "car" keyword, advertising sort obtained as shown in FIG 5, FIG. 5 is a schematic view of an embodiment according to the present invention result of sorting network ads, which is the result of selecting the first two.

The specific embodiments described above further explain the objects, technical solutions and beneficial effects of the present invention in detail. It should be understood that they are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (7)

1. A method for sorting online advertisements, characterized in that the method comprises: acquiring advertisement data from websites with an advertisement monitoring program and extracting advertiser information and advertisement description information from the acquired advertisement data, which specifically comprises: using an advertisement monitoring program (a spider) to monitor ad placements on each website, and saving the original web page content as web page snapshots in a web page snapshot library; segmenting the text in the advertisement data stored in the web page snapshot library to obtain a set of text vectors; weighting the text vectors up or down according to their features; calculating the weights of the adjusted text vectors with a vector space model; sorting the calculated text-vector weights and, according to the context of the web page in which each text vector appears, extracting the advertiser information and the advertisement description information from the page; segmenting the advertiser information and the advertisement description information into words to obtain keywords, and building an index of the keywords; and calculating the relevance of each indexed keyword and sorting the online advertisements in descending order of relevance.
2. The method for sorting online advertisements according to claim 1, characterized in that segmenting the text in the advertisement data stored in the web page snapshot library comprises decomposing ordinary modern Chinese character sequences into word sequences.
3. The method for sorting online advertisements according to claim 1, characterized in that the step of weighting the text vectors up or down comprises: increasing the weight of text vectors appearing in the title to 5 to 10 times the original; increasing the weight of text vectors in the introduction of the content in the web page structure to 2 to 3 times the original; increasing the weight of text vectors in copyright information in the page content to 3 to 5 times the original; increasing the weight of text vectors in the page content that relate to the advertiser information to 3 to 5 times the original; and reducing the weight of text vectors contained in the stop-word list to 1/5 to 1/10 of the original.
4. The method for sorting online advertisements according to claim 1, characterized in that the vector space model is expressed by the following formula: $$W(t,d) = \frac{tf(t,d) \times \log\left(N/n_t + 0.01\right)}{\sqrt{\sum_{t' \in d} \left[ tf(t',d) \times \log\left(N/n_{t'} + 0.01\right) \right]^2}}$$ where $W(t,d)$ is the weight of word $t$ in text $d$, $tf(t,d)$ is the frequency of word $t$ in text $d$, $N$ is the total number of training texts, $n_t$ is the number of training texts in which $t$ appears, and the denominator is a normalization factor.
5. The method for sorting online advertisements according to claim 1, characterized in that, when sorting the calculated text-vector weights, a threshold is first set, the text vectors whose weight exceeds the threshold are selected to form a set, and the required advertiser information and advertisement description information are then extracted from the set according to the context of the web page in which they appear.
6. The method for sorting online advertisements according to claim 1, characterized in that the step of calculating keyword relevance uses the formula $P = a_1 m + a_2 c + a_3 h$, where $a_1$, $a_2$ and $a_3$ are constant coefficients with $a_1 + a_2 + a_3 = 1$ whose relative weights can be adjusted in actual computation, $m$ is the site/channel information on which each advertisement is placed, $c$ is the advertisement content description information, and $h$ is the advertiser information; the calculation comprises: determining the values of $a_1$, $a_2$ and $a_3$ according to the actual situation, calculating $m$, $c$ and $h$ respectively, and substituting $a_1$, $a_2$, $a_3$, $m$, $c$ and $h$ into $P = a_1 m + a_2 c + a_3 h$ to obtain the keyword relevance.
7. The method for sorting online advertisements according to claim 6, wherein the value of the advertisement content description information $c$ and the value of the advertiser information $h$ are calculated using the following vector space model: [formula image not reproduced; see original document, page 3], where $W(t, d)$ is the weight of term $t$ in text $d$, $tf(t, d)$ is the frequency of term $t$ in text $d$, $N$ is the total number of training texts, $n_t$ is the number of training texts in which $t$ occurs, and the denominator is a normalization factor.
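Claim 7 says $c$ and $h$ come from the vector space model but not how a weight vector is reduced to a single number. One plausible reading, assumed here, is that each score is the normalized weight $W(t, d)$ of the searched keyword $t$ in the relevant text field: the ad description for $c$ and the advertiser information for $h$. `keyword_weight` is a hypothetical helper using the cosine-normalized TF-IDF form with an assumed smoothing constant of 0.01.

```python
import math
from collections import Counter


def keyword_weight(keyword, text, training_texts, smoothing=0.01):
    """Assumed reading of claim 7: score = W(keyword, text), the
    cosine-normalized TF-IDF weight of the keyword inside one text field."""
    if keyword not in text:
        return 0.0
    n_total = len(training_texts)

    def idf(term):
        # n_t: training texts containing the term (floored at 1 for unseen terms)
        n_t = max(sum(term in t for t in training_texts), 1)
        return math.log(n_total / n_t + smoothing)

    raw = {term: freq * idf(term) for term, freq in Counter(text).items()}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return raw[keyword] / norm if norm else 0.0


training = [["phone", "sale"], ["car", "sale"], ["phone", "brand"]]
description = ["phone", "sale", "phone"]  # segmented ad description
advertiser = ["brand", "phone"]           # segmented advertiser information
c = keyword_weight("phone", description, training)
h = keyword_weight("phone", advertiser, training)
```

The keyword dominates the description (c is about 0.89) but is outweighed by the rarer term "brand" in the advertiser field (h is about 0.35); these two values would then be substituted into $P = a_1 m + a_2 c + a_3 h$ from claim 6.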
CN 200710117607 2007-06-20 2007-06-20 Process for ordering network advertisement CN100458797C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200710117607 CN100458797C (en) 2007-06-20 2007-06-20 Process for ordering network advertisement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200710117607 CN100458797C (en) 2007-06-20 2007-06-20 Process for ordering network advertisement

Publications (2)

Publication Number Publication Date
CN101097580A CN101097580A (en) 2008-01-02
CN100458797C CN100458797C (en) 2009-02-04

Family

ID=39011412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200710117607 CN100458797C (en) 2007-06-20 2007-06-20 Process for ordering network advertisement

Country Status (1)

Country Link
CN (1) CN100458797C (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8620745B2 (en) 2010-12-27 2013-12-31 Yahoo! Inc. Selecting advertisements for placement on related web pages
WO2015187485A1 (en) * 2014-06-03 2015-12-10 Google Inc. Systems and methods of generating notifications
CN104360994A (en) * 2014-12-04 2015-02-18 科大讯飞股份有限公司 Natural language understanding method and natural language understanding system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1402156A (en) 2001-08-22 2003-03-12 威瑟科技股份有限公司 Web site information extracting system and method
CN1227611C (en) 2001-03-09 2005-11-16 北京大学 Method for judging position correlation of a group of query keys or words on network page
CN1816812A (en) 2003-06-02 2006-08-09 Google公司 Serving advertisements using user request information and user information
CN1862530A (en) 2005-05-13 2006-11-15 赵然 Network search engines
CN1932817A (en) 2006-09-15 2007-03-21 陈远 Common interconnection network content keyword interactive system

Also Published As

Publication number Publication date
CN101097580A (en) 2008-01-02

Similar Documents

Publication Publication Date Title
US8386914B2 (en) Enhanced document browsing with automatically generated links to relevant information
AU2004279095B2 (en) Automatically targeting web-based advertisements
CA2593421C (en) Location extraction
CN102124462B (en) Query identification and association
KR100797401B1 (en) Methods and apparatus for serving relevant advertisements
US9535911B2 (en) Processing a content item with regard to an event
CN101636737B (en) Blending mobile search results
US20070185858A1 (en) Systems for and methods of finding relevant documents by analyzing tags
US20070067294A1 (en) Readability and context identification and exploitation
US20060064411A1 (en) Search engine using user intent
US20090287676A1 (en) Search results with word or phrase index
CN102625936B (en) Query suggestions from the document
US20070250501A1 (en) Search result delivery engine
US20070239676A1 (en) Method and system for providing focused search results
Bennett et al. Inferring and using location metadata to personalize web search
CN101436186B (en) Method and system for providing related searches
Bilenko et al. Mining the search trails of surfing crowds: identifying relevant websites from user activity
JP2004348241A (en) Information providing method, server, and program
US8762392B1 (en) Query suggestions for a document based on user history
JP2011238276A (en) Ranking blog documents
WO2008109257A1 (en) Detecting a user's location, local intent, and travel intent from search queries
WO2004010331A1 (en) System and method for automated mapping of keywords and key phrases to documents
CN101124081A (en) Reputation based search
US20100114699A1 (en) Content identification expansion
WO2006122059A2 (en) System and methods for identifying the potential advertising value of terms found on web pages

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C14 Granted
COR Bibliographic change or correction in the description

Free format text: CORRECT: ADDRESS; FROM: 100085 HAIDIAN, BEIJING TO: 100090 HAIDIAN, BEIJING

C41 Transfer of the right of patent application or the patent right
ASS Succession or assignment of patent right

Owner name: BEIJING JINGSHI WANWEI TECHNOLOGY CO., LTD.

Free format text: FORMER OWNER: BEIJING KKEYE CO., LTD.

Effective date: 20120217

ASS Succession or assignment of patent right

Owner name: BEIJING KKEYE CO., LTD.

Free format text: FORMER OWNER: BEIJING JINGSHI WANWEI TECHNOLOGY CO., LTD.

Effective date: 20150204

C41 Transfer of the right of patent application or the patent right
COR Bibliographic change or correction in the description

Free format text: CORRECT: ADDRESS; FROM: 100090 HAIDIAN, BEIJING TO: 100085 HAIDIAN, BEIJING