CN103049474A - Search query and document-related data translation - Google Patents

Search query and document-related data translation

Info

Publication number
CN103049474A
Authority
CN
China
Prior art keywords
translation
query
search
model
phrase
Prior art date
Application number
CN2012104134805A
Other languages
Chinese (zh)
Inventor
高剑峰
威廉·多兰
克里斯托弗·布罗克特
王正灏
李玫
黄学东
Original Assignee
微软公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US 61/551,363 (US201161551363P)
Priority to US 13/328,924 (US9501759B2)
Application filed by 微软公司
Publication of CN103049474A

Abstract

The subject disclosure is directed towards developing a translation model for mapping search query terms to document-related data. By processing user logs comprising search histories into word-aligned query-document pairs, the translation model may be trained using data, such as probabilities, corresponding to the word-aligned query-document pairs. After the translation model is incorporated into model data for a search engine, it may be used as a source of features for producing relevance scores for current search queries and for ranking documents/advertisements according to relevance.

Description

Search Query and Document-Related Data Translation
[0001] Cross-Reference to Related Applications
[0002] This application claims priority to U.S. Provisional Patent Application Serial No. 61/551,363, filed October 25, 2011, and to U.S. Patent Application No. 13/328,924, filed December 16, 2011.
Background
[0003] Searching the Internet to locate relevant documents and advertisements can be challenging because search queries and web documents/advertisements tend to use different language styles and vocabularies. Current Internet search technologies suffer from various problems. Often, a query contains terms that differ from, but are related to, the terms in relevant documents, which leads to the well-known information retrieval problem called the lexical gap problem. Sometimes, when a query contains terms with multiple meanings that create ambiguity, the search engine retrieves many documents that do not match the user's intent, which may be called the noisy proliferation problem. Both problems are substantially more prevalent in Internet search because search queries and web documents are composed by a wide variety of people in very different language styles.
[0004] Typical information retrieval methods developed by the research community (regardless of their state-of-the-art performance on benchmark datasets, e.g., the Text REtrieval Conference (TREC) collections) are based on bag-of-words and exact term matching schemes and cannot handle these problems effectively. Some methods employ ad-hoc measures that tend to make the noisy proliferation problem worse. Although several methods have been proposed for determining relationships between terms in queries and terms in documents, most of them rely on inadequate measures of term similarity (such as cosine similarity) based on term co-occurrence in queries and documents. For example, in a paid search system, it is desirable to locate documents (which may include advertisements) that are relevant to the search query and of potential interest to the user, so that the user is more likely to click on them; however, because of the lexical gap problem and/or the noisy proliferation problem caused by language differences between document content and search queries, known techniques often return irrelevant documents.
Summary
[0005] This Summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.
[0006] Briefly, various aspects of the subject matter described herein are directed to a document and search query translation model between sublanguages of a common language (e.g., English). In one aspect, developing a translation model for mapping search query terms to document-related data (such as advertisement descriptions) involves building a word-aligned training corpus comprising word-aligned query-document pairs. In one aspect, the training corpus may be generated from logged search histories, which include click events that originate from search queries. For each pair, a given search query may be assumed to translate into the clicked document title or advertisement description, because users would not select irrelevant documents or advertisements. After determining, for each query-document pair, a word alignment between document-related words and query terms (e.g., a mapping, such as a one-to-one mapping, between query terms and document-related words/phrases), the translation probabilities between particular document-related words and the corresponding query terms in the word alignment are estimated. These translation probabilities may be used by a search engine deployed to the Internet.
[0007] In another aspect, a training mechanism of the search engine may generate the word-aligned training corpus and identify query-advertisement bilingual phrases (i.e., bi-phrases).
The training mechanism may compute phrase translation probabilities associated with the query-advertisement bi-phrases and produce phrase-based query translation probabilities for advertisements; these are provided to the search engine for ranking documents based on whether the search query can be generated or translated from data related to such documents. In another aspect, a search engine provider may use the phrase-based translation model to support advertisers with information about better keywords, suggested descriptions, and the like.
[0008] Other advantages will become apparent from the following detailed description when taken in conjunction with the drawings.
Brief Description of the Drawings
[0009] The present invention is illustrated by way of example and is not limited to the accompanying figures, in which like reference numerals indicate similar elements and in which:
[0010] FIG. 1 is a block diagram illustrating an exemplary system for search query and document-related data translation according to one exemplary implementation.
[0011] FIG. 2 is a block diagram illustrating an exemplary pipeline for translation model training according to one exemplary implementation.
[0012] FIG. 3 is a block diagram illustrating an exemplary runtime data flow for paid advertisement search according to one exemplary implementation.
[0013] FIG. 4 is a flow diagram illustrating exemplary steps for developing a phrase-based translation model that maps search query terms to advertisement-related data according to one exemplary implementation.
[0014] FIG. 5 is a block diagram representing exemplary non-limiting networked environments in which various embodiments described herein can be implemented.
[0015] FIG. 6 is a block diagram representing an exemplary non-limiting computing system or operating environment in which one or more aspects of various embodiments described herein can be implemented.
Detailed Description
[0016] Various aspects of the technology described herein are generally directed to search query and document-related data translation. Document-related data may include advertisement landing pages, advertisement descriptions, and/or document titles. After a translation model is generated that captures, with or without alignment templates, the semantic similarity between search query portions and document portions, the translation model may be incorporated into the model data of a search engine. Once the search engine is deployed, the translation model may serve as a source of feature information when mapping a search query to one or more relevant documents based on whether the search query can be translated from the document-related data.
[0017] It should be understood that any of the examples herein are non-limiting.
As such, the present invention is not limited to any particular embodiments, aspects, concepts, structures, functionalities, or examples described herein. Rather, any of the embodiments, aspects, concepts, structures, functionalities, or examples described herein are non-limiting, and the present invention may be used in various ways that provide benefits and advantages in computing and search in general.
[0018] FIG. 1 is a block diagram illustrating an exemplary system for document and search query translation according to one exemplary implementation. Components of the exemplary system may include usage data 102, a training mechanism 104, model data 106, a search engine provider 108, and an exemplary user 110. The exemplary user 110 represents any user in the search engine's user population. When the exemplary user 110 submits a search query through a local computing device, the exemplary search engine employs various models from the model data 106 to respond to the search query with search results as described herein. After the usage data 102 has accumulated over a period of time, the training mechanism 104 analyzes the usage data 102 and generates one or more models, which are subsequently deployed to the search engine provider 108 as an update to the model data 106. Learning how to combine multiple models to identify relevant documents may be performed offline.
[0019] According to one implementation, the usage data 102 may include aggregated search histories, associated with a plurality of search engine users, collected over a particular time period (e.g., one year). The usage data 102 may include logged search queries, related search results, and click events that originate from the search queries and correspond to documents (including advertisements) having uniform resource locators (URLs). The usage data 102 may also include document-related data, such as document titles and/or advertisement keywords and descriptions.
[0020] The training mechanism 104 may utilize various data for computing translation probabilities between the search query sublanguage and the document/advertisement sublanguage, such as alignment templates 112 and/or a word-aligned training corpus 114. It is to be understood that although exemplary embodiments of these translation probabilities involve a common language (such as English), each probability refers to a lexical gap, between different words or phrases, that frequently arises within an information retrieval system. A search query term may map to different terms having the same or a similar meaning and/or to multiple meanings conveyed in various documents/advertisements.
[0021] For example, in response to a search query for "jogging shoes", the exemplary search engine may fail to identify an advertisement that includes the phrase "running shoes" as relevant or, alternatively, may classify the advertisement as having low relevance, even though the two phrases share a semantic relationship. To mend such a lexical gap, the corresponding translation probabilities capture the semantic relationship or similarity between the two phrases. In one implementation, the corresponding translation probabilities include values representing the likelihood that the phrase "running shoes" can be translated from "jogging shoes" and that the phrase "jogging shoes" can be translated from "running shoes", and thereby indicate how relevant the advertisement is to the search query.
[0022] To determine whether words or phrases share a semantic relationship, according to one exemplary implementation, the training mechanism 104 extracts search query terms and document-related data associated with click events. After building word alignments, the training mechanism 104 converts the extracted data into the word-aligned training corpus 114, which comprises word-aligned query-document pairs of the words or phrases used as search query terms and/or document-related data.
In one implementation, the training mechanism 104 may use the word-aligned training corpus 114 to produce the alignment templates 112, which may include generalized versions of these words or phrases.
[0023] The alignment templates 112 may provide alternative word alignments that use generic word classes (e.g., groupings of words that share a semantic relationship) instead of actual words. In one implementation, one or more features (functions) associated with the exemplary search engine may use the alignment templates 112 to rank documents/advertisements in response to a search query. Each feature may segment the search query into a subset of the alignment templates 112 that map search query terms to document-related data (such as document/advertisement keywords) and produce a value (such as a relevance score or a vector of relevance scores) that is combined with other values to form feature information (e.g., as a weighted average). It is to be understood that many other features may be employed to compute a relevance score for a particular document/advertisement, such as linguistic structure (e.g., a value related to how well-formed the advertisement title/description is), or the number or ordering of the alignment template subsets.
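As an illustration of the extraction step described in [0022], the following minimal Python sketch collects tokenized query-title pairs from click events. The `ClickEvent` record layout, the `extract_pairs` helper, and the `min_clicks` filter are hypothetical names introduced here for illustration; the patent does not specify a log format or a click threshold.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ClickEvent(NamedTuple):
    """One logged click (hypothetical record layout, not from the patent)."""
    query: str
    clicked_title: str


def extract_pairs(log: Iterable[ClickEvent],
                  min_clicks: int = 1) -> list[tuple[list[str], list[str]]]:
    """Return tokenized (query, title) pairs clicked at least min_clicks times."""
    # Count how often each normalized (query, title) combination was clicked.
    counts = Counter(
        (e.query.lower().strip(), e.clicked_title.lower().strip()) for e in log
    )
    # Keep sufficiently frequent pairs and split them into whitespace tokens.
    return [
        (query.split(), title.split())
        for (query, title), n in counts.items()
        if n >= min_clicks
    ]


# Toy log, made up for illustration only.
log = [
    ClickEvent("jogging shoes", "Running Shoes on Sale"),
    ClickEvent("jogging shoes", "Running Shoes on Sale"),
    ClickEvent("jogging shoes", "Cheap Flights"),
]
pairs = extract_pairs(log, min_clicks=2)
print(pairs)
```

Pairs produced this way would still need to be word-aligned before they could serve as a training corpus; the click threshold is only one possible way to reduce noise from accidental clicks.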
[0024] In one implementation, the training mechanism 104 may build a translation model 116 by generating, based on the word-aligned training corpus 114, mapping information between previously logged search query terms and document-related data in the usage data 102. The mapping information may include various probabilities fitted to the word-aligned training corpus 114, such as query mapping probabilities in addition to word-based translation probabilities and/or phrase-based translation probabilities. The training mechanism 104 may employ an expectation-maximization technique to converge (e.g., train) the word-based or phrase-based translation probabilities so that they substantially match the query-document pairs and maximize the query mapping probability of each pair. A query translation probability may represent the conditional probability of generating a search query from one or more portions of a given document (such as an advertisement description or a document title). As described herein, the exemplary search engine may use the query translation probability as the likelihood of a correct translation or mapping between a pending search query and a potential search result.
[0025] In one exemplary implementation, the training mechanism 104 may incorporate the translation model 116 into the model data 106 for use by the exemplary search engine.
For example, the training mechanism 104 may combine a word-based translation model with a language model (such as a unigram language model) through interpolation (e.g., linear or log-linear interpolation). It is to be understood that the translation model 116 may be combined with any n-gram language model (such as a bigram, trigram, or four-gram model). As another example, the training mechanism 104 may incorporate the translation model 116 into a (linear or non-linear) ranking model framework in which a phrase-based translation model and/or a word-based translation model may produce various features for ranking documents/advertisements in response to a search query, as described herein. A linear ranking model framework may also use other models for different features. Alternatively, the training mechanism 104 may store the translation model 116 in the model data 106 for direct use in document/advertisement ranking (e.g., without combining it with other models).
[0026] After the training mechanism 104 incorporates the translation model 116 into the model data 106, an exemplary search engine (such as the search engine 118) may use the translation probabilities to assist in mapping search queries to documents.
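The linear interpolation mentioned in [0025] can be illustrated with a short Python sketch. It is an example under stated assumptions, not the patent's formula: the function name `interpolated_query_prob`, the mixing weight `lam`, the uniform P(w|D) over title words, and the toy probability tables are all hypothetical.

```python
def interpolated_query_prob(query, title, p_trans, p_background, lam=0.5):
    """Score P(Q | D) by mixing a word translation model with a unigram model.

    query, title: lists of tokens (title must be non-empty).
    p_trans: dict mapping (q, w) -> P(q | w).
    p_background: dict mapping q -> unigram probability of q.
    lam: interpolation weight given to the translation component.
    """
    prob = 1.0
    for q in query:
        # Translation component: sum_w P(q | w) * P(w | D), with P(w | D)
        # assumed uniform over the title words.
        p_tm = sum(p_trans.get((q, w), 0.0) for w in title) / len(title)
        # Background unigram probability of the query term.
        p_lm = p_background.get(q, 0.0)
        # Linear interpolation of the two components for this query term.
        prob *= lam * p_tm + (1.0 - lam) * p_lm
    return prob


# Toy, made-up probabilities for illustration only.
p_trans = {("jogging", "running"): 0.4, ("shoes", "shoes"): 0.8}
p_background = {"jogging": 0.01, "shoes": 0.05}

query = ["jogging", "shoes"]
related = interpolated_query_prob(query, ["running", "shoes"], p_trans, p_background)
unrelated = interpolated_query_prob(query, ["cheap", "flights"], p_trans, p_background)
print(related > unrelated)  # True
```

The background term keeps the score non-zero when no title word translates to a query term, which is one common motivation for interpolating with a language model.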
To produce potentially relevant and useful search results listed in response to a current search query, the search engine 118 employs various mechanisms (such as a relevance mechanism 120 and/or a prediction mechanism 122) to identify and appropriately rank a set of documents, such as advertisements.
[0027] In one implementation, the relevance mechanism 120 may filter the set of documents using various feature information 124, which may be produced using the model data 106. For example, the relevance mechanism 120 may compute relevance scores/values based on the translation probabilities provided by the translation model 116. The documents having the highest translation probabilities for the current search query may also have the highest likelihood of being relevant. The relevance mechanism 120 may compare these scores with ranking data 126 and remove documents that fall below a threshold.
[0028] The prediction mechanism 122 may also use the feature information 124 to determine a click prediction score (such as a click-through rate) for each remaining document.
For example, the prediction mechanism 122 may assign the highest click-through rate to the document (such as an advertisement) that has the highest posterior probability of being clicked given the current search query and/or the highest likelihood of relevance as provided by the translation model 116. As another example, the highest click-through rate may depend on various other features, such as the position of the document on the search results page and the readability of the document-related data (e.g., the advertisement title/description). The prediction mechanism 122 may employ a neural network ranker that integrates a large number of features to predict how likely an advertisement is to be clicked if it is displayed on the search results page. The set of documents having click-through rates above a predefined threshold is stored in the ranking data 126 and ultimately presented to the user 110.
[0029] In one exemplary implementation, the search engine provider 108 may also provide one or more software components/tools (such as a suggestion mechanism 128) to assist advertisers in developing advertisements that result in higher click-through rates. In one exemplary implementation, the suggestion mechanism 128 may produce a strategy 130 for improving advertising revenue that includes one or more keywords/phrases to use in a description or title to improve ranking. In another exemplary implementation, the strategy 130 may also include one or more search query terms/keywords (e.g., constituting all or part of a search query) on which to bid in order to draw more traffic to the advertiser's web page.
[0030] In another exemplary implementation, the suggestion mechanism 128 may generate, based on the model data 106 that includes the translation model 116, a metadata stream 132 for advertisements comprising translated words and/or phrases.
For example, the metadata stream 132 may include landing page information (e.g., a URL or title), translated keywords, the advertisement title/description, and/or other metadata. The search engine provider 108 may append the metadata stream 132 to the current metadata accompanying the advertisement. An exemplary format of the metadata stream 132 is shown below:
[0031] Advertiser | Landing page URL/title | Advertisement title | Advertisement description | Translated keywords
[0032] FIG. 2 is a block diagram illustrating an exemplary pipeline for translation model training according to one exemplary implementation. Elements (e.g., steps or processes) of the exemplary pipeline may begin at element 202, at which query-advertisement pairs are extracted from user logs containing search histories (e.g., advertisement clicks that originate from search queries). It is to be understood that although FIG. 2 shows elements for document and search query translation, advertisement and search query translation may be performed in the same or a similar manner. Accordingly, a training mechanism (such as the training mechanism 104 of FIG. 1) may perform at least some of the elements of the exemplary pipeline.
[0033] Element 204 refers to training a word alignment model and/or applying the word alignment model to the query-document pairs.
Assuming that document-related data translates into a search query, the word alignment model generally refers to the joint likelihood of a set of model parameters and a set of search query terms given the document-related data. The set of model parameters may include a permutation (a_1 ... a_J) of words from the document-related data (such as a document title), which maps to the indices of the search query term positions (1 ... J). This permutation, which may be referred to herein as a word alignment, can be expressed as a numerical series in which each a_j takes a value i between 0 and I (e.g., the length of the document-related data, such as a document title or keywords/tags), such that a_j = i if the word at position j of the search query is connected to the word at position i of the document title, and a_j = 0 if it is not connected to any document word.
[0034] The word alignment model may be based on dependencies between document words and search query terms. In one implementation, the word alignment model may assume that each position in the word sequence has an equal probability of being assigned to the corresponding word in the search query, or it may compute a conditional probability for each document title position. For example, the first word in a document title may have a higher probability of mapping to a search query term than any other word position.
Word alignments can provide information beyond co-occurrence counts between two words/phrases. For example, translation probabilities estimated using word alignments can account for distortion or consistency with respect to the position of a word/phrase in the search query that maps to another word/phrase in the document title.
[0035] The training mechanism may employ various techniques for generating word alignments (e.g., expectation maximization and its variants). Some of these techniques (e.g., the Viterbi algorithm) may remove "hidden" words that do not translate into the other sublanguage and/or enable a one-to-one mapping between query terms and document title words. In one exemplary implementation, the training mechanism computes the most likely word sequence for each query-advertisement bilingual word or phrase (i.e., bi-phrase), where a query-advertisement bilingual word or phrase is a consecutive word or phrase that can be translated as a unit from one sublanguage into the other. These word sequences may enable the training mechanism to focus on the distilled keywords that form the advertisement and to assume that the search query is generated or translated from those keywords.
[0036] Element 206 is directed to the extraction of word/phrase pairs. Each pair (q, w) includes one or more search query terms (q) and one or more document-related words (w), such as words in an advertisement title or description.
Element 208 refers to computing the translation probabilities P(q|w) and P(w|q) based on the word alignments. In one exemplary implementation, the translation probability P(q|w) represents the conditional probability (e.g., likelihood) that a particular term q can be translated from a given word w. In another exemplary implementation, the translation probability P(w|q) represents the conditional probability (e.g., posterior probability) that a particular word w can be translated from a given term q.
[0037] The word translation probabilities P(q|w) may be obtained using training data derived from user logs (e.g., query-document pairs denoted {(Q_i, D_i)}, i = 1 ... N). The training method may follow the standard procedure for training statistical word alignment models. In one implementation, the model parameters θ are optimized by maximizing the translation probability of generating the queries from the titles over the training data:
[0038]

Figure CN103049474AD00081
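The equation image referenced above did not survive text extraction. Judging from the sentence that introduces it and from the training pairs (Q_i, D_i), i = 1 ... N, it presumably has the following form; this is a reconstruction from context, not a copy of the patent figure:

```latex
\theta^{*} = \operatorname*{arg\,max}_{\theta} \prod_{i=1}^{N} P(Q_i \mid D_i, \theta)
```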

[0039] P(Q|D, θ) takes the form of a known word alignment model, given by the following equation, where ε is a constant, J is the length of Q, and I is the length of the document-related data D:
[0040]

Figure CN103049474AD00082
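By way of a non-limiting sketch, the training described around equations (1) and (2) can be illustrated with a few iterations of expectation maximization over a lexical translation table. The sketch assumes an IBM Model 1 style alignment model with a NULL title word; the function name `train_word_translation`, the `NULL` token, and the toy data are hypothetical and are not taken from the description.

```python
from collections import defaultdict

NULL = "__NULL__"  # hypothetical placeholder token, an assumption of this sketch


def train_word_translation(pairs, iterations=3):
    """Estimate t[(q, w)] ~ P(q | w) with a few EM iterations.

    pairs: iterable of (query_terms, title_words), each a list of tokens.
    iterations: kept small (e.g., three) as a guard against over-fitting.
    """
    # Prepend NULL to every title so a query term may align to "nothing".
    data = [(list(q), [NULL] + list(d)) for q, d in pairs]
    query_vocab = {term for q, _ in data for term in q}
    uniform = 1.0 / len(query_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (q, w) links
        total = defaultdict(float)  # expected count of links leaving w
        # E-step: distribute each query term over the title words.
        for q, d in data:
            for term in q:
                norm = sum(t[(term, w)] for w in d)
                for w in d:
                    frac = t[(term, w)] / norm
                    count[(term, w)] += frac
                    total[w] += frac
        # M-step: renormalise the expected counts per title word.
        t = defaultdict(float)
        for (term, w), c in count.items():
            t[(term, w)] = c / total[w]
    return dict(t)
```

The returned table is normalised per title word, so the values P(q|w) for a fixed w sum to one over the query terms observed with w.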

[0041] To find the optimal word translation probabilities, the expectation maximization (EM) algorithm is used, for example run for only a certain number of iterations (e.g., three) over the training data as a means of avoiding over-fitting. An alternative is to decompose P(Q|A) at the phrase level and train a phrase-based translation model as described herein.
[0042] Element 210 refers to storing the learned translation probabilities in a set of translation models. The models capture how likely a search query is to map to a document, or a document to a search query, at the levels of words, n-grams, and phrases. Let Q denote a search query and D denote a particular description of a document (e.g., the title of a web page or of an advertisement landing page). As described herein, for each (Q, D) pair, D may be assumed to be relevant to Q if one or more users who entered Q also clicked on D. An exemplary translation model may provide translation probabilities (such as P(Q|D) and P(D|Q)) for any (Q, D) pair or, specifically, translation probabilities (such as P(Q|A) and P(A|Q)) for any (Q, A) pair in which A denotes advertisement-related data (e.g., the title of an advertisement landing page). Various techniques may be used to decompose and reliably estimate these translation probabilities. By way of example, equation (3) uses a parameter estimation technique to illustrate how P(Q|D) is computed and how the translation model is trained.
[0043] Let Q = q1...qJ be a query and let D = w1...wI be the title or description of a web document or advertisement page (e.g., a landing document). The word-based translation model assumes that both Q and D are bags of words, and the translation probability of Q given D is computed as:
[0044]

Figure CN103049474AD00083
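As an illustrative sketch only, the word-based model of equation (3) can be read as summing, for each query term, the word translation probability over the title words weighted by their unigram probability in the document. The sketch also includes the smoothed variant that the description introduces in the paragraphs that follow; its exact form (background interpolation with weight α, self-translation mixing with weight β) is an assumption reconstructed from the surrounding text, and all function names are hypothetical.

```python
import math
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram model: count / length."""
    n = len(tokens)
    return {w: c / n for w, c in Counter(tokens).items()}


def translation_term_prob(q, doc, t):
    """Per-term factor of equation (3): sum over w in D of P(q|w) * P(w|D)."""
    return sum(t.get((q, w), 0.0) * p for w, p in unigram(doc).items())


def smoothed_term_prob(q, doc, t, background, alpha=0.5, beta=0.5):
    """Assumed smoothed form: background model mixed with a blend of the
    unsmoothed document model and the translation-based estimate."""
    p_ml = unigram(doc).get(q, 0.0)  # zero when q does not occur in D
    p_mx = beta * p_ml + (1.0 - beta) * translation_term_prob(q, doc, t)
    return alpha * background.get(q, 0.0) + (1.0 - alpha) * p_mx


def log_query_prob(query, doc, t, background, alpha=0.5, beta=0.5):
    """log P(Q|D) as a sum of per-term log probabilities."""
    total = 0.0
    for q in query:
        p = smoothed_term_prob(q, doc, t, background, alpha, beta)
        if p <= 0.0:
            return float("-inf")  # a term with no probability mass
        total += math.log(p)
    return total
```

With `beta=1.0` the translation table is ignored and the per-term probability reduces to a unigram language model interpolated with the background model.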

[0045] Here, P(w|D) is the unigram probability of word w in D, and P(q|w) is the probability of translating w into the query term q. In general, the translation model allows w to be translated into other semantically related query terms by assigning those other terms a non-zero probability.
[0046] Turning to ranking documents, the word-based translation model of equation (3) may be smoothed before it is applied to document ranking. One suitable smoothed model is defined as:
[0047]

Figure CN103049474AD00093

[0048] Here, P_s(q|D) is a linear interpolation of the background unigram model and the word-based translation model, where α ∈ [0, 1] is an empirically tuned interpolation weight:
[0049]

Figure CN103049474AD00094

[0050] P(q|w) is the word-based translation model, which can be estimated using equation (1) or equation (2). P(q|C) and P(w|D) denote the unsmoothed background model and document model, respectively, and are estimated using maximum likelihood estimation in the following equations:
[0051]

Figure CN103049474AD00091

[0053] C(q; C) and C(w; D) are the count of q in the collection C of (q, w) pairs and the count of w in the document D, respectively, and |C| and |D| are the size of the collection and the size of the document, respectively. In one implementation, although search queries and documents may be associated with different sub-languages, the underlying language is the same, so each word/phrase has a certain probability associated with self-translation (i.e., P(q = w|w) > 0). On the one hand, a low self-translation probability degrades retrieval performance by giving low weight to matching terms. On the other hand, a very high self-translation probability does not exploit the advantages of the translation model. According to one implementation, equation (5) is modified into equation (8) to adjust the self-translation probability explicitly by linearly mixing the translation-based estimate with the maximum likelihood estimate:
[0054] P_s(q|D) = α P(q|C) + (1 − α) P_mx(q|D), where (8)
[0055]

Figure CN103049474AD00092

[0056] In the above equation, β ∈ [0, 1] is a tuning parameter indicating the degree to which the self-translation probability is adjusted. Setting β = 1 in equation (9) reduces the translation model to a unigram language model with Jelinek-Mercer smoothing. P(q|D) in equation (9) is the unsmoothed document model estimated by equation (7), so that P(q|D) = 0 for q ∉ D.
[0057] FIG. 3 is a block diagram illustrating an exemplary runtime data flow for paid advertisement search according to one exemplary implementation. Processing performed during the exemplary runtime data flow begins with a search query parsing and enrichment process 302. As shown, the search query is divided into a set of terms Q = {q1...qJ} and enriched into Q'. For example, the enriched search query Q' may include additional/intermediate search terms and/or target categories. The enriched search query is passed to an advertisement selection process 304, which identifies a set of advertisements that map to one or more of the target categories and/or a portion of the set of terms.
[0058] In one implementation, based on the translation models, a relevance filtering process 306 may reduce the set of advertisements to a subset of relevant advertisements having translation probabilities that exceed a predefined threshold. The relevance filtering process 306 may apply the word-based translation model to translate each advertisement keyword into a query term independently, or apply the phrase-based translation model to translate advertisement keywords into query terms using sequences of words. In another implementation, the relevance filtering process 306 may also use other features to reduce the subset of relevant advertisements. For each advertisement, the values of these features may be combined into a relevance score used for relevant-advertisement refinement.
[0059] In one implementation, a click-through rate prediction process 308 may also use the translation models to compute a probability/value for how likely a user is to select/click a particular relevant advertisement. Based on the relevance scores and the predicted click-through rates, a ranking and allocation process 310 ranks the subset of relevant advertisements and produces a search results page containing the subset of relevant advertisements in ranked order.
[0060] FIG. 4 is a flow diagram illustrating exemplary steps for deploying a phrase-based translation model for mapping advertisements to search queries according to one exemplary implementation. The exemplary steps may begin at step 402 and proceed to step 404, at which a word-aligned training corpus is produced.
The search engine instructs the training mechanism to produce word alignments between each word of a search query and the corresponding words in advertisement-related data (such as a description or title). In one implementation, a word alignment may refer to a mapping between words in different sub-languages that are translations of each other. The training mechanism may use a predetermined word alignment model, or may train the word alignment model using recorded search engine usage data (e.g., click-through data).
[0061] In another implementation, a word alignment may indicate, for each consecutive phrase in a search query (Q), the corresponding phrase in the advertisement title (A) from which that phrase originates, and vice versa. First, the training mechanism learns two word-based translation models by expectation maximization training of the word alignment model on query-advertisement (title) pairs in both directions: a first word-based translation model from search queries to advertisement titles and a second word-based translation model from advertisement titles to search queries. Based on the word alignment model (e.g., the "hidden" word alignment) between each search query and each advertisement title, the training mechanism determines the Viterbi word alignment V* = v1...vJ, in which query term position j is mapped to a word of the advertisement title (A) in each direction according to equations (10) to (12) below:

Figure CN103049474AD00101
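A minimal sketch of the alignment step follows, assuming that the Viterbi alignment picks, for each position on one side, the position on the other side with the highest word translation probability, and that the two directional alignments are first combined by intersection. The heuristic growing of the intersection is not shown. The function names and toy probabilities are hypothetical.

```python
def viterbi_alignment(targets, sources, t):
    """For each target position j, return the source position i that
    maximises t[(targets[j], sources[i])]; ties go to the lowest i."""
    return [
        max(range(len(sources)), key=lambda i: t.get((tok, sources[i]), 0.0))
        for tok in targets
    ]


def intersect(query_to_title, title_to_query):
    """Keep only links (j, i) proposed by both directional alignments."""
    return {
        (j, i)
        for j, i in enumerate(query_to_title)
        if title_to_query[i] == j
    }


query = ["cheap", "flights"]
title = ["low", "cost", "airfare"]
# Hypothetical word translation tables, one per direction.
p_q_given_w = {("cheap", "low"): 0.5, ("cheap", "cost"): 0.3,
               ("flights", "airfare"): 0.7}
p_w_given_q = {("low", "cheap"): 0.6, ("cost", "cheap"): 0.5,
               ("airfare", "flights"): 0.8}

q2a = viterbi_alignment(query, title, p_q_given_w)
a2q = viterbi_alignment(title, query, p_w_given_q)
links = intersect(q2a, a2q)
```

Starting from the high-precision intersection and then adding links is a common design choice because each directional alignment alone can only express many-to-one mappings.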

[0065] The Viterbi word alignment generally refers to the word alignment for which P(Q, V|A) is maximal. To compute the Viterbi word alignment, for each j the training mechanism selects the v_j that makes the word translation probability P(q_j | w_{v_j}) as large as possible. In one implementation, the two Viterbi word alignments are combined by starting from their intersection and progressively including more alignment mappings or links according to a set of known heuristic rules.
[0066] Step 406 is directed to extracting bi-phrases and estimating phrase translation probabilities. In one exemplary implementation, the bi-phrases include bilingual phrases that are consistent with the combined word alignment and are selected using the set of known heuristic rules. For example, the training mechanism may establish a maximum phrase length.
[0067] As described herein, the phrase-based translation model may be a generative model that translates advertisement-related data (A) into a search query (Q). Instead of translating individual words in isolation as in the word-based translation model, the phrase model translates sequences of words (i.e., phrases) in A into sequences of words in Q, thereby incorporating contextual information. For example, it may be learned that the phrase "stuffy nose" can be translated from "cold" with relatively high probability, even though neither of the individual word pairs (i.e., "stuffy"/"cold" and "nose"/"cold") has a high word translation probability.
[0068] In one implementation, the advertisement landing page description (A) is segmented into K non-empty word sequences w1, ..., wK, each of which is then translated into a new non-empty word sequence q1, ..., qK, and these phrases are permuted and concatenated to form the query Q. The variables w and q denote consecutive sequences of words.
[0069] Table I shows the generative process for an exemplary search query Q:
[0070]

Figure CN103049474AD00111

[0071] Table I
[0072] Let S denote the segmentation of A into K phrases w1, ..., wK, and let T denote the K translated phrases q1, ..., qK, where each pair (wi, qi) is called a bi-phrase. Let M denote a permutation of the K elements representing the final reordering step. Let B(A, Q) denote the set of (S, T, M) triples that translate A into Q. If a uniform probability distribution over segmentations is assumed, the phrase-based translation probability can be defined as:
[0073]

Figure CN103049474AD00112

[0074] Applying the maximum approximation to the sum yields the following equation:
[0075]

Figure CN103049474AD00113

[0076] Given the Viterbi word alignment V*, when scoring a given query-advertisement pair from the word-aligned training corpus, or during deployment by the search engine, the training mechanism considers only those (S, T, M) triples that are consistent with V*, denoted B(A, Q, V*). In one implementation, consistency means that if two words are aligned in V*, they must appear in the same bi-phrase (wi, qi). Once the word alignment is fixed, the final permutation is uniquely determined, so that factor can be discarded and equation (14) can be rewritten as:
[0077]

Figure CN103049474AD00114
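The consistency condition above can be sketched as a bi-phrase extraction routine. This is a simplified illustration under stated assumptions: links are (query position, title position) pairs, only the tightest title span for each query span is emitted (spans are not extended over unaligned words), and the name `extract_biphrases` and the `max_len` default are hypothetical.

```python
def extract_biphrases(query, title, links, max_len=3):
    """Return (title_phrase, query_phrase) pairs consistent with `links`.

    A pair is consistent when every link touching the title span points
    inside the query span, so aligned words always share a bi-phrase.
    """
    pairs = []
    for j1 in range(len(query)):
        for j2 in range(j1, min(j1 + max_len, len(query))):
            # Title positions linked to any query word in [j1, j2].
            linked = [i for (j, i) in links if j1 <= j <= j2]
            if not linked:
                continue
            i1, i2 = min(linked), max(linked)
            if i2 - i1 + 1 > max_len:
                continue
            # Reject the span if a title word in [i1, i2] links outside it.
            if all(j1 <= j <= j2 for (j, i) in links if i1 <= i <= i2):
                pairs.append((tuple(title[i1:i2 + 1]),
                              tuple(query[j1:j2 + 1])))
    return pairs


biphrases = extract_biphrases(
    ["stuffy", "nose", "remedy"],
    ["cold", "remedy"],
    {(0, 0), (1, 0), (2, 1)},
)
```

In this toy example "stuffy" alone is never paired with "cold", because "nose" is aligned to the same title word and must fall in the same bi-phrase.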

[0078] For the remaining factor P(T|A, S), the segmented query T = q1...qK is assumed to be generated from left to right by translating each phrase w1...wK independently, as described by the following equation, where P(qk|wk) is the phrase translation probability:
[0079]

Figure CN103049474AD00121

[0080] The phrase-based query translation probability P(Q|A) defined by equations (10) to (16) can be computed efficiently using dynamic programming. Let the quantity α_j be the total probability of all sequences of query phrases that cover the first j query terms. P(Q|A) can then be computed using the following recursion:
[0081] Initialization: α_0 = 1 (17)
[0082] Induction:

Figure CN103049474AD00122

[0083] Total: P(Q|A) = α_J (19)
[0084] Given the collected bi-phrases, the phrase translation probability P(q|w_q) is estimated using relative counts, where N(w, q) is the number of times w is aligned to q in the training data:
[0085]

Figure CN103049474AD00123
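The relative-count estimate and the recursion of equations (17) to (19) can be sketched together as follows. The sketch assumes that equation (20) normalises N(w, q) over all query phrases aligned to the same title phrase, and that a precomputed mapping from each admissible query span to its aligned title phrase is available; `span_to_title`, `estimate_phrase_table`, and `query_translation_prob` are hypothetical names.

```python
from collections import Counter, defaultdict


def estimate_phrase_table(biphrases):
    """Relative counts: P(q|w) = N(w, q) / sum over q' of N(w, q')."""
    n = Counter(biphrases)  # N(w, q) for each (title, query) phrase pair
    totals = defaultdict(int)
    for (w, _), c in n.items():
        totals[w] += c
    return {(w, q): c / totals[w] for (w, q), c in n.items()}


def query_translation_prob(query, span_to_title, table, max_len=3):
    """alpha[j] is the total probability of phrase sequences covering the
    first j query terms; alpha[0] = 1 and P(Q|A) = alpha[J]."""
    J = len(query)
    alpha = [0.0] * (J + 1)
    alpha[0] = 1.0
    for j in range(1, J + 1):
        for k in range(max(0, j - max_len), j):
            w = span_to_title.get((k, j))  # title phrase for query[k:j]
            if w is None:
                continue  # span not consistent with the alignment
            alpha[j] += alpha[k] * table.get((w, tuple(query[k:j])), 0.0)
    return alpha[J]


table = estimate_phrase_table([
    (("cold",), ("stuffy", "nose")),
    (("cold",), ("stuffy", "nose")),
    (("cold",), ("flu",)),
    (("remedy",), ("remedy",)),
    (("remedy",), ("cure",)),
])
prob = query_translation_prob(
    ["stuffy", "nose", "remedy"],
    {(0, 2): ("cold",), (2, 3): ("remedy",), (0, 3): ("cold", "remedy")},
    table,
)
```

Replacing the sum in the inner loop with a maximum would yield the best single segmentation rather than the total probability.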

[0086] As an alternative to equation (20), the training mechanism may estimate a quantity referred to as the lexical weight as a smoothed version of the phrase translation probability. Let P(q|w) be the word translation probability described herein for the word-based translation model (e.g., equations (1) to (9)), and let V be the word alignment (e.g., the "hidden" word alignment) between query term positions i = 1...|q| and title word positions j = 1...|w|. The lexical weight, denoted P_w(q|w, V), can then be computed using the following equation:
[0087]

Figure CN103049474AD00124
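One plausible reading of the lexical weight, sketched here as an assumption rather than as the equation shown in the figure, is the product over query term positions of the average word translation probability across the title words aligned to that position. The treatment of unaligned query terms through `unaligned_prob` is a hypothetical choice introduced only for this illustration.

```python
def lexical_weight(q_phrase, w_phrase, links, t, unaligned_prob=1e-4):
    """Product over query positions i of the mean P(q_i | w_j) over the
    title positions j linked to i in `links` (pairs of (i, j))."""
    weight = 1.0
    for i, q in enumerate(q_phrase):
        aligned = [j for (qi, j) in links if qi == i]
        if aligned:
            weight *= sum(t.get((q, w_phrase[j]), 0.0)
                          for j in aligned) / len(aligned)
        else:
            weight *= unaligned_prob  # hypothetical floor for unaligned terms
    return weight


lw = lexical_weight(
    ("stuffy", "nose"),
    ("cold",),
    {(0, 0), (1, 0)},
    {("stuffy", "cold"): 0.2, ("nose", "cold"): 0.1},
)
```

Because it is built from word-level probabilities, such a weight remains informative for bi-phrases that were observed only rarely in the training data.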

[0088] Step 408 is directed to deploying the phrase-based translation model with the search engine. In one implementation, the search engine may include an information retrieval system that uses the phrase-based translation model as a source of feature information when ranking documents/advertisements or, alternatively, the search engine may employ the translation model to rank advertisements directly in response to a search query. In an alternative implementation, a set of phrase-based translation models may be used to compute various feature values, including the exemplary features P(A|Q) and P(Q|A), which refer to translating the advertisement title from the search query and translating the search query from the advertisement title, respectively.
[0089] Some information retrieval system embodiments use a linear ranking model framework in which different models, in addition to the one or more translation models, can be incorporated as features. The linear ranking model takes the form of a set of M features f_m, where m = 1...M. Each feature is an arbitrary function that maps (Q, A) to a real value, f(Q, A) ∈ R. The model has M parameters λ_m, where m = 1...M, each associated with one feature function.
[0090] Step 410 is directed to processing a search query and producing search results that include relevant advertisements. The relevance score of an advertisement A associated with the search query Q is computed as:
[0091]

Figure CN103049474AD00131
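The linear ranking model can be sketched as a weighted sum of feature functions, assuming the score takes the form of the sum over m of λ_m · f_m(Q, A). The two toy features and their weights below are hypothetical stand-ins for the translation-model features described herein.

```python
def relevance_score(query, ad, features, weights):
    """Weighted sum of feature values: sum over m of lambda_m * f_m(Q, A)."""
    return sum(weights[name] * f(query, ad) for name, f in features.items())


# Hypothetical features: a term-overlap count and a constant bias.
features = {
    "overlap": lambda q, a: float(len(set(q) & set(a))),
    "bias": lambda q, a: 1.0,
}
weights = {"overlap": 2.0, "bias": -0.5}

score = relevance_score(["cheap", "flights"], ["flights", "deals"],
                        features, weights)
```

Keeping features as named callables makes it straightforward to add or remove a translation-model feature without changing the scoring function.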

[0092] According to various implementations, any combination of the following translation-model-based features may be used in addition to, or instead of, other known features. As one example, the search engine may use a phrase translation feature f_PT(Q, A, V) equal to log P(Q|A), where P(Q|A) is computed by equations (17) to (19) and the phrase translation probability P(q|w_q) is estimated using equation (20). As another example, the search engine may use a lexical weight feature f_LW(Q, A, V) equal to log P(Q|A), where P(Q|A) is computed using equations (17) to (19) and the phrase translation probability P(q|w_q) is estimated using equation (20).
[0093] In addition, the search engine may use a phrase alignment feature f_PA(Q, A, B) equal to

Figure CN103049474AD00132

where B is a set of K bi-phrases, a_k is the start position of the title phrase that was translated into the k-th query phrase, and b_{k-1} is the end position of the title phrase that was translated into the (k-1)-th query phrase. The feature models the degree to which the query phrases are reordered. Of all possible B, the search engine computes the feature value only from the Viterbi alignment B* = argmax_B P(Q, B|A). B* can be computed using a technique similar to the dynamic programming recursion of equations (17) to (19), except that the sum operator in equation (18) is replaced by the max operator.
[0094] The search engine may also use an unaligned word penalty feature f_UWP(Q, A, V), defined as the ratio between the number of unaligned query terms and the total number of query terms. The search engine may also use a language model feature f_LM(Q, A) equal to log P(Q|A), where P(Q|A) is the unigram model with Jelinek-Mercer smoothing (i.e., as defined by equations (4) to (9) with β = 1). The search engine may also use a word translation feature f_WT(Q, A) equal to log P(Q|A), where P(Q|A) is the word translation model defined by equation (3) and the word translation probabilities are estimated using the expectation maximization training of equation (1).
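Two of the features above lend themselves to a short sketch. The unaligned word penalty follows directly from its definition as a ratio. The phrase alignment feature is sketched under the assumption that it sums |a_k − b_{k−1} − 1| over consecutive query phrases, which is a common distortion-style measure and may differ from the expression in the figure. Function names are hypothetical.

```python
def unaligned_word_penalty(query, links):
    """Ratio of query positions with no link to the number of query terms.

    links: set of (query_position, title_position) pairs.
    """
    aligned = {j for (j, _) in links}
    unaligned = sum(1 for j in range(len(query)) if j not in aligned)
    return unaligned / len(query)


def phrase_alignment_feature(title_spans):
    """Assumed distortion measure over title spans listed in query order.

    title_spans: list of (start, end) title positions, one per query phrase.
    A monotone, gap-free translation yields 0.
    """
    return sum(
        abs(a_k - b_prev - 1)
        for (_, b_prev), (a_k, _) in zip(title_spans, title_spans[1:])
    )
```

For instance, title spans (0, 1) followed by (2, 3) give a value of 0, whereas the swapped order (2, 3) followed by (0, 1) gives 4, reflecting the reordering.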
[0095] After the relevance score has been computed for each associated advertisement, step 410 is also directed to ranking the associated advertisements according to their relevance scores. The ranking produces search results that list the associated advertisements in ranked order. Some of the associated advertisements may be removed for failing to reach a minimum relevance score.
[0096] Step 412 is directed to producing strategies for one or more advertisers associated with the ranked advertisements. A suggestion mechanism of the search engine may use the translation models to produce candidate keywords for improving keyword bidding. The suggestion mechanism may also produce candidate advertisement descriptions from a number of preselected keywords for improving advertisement web pages or landing pages. The suggestion mechanism may also produce information indicating an improved budget allocation based on click prediction likelihoods (i.e., click-through rates). Step 414 determines whether there is a next search query to process. If there are no more search queries, step 414 proceeds to step 416. If there are more search queries, step 414 returns to step 410. Step 416 ends the exemplary steps.
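The ranking and removal described for step 410 can be sketched as a filter followed by a sort. The function name `rank_ads`, the threshold value, and the toy scores are hypothetical.

```python
def rank_ads(scored_ads, min_score):
    """Drop ads below the minimum relevance score, then sort the rest
    from highest to lowest score.

    scored_ads: iterable of (ad_identifier, relevance_score) pairs.
    """
    kept = [(ad, s) for ad, s in scored_ads if s >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


results = rank_ads([("ad_a", 0.2), ("ad_b", 0.9), ("ad_c", 0.5)],
                   min_score=0.4)
```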
[0097] Exemplary Networked and Distributed Environments
[0098] One of ordinary skill in the art can appreciate that the various embodiments and methods described herein can be implemented in connection with any computer or other client or server device, which can be deployed as part of a computer network or in a distributed computing environment, and can be connected to any kind of data store. In this regard, the various embodiments described herein can be implemented in any computer system or environment having any number of memory or storage units, and any number of applications and processes occurring across any number of storage units. This includes, but is not limited to, an environment with server computers and client computers deployed in a network environment or a distributed computing environment, having remote or local storage.
[0099] Distributed computing provides sharing of computer resources and services by communicative exchange among computing devices and systems. These resources and services include the exchange of information, cache storage, and disk storage for objects, such as files. These resources and services also include the sharing of processing power across multiple processing units for load balancing, expansion of resources, specialization of processing, and the like. Distributed computing takes advantage of network connectivity, allowing clients to leverage their collective power to benefit the entire enterprise.
In this regard, a variety of devices may have applications, objects, or resources that may participate in the resource management mechanisms as described for various embodiments of the subject disclosure.
[0100] FIG. 5 provides a schematic diagram of an exemplary networked or distributed computing environment. The distributed computing environment comprises computing objects 510, 512, etc., and computing objects or devices 520, 522, 524, 526, 528, etc., which may include programs, methods, data stores, programmable logic, etc., as represented by example applications 530, 532, 534, 536, 538. It can be appreciated that computing objects 510, 512, etc. and computing objects or devices 520, 522, 524, 526, 528, etc. may comprise different devices, such as personal digital assistants (PDAs), audio/video devices, mobile phones, MP3 players, personal computers, laptops, etc.
[0101] Each computing object 510, 512, etc. and computing object or device 520, 522, 524, 526, 528, etc. can communicate with one or more other computing objects 510, 512, etc. and computing objects or devices 520, 522, 524, 526, 528, etc. by way of the communications network 540, either directly or indirectly. Even though illustrated as a single element in FIG. 5, communications network 540 may comprise other computing objects and computing devices that provide services to the system of FIG. 5, and/or may represent multiple interconnected networks, which are not shown. Each computing object 510, 512, etc. or computing object or device 520, 522, 524, 526, 528, etc. can also contain an application, such as applications 530, 532, 534, 536, 538, that might make use of an API, or other object, software, firmware and/or hardware, suitable for communication with or implementation of the application provided in accordance with various embodiments of the subject disclosure.
[0102] There are a variety of systems, components, and network configurations that support distributed computing environments. For example, computing systems can be connected together by wired or wireless systems, by local networks or widely distributed networks. Currently, many networks are coupled to the Internet, which provides an infrastructure for widely distributed computing and encompasses many different networks, though any network infrastructure can be used for exemplary communications made incident to the systems as described in various embodiments.
[0103] Thus, a host of network topologies and network infrastructures, such as client/server, peer-to-peer, or hybrid architectures, can be utilized. The "client" is a member of a class or group that uses the services of another class or group to which it is not related. A client can be a process, e.g., roughly a set of instructions or tasks, that requests a service provided by another program or process.
The client process utilizes the requested service without having to "know" any working details about the other program or the service itself.
[0104] In a client/server architecture, particularly a networked system, a client is usually a computer that accesses shared network resources provided by another computer, e.g., a server. In the illustration of FIG. 5, as a non-limiting example, computing objects or devices 520, 522, 524, 526, 528, etc. can be thought of as clients and computing objects 510, 512, etc. can be thought of as servers, where computing objects 510, 512, etc., acting as servers, provide data services, such as receiving data from client computing objects or devices 520, 522, 524, 526, 528, etc., storing data, processing data, and transmitting data to client computing objects or devices 520, 522, 524, 526, 528, etc., although any computer can be considered a client, a server, or both, depending on the circumstances.
[0105] A server is typically a remote computer system accessible over a remote or local network, such as the Internet or wireless network infrastructures. The client process may be active in a first computer system, and the server process may be active in a second computer system, communicating with one another over a communications medium, thus providing distributed functionality and allowing multiple clients to take advantage of the information-gathering capabilities of the server.
[0106] In a network environment in which the communications network 540 or bus is the Internet, for example, the computing objects 510, 512, etc. can be Web servers with which the other computing objects or devices 520, 522, 524, 526, 528, etc. communicate via any of a number of known protocols, such as the hypertext transfer protocol (HTTP). Computing objects 510, 512, etc. acting as servers may also serve as clients, e.g., computing objects or devices 520, 522, 524, 526, 528, etc., as may be characteristic of a distributed computing environment.
[0107] Exemplary Computing Device
[0108] As mentioned, advantageously, the techniques described herein can be applied to any device. It can be understood, therefore, that handheld, portable, and other computing devices and computing objects of all kinds are contemplated for use in connection with the various embodiments. Accordingly, the general purpose remote computer described below in FIG. 6 is but one example of a computing device.
[0109] Embodiments can partly be implemented via an operating system, for use by a developer of services for a device or object, and/or included within application software that operates to perform one or more functional aspects of the various embodiments described herein. Software may be described in the general context of computer-executable instructions, such as program modules, being executed by one or more computers, such as client workstations, servers, or other devices.
Those skilled in the art will appreciate that computer systems have a variety of configurations and protocols that can be used to communicate data, and thus no particular configuration or protocol is to be considered limiting.
[0110] FIG. 6 thus illustrates an example of a suitable computing system environment 600 in which one or more aspects of the embodiments described herein can be implemented, although, as made clear above, the computing system environment 600 is only one example of a suitable computing environment and is not intended to suggest any limitation as to scope of use or functionality. In addition, the computing system environment 600 is not intended to be interpreted as having any dependency relating to any one or combination of components illustrated in the exemplary computing system environment 600.
[0111] With reference to FIG. 6, an exemplary remote device for implementing one or more embodiments includes a general purpose computing device in the form of a computer 610. Components of computer 610 may include, but are not limited to, a processing unit 620, a system memory 630, and a system bus 622 that couples various system components, including the system memory, to the processing unit 620.
[0112] Computer 610 typically includes a variety of computer readable media, which can be any available media that can be accessed by computer 610.
The system memory 630 may include computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) and/or random access memory (RAM). By way of example, and not limitation, system memory 630 may also include an operating system, application programs, other program modules, and program data.
[0113] A user can enter commands and information into the computer 610 through input devices 640. A monitor or other type of display device is also connected to the system bus 622 via an interface, such as output interface 650. In addition to a monitor, computers can also include other peripheral output devices, such as speakers and a printer, which may be connected through output interface 650.
[0114] The computer 610 may operate in a networked or distributed environment using logical connections to one or more other remote computers, such as remote computer 670. The remote computer 670 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, or any other remote media consumption or transmission device, and may include any or all of the elements described above relative to the computer 610. The logical connections depicted in FIG. 6 include a network 672, such as a local area network (LAN) or a wide area network (WAN), but may also include other networks/buses. Such networking environments are commonplace in homes, offices, enterprise-wide computer networks, intranets and the Internet.
[0115] As mentioned above, while exemplary embodiments have been described in connection with various computing devices and network architectures, the underlying concepts may be applied to any network system and any computing device or system in which it is desirable to improve efficiency of resource usage.
[0116] Also, there are multiple ways to implement the same or similar functionality, e.g., an appropriate API, tool kit, driver code, operating system, control, standalone or downloadable software object, etc., which enables applications and services to take advantage of the techniques provided herein. Thus, embodiments herein are contemplated from the standpoint of an API (or other software object), as well as from a software or hardware object that implements one or more embodiments as described herein. Thus, various embodiments described herein can have aspects that are wholly in hardware, partly in hardware and partly in software, as well as in software.
[0117] The word "exemplary" is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art.
Furthermore, to the extent that the terms "includes," "has," "contains," and other similar words are used, for the avoidance of doubt, such terms are intended to be inclusive in a manner similar to the term "comprising" as an open transition word, without precluding any additional or other elements when employed in a claim.
[0118] As mentioned, the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination of both. As used herein, the terms "component," "module," "system" and the like are likewise intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computer and the computer itself can be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers.
[0119] The aforementioned systems have been described with respect to interaction between several components.
It can be appreciated that such systems and components can include those components or specified sub-components, some of the specified components or sub-components, and/or additional components, according to various permutations and combinations of the foregoing. Sub-components can also be implemented as components communicatively coupled to other components rather than included within parent components (hierarchical). Additionally, it can be noted that one or more components may be combined into a single component providing aggregate functionality or divided into several separate sub-components, and that any one or more middle layers, such as a management layer, may be provided to communicatively couple to such sub-components in order to provide integrated functionality. Any components described herein may also interact with one or more other components not specifically described herein but generally known by those of skill in the art.
[0120] In view of the exemplary systems described herein, methodologies that may be implemented in accordance with the described subject matter can also be appreciated with reference to the flowcharts of the various figures. While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of blocks, it is to be understood and appreciated that the various embodiments are not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein.
Where non-sequential, or branched, flow is illustrated via a flowchart, it can be appreciated that various other branches, flow paths, and orders of the blocks may be implemented which achieve the same or a similar result. Moreover, some illustrated blocks are optional in implementing the methodologies described herein.
Conclusion
[0121] While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.
[0122] In addition to the various embodiments described herein, it is to be understood that other similar embodiments can be used, or modifications and additions can be made to the described embodiment(s), for performing the same or equivalent function of the corresponding embodiment(s) without deviating therefrom. Still further, multiple processing chips or multiple devices can share the performance of one or more functions described herein, and similarly, storage can be effected across a plurality of devices. Accordingly, the invention is not to be limited to any single embodiment, but rather is to be construed in breadth, spirit and scope in accordance with the appended claims.

Claims (10)

1. A method in a computing environment, performed at least in part on at least one processor, comprising applying a translation model (116) for mapping (204) one or more search query terms to document-related data, the applying comprising: processing the translation model (116), which comprises data corresponding to word-aligned query-document pairs (114); incorporating (408) the translation model (116) into an information retrieval model (106); and using (410) the information retrieval model (106), in response to a search query, to produce search results comprising relevant documents.
2. The method of claim 1, wherein processing the translation model further comprises: processing search engine usage data to identify word-aligned query-document pairs, so as to train the translation model using a posterior distribution and a likelihood distribution associated with each query-document pair.
3. The method of claim 1, wherein processing the translation model further comprises estimating translation probabilities that represent semantic relationships between a search query sublanguage and a document sublanguage, wherein estimating the translation probabilities further comprises at least one of: adjusting self-translation probabilities, or computing query translation probabilities for advertisements.
4. The method of claim 1, further comprising: generating at least one of a metadata stream or suggested keywords associated with an advertisement.
5. The method of claim 1, further comprising at least one of: computing a relevance score for each potential document based on the search query, or computing a click prediction score for each relevant document based on the search results.
6. A system in a computing environment, comprising a training mechanism (104) configured to process a word-aligned training corpus (114) and to identify (406) query-advertisement bi-phrases, wherein the training mechanism (104) is further configured to compute (208) phrase translation probabilities associated with the query-advertisement bi-phrases, to produce (406) phrase-based query translation probabilities for advertisements, and to provide (408) the phrase-based query translation probabilities to a search engine.
7. The system of claim 6, wherein the search engine further comprises a ranking mechanism configured to compute a score for each advertisement given a search query according to the phrase-based query translation probabilities, wherein the ranking mechanism is further configured to perform at least one of: filtering search results for the search query based on a set of scores for the advertisements, or computing feature information of a phrase translation model comprising the phrase translation probabilities.
8. The system of claim 6, wherein the system further comprises a suggestion mechanism configured to produce a strategy for maximizing revenue for a set of advertisements associated with an advertiser.
9. One or more computer-readable media having computer-executable instructions, which when executed perform steps comprising: accessing (302) a translation model (116) that captures semantic similarities between search query portions and advertisement portions; mapping (304, 306) a search query to one or more relevant advertisements; ranking (308, 310) the one or more relevant advertisements based on the translation model (116); and producing (410) search results comprising the one or more relevant advertisements in an order of the ranking for the search query.
10. The one or more computer-readable media of claim 9, having further computer-executable instructions comprising: generating phrase-based feature information for ranking the one or more relevant documents based on alignment templates.
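By way of example and not limitation, the following Python sketch shows one way a word-based translation model might be used to compute relevance scores and rank documents, in the spirit of claims 1, 3 and 5. It assumes the common formulation in which the probability of each query word is a sum over document words of a translation probability times the word's relative frequency in the document, interpolated with a fixed self-translation weight and a background constant. The translation table, the documents, the weights, and all function names are hypothetical and are not taken from the disclosure; in practice the probabilities would be estimated from word-aligned query-document pairs.

```python
import math
from collections import Counter

# Hypothetical table: TRANSLATION[doc_word][query_word] = P(query_word | doc_word).
TRANSLATION = {
    "automobile": {"car": 0.6, "automobile": 0.3, "vehicle": 0.1},
    "insurance": {"insurance": 0.7, "coverage": 0.3},
    "recipe": {"recipe": 0.8, "cooking": 0.2},
    "pasta": {"pasta": 0.9, "noodles": 0.1},
}

# Hypothetical documents, each reduced to a list of terms.
DOCUMENTS = {
    "auto": ["automobile", "insurance"],
    "food": ["pasta", "recipe"],
}


def translation_prob(query_word, doc_word, self_weight=0.5):
    """P(query_word | doc_word) with extra mass reserved for self-translation."""
    learned = TRANSLATION.get(doc_word, {}).get(query_word, 0.0)
    identity = 1.0 if query_word == doc_word else 0.0
    return self_weight * identity + (1.0 - self_weight) * learned


def relevance_score(query, document, background=1e-4, mix=0.9):
    """Log-probability of the query being a 'translation' of the document."""
    doc_counts = Counter(document)
    doc_len = sum(doc_counts.values())
    score = 0.0
    for q in query:
        # Sum over document words: P(q | w) * P(w | document).
        translated = sum(
            translation_prob(q, w) * count / doc_len
            for w, count in doc_counts.items()
        )
        # Interpolate with a constant background probability so the log is defined.
        score += math.log(mix * translated + (1.0 - mix) * background)
    return score


def rank(query, documents):
    """Return document names ordered from most to least relevant."""
    return sorted(
        documents,
        key=lambda name: relevance_score(query, documents[name]),
        reverse=True,
    )
```

With these assumed values, `rank(["car", "coverage"], DOCUMENTS)` places the `auto` document ahead of `food` even though neither query term appears literally in either document, which is the lexical gap the translation model is meant to bridge.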
CN2012104134805A 2011-10-25 2012-10-25 Search query and document-related data translation CN103049474A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US201161551363P true 2011-10-25 2011-10-25
US61/551,363 2011-10-25
US13/328,924 2011-12-16
US13/328,924 US9501759B2 (en) 2011-10-25 2011-12-16 Search query and document-related data translation

Publications (1)

Publication Number Publication Date
CN103049474A true CN103049474A (en) 2013-04-17

Family

ID=48062115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012104134805A CN103049474A (en) 2011-10-25 2012-10-25 Search query and document-related data translation

Country Status (1)

Country Link
CN (1) CN103049474A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050080776A1 (en) * 2003-08-21 2005-04-14 Matthew Colledge Internet searching using semantic disambiguation and expansion
US20050228797A1 (en) * 2003-12-31 2005-10-13 Ross Koningstein Suggesting and/or providing targeting criteria for advertisements
US20090228353A1 (en) * 2008-03-05 2009-09-10 Microsoft Corporation Query classification based on query click logs
US7912868B2 (en) * 2000-05-02 2011-03-22 Textwise Llc Advertisement placement method and system using semantic analysis

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104750713A (en) * 2013-12-27 2015-07-01 阿里巴巴集团控股有限公司 Method and device for sorting search results
CN105335391A (en) * 2014-07-09 2016-02-17 阿里巴巴集团控股有限公司 Processing method and device of search request on the basis of search engine
CN105335391B (en) * 2014-07-09 2019-02-15 阿里巴巴集团控股有限公司 The treating method and apparatus of searching request based on search engine
CN106663124A (en) * 2014-08-11 2017-05-10 微软技术许可有限责任公司 Generating and using a knowledge-enhanced model
CN105095385A (en) * 2015-06-30 2015-11-25 百度在线网络技术(北京)有限公司 Method and device for outputting retrieval result
CN105095385B (en) * 2015-06-30 2018-11-13 百度在线网络技术(北京)有限公司 A kind of output method and device of retrieval result
CN105912563A (en) * 2016-03-23 2016-08-31 北京数字跃动科技有限公司 Method of giving machines artificial intelligence learning based on knowledge of psychology
CN105912563B (en) * 2016-03-23 2019-04-02 北京数字跃动科技有限公司 A method of the artificial intelligence learning of machine is assigned based on psychological knowledge

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
ASS Succession or assignment of patent right

Owner name: MICROSOFT TECHNOLOGY LICENSING LLC

Free format text: FORMER OWNER: MICROSOFT CORP.

Effective date: 20150608

C41 Transfer of patent application or patent right or utility model
C02 Deemed withdrawal of patent application after publication (patent law 2001)