CN102982025A - Identification method and device for searching requirement - Google Patents


Info

Publication number
CN102982025A
Authority
CN
China
Prior art keywords
translation
search
keyword
user
unit
Prior art date
Application number
CN2011102588353A
Other languages
Chinese (zh)
Other versions
CN102982025B (en
Inventor
蓝翔
柴春光
吴华
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Priority to CN201110258835.3A
Publication of CN102982025A
Application granted
Publication of CN102982025B


Abstract

The invention discloses a method and a device for identifying search needs. The method comprises: obtaining, from users' historical behavior logs, the keywords that users used when performing translation operations, and counting the occurrence frequency of the obtained keywords; when a search request is received, judging from the statistical result whether the occurrence frequency of the search keyword in the request exceeds a preset threshold; and, if it does, determining that the search request has a translation need. With this method and device, users do not have to type expressions such as "translation" or "what does it mean" to signal a translation need. The system can directly determine whether the entered content needs translating and output a translation result, which widens the range of queries for which translation needs can be identified and makes the service more convenient to use.

Description

A method and apparatus for identifying search needs

Technical Field

[0001] The present application relates to the field of Internet application technology, and in particular to a method and apparatus for identifying search needs.

Background

[0002] A search engine is a system that gathers information from the Internet according to a certain strategy using specific computer programs, organizes and processes that information, provides a retrieval service, and presents users with the information relevant to their searches. After receiving a search request (query) submitted by a user, a traditional search engine first extracts the keywords contained in the query and then, based on text matching, returns the web pages or documents that contain those keywords. As users expect ever more intelligent search, search need identification has become an active research topic in the search field.

[0003] Search need identification means analyzing and predicting the user's needs from the submitted query, determining the user's intent or area of interest, and then providing the corresponding information. For example, if a user enters the query "from Beijing to Shanghai", the engine can recognize that the user probably has a strong need for map or ticketing information. When displaying results, it can then present map or ticketing content directly, or rank such content near the top of the results, which makes further browsing easier.

[0004] The key technologies involved in search need identification include semantic analysis, behavior analysis, intelligent human-computer interaction, massive-scale computation, information extraction, and so on. Because users phrase queries in many different ways, a commonly used approach today is to analyze queries separately in different domains, so that search needs are identified in a more targeted way.

[0005] Translation is a fairly common need during search. In the prior art, when a user enters a query such as "XXX翻译" ("XXX translation") or "XXX是什么意思" ("what does XXX mean"), the search engine can rely on expressions such as "翻译" or "是什么意思", which plainly signal a translation need, to recognize that the user wants the word "XXX" translated. In practice, however, a query may consist of just a word or phrase, without any such expression. In that case, existing search engines cannot reliably determine whether the user currently has a translation need.

Summary

[0006] To solve the above technical problem, embodiments of the present application provide a method and apparatus for identifying search needs, so as to identify users' translation needs more effectively. The technical solutions are as follows:

[0007] An embodiment of the present application provides a method for identifying search needs, comprising:

[0008] obtaining, from users' historical behavior logs, the keywords that users used when performing translation operations;

[0009] counting the occurrence frequency of the obtained keywords;

[0010] after a search request is received, judging from the statistical result whether the occurrence frequency of the search keyword in the search request exceeds a preset threshold, and if so, determining that the search request has a translation need.

[0011] According to one implementation of the present application, obtaining the keywords that users used when performing translation operations comprises:

[0012] if a user, among the search results given by the search engine, selects a search result that provides a translation service, obtaining the keyword used in that search.

[0013] According to one implementation of the present application, obtaining the keywords that users used when performing translation operations comprises:

[0014] if it can be clearly determined from the search request entered by the user that the search has a translation need, obtaining the keyword in the part of the search that carries the translation need.

[0015] According to one implementation of the present application, obtaining the keywords that users used when performing translation operations comprises:

[0016] obtaining the keywords that users enter in translation products.

[0017] According to one implementation of the present application, counting the occurrence frequency of the obtained keywords comprises:

[0018] using an n-gram model to count the frequency of each n-gram unit that appears in the obtained keywords.

[0019] According to one implementation of the present application, judging from the statistical result, after a search request is received, whether the occurrence frequency of the search keyword in the search request exceeds a preset threshold comprises:

[0020] obtaining, from the statistical result, the frequency of each n-gram unit in the search keyword;

[0021] judging whether the sum of the frequency values of the n-gram units in the search keyword exceeds the preset threshold.
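
The n-gram counting and sum-over-threshold test described in [0018] to [0021] can be sketched as follows. This is a minimal illustration, not the applicant's implementation: whitespace tokenization, the maximum n-gram length `n_max=2`, the sample keywords, and the threshold value are all assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n_max=2):
    """Yield every n-gram (as a tuple) of length 1..n_max from a token list."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def build_ngram_counts(keywords, n_max=2):
    """Count each n-gram unit across all logged translation keywords."""
    counts = Counter()
    for keyword in keywords:
        counts.update(ngrams(keyword.lower().split(), n_max))
    return counts


def has_translation_need(query, counts, threshold, n_max=2):
    """Sum the logged frequencies of the query's n-grams and compare to the threshold."""
    score = sum(counts[gram] for gram in ngrams(query.lower().split(), n_max))
    return score > threshold


# Hypothetical keywords taken from translation operations in the logs.
logged = ["patent law", "patent", "prior art", "patent law"]
counts = build_ngram_counts(logged)

print(has_translation_need("patent law", counts, threshold=3))
print(has_translation_need("iphone price", counts, threshold=3))
```

Summing over all n-gram units lets a query score well even if the exact phrase never appeared in the logs, as long as its parts did.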

[0022] According to one implementation of the present application, before the occurrence frequency of the obtained keywords is counted, the method further comprises:

[0023] performing lemmatization and/or stop-word removal on the obtained keywords.

[0024] According to one implementation of the present application, before judging whether the occurrence frequency of the search keyword in the search request exceeds the preset threshold, the method further comprises:

[0025] performing lemmatization and/or stop-word removal on the search keyword in the search request.

[0026] According to one implementation of the present application, after it is determined that the search request has a translation need, the method further comprises presenting the translation result corresponding to the search request, the translation result being presented by:

[0027] presenting the translation result corresponding to the search request in the search box; or

[0028] presenting the translation result corresponding to the search request in the form of a search suggestion.

[0029] According to one implementation of the present application, after a search request is received and search suggestions are generated, the method further comprises:

[0030] judging whether the content of the search suggestions has a translation need.

[0031] An embodiment of the present application further provides an apparatus for identifying search needs, comprising:

[0032] a translation keyword acquisition unit, configured to obtain, from users' historical behavior logs, the keywords that users used when performing translation operations;

[0033] a translation keyword statistics unit, configured to count the occurrence frequency of the obtained keywords;

[0034] a translation need identification unit, configured to judge from the statistical result, after a search request is received, whether the occurrence frequency of the search keyword in the search request exceeds a preset threshold, and if so, to determine that the search request has a translation need.

[0035] According to one implementation of the present application, the translation keyword acquisition unit is specifically configured to:

[0036] obtain the keyword used in a search when the user, among the search results given by the search engine, selects a search result that provides a translation service.

[0037] According to one implementation of the present application, the translation keyword acquisition unit is specifically configured to:

[0038] obtain the keyword in the part of a search that carries the translation need when it can be clearly determined from the search request entered by the user that the search has a translation need.

[0039] According to one implementation of the present application, the translation keyword acquisition unit is specifically configured to:

[0040] obtain the keywords that users enter in translation products.

[0041] According to one implementation of the present application, the translation keyword statistics unit is specifically configured to:

[0042] use an n-gram model to count the frequency of each n-gram unit that appears in the obtained keywords.

[0043] According to one implementation of the present application, the translation need identification unit is specifically configured to:

[0044] obtain, from the statistical result, the frequency of each n-gram unit in the search keyword;

[0045] judge whether the sum of the frequency values of the n-gram units in the search keyword exceeds the preset threshold.

[0046] According to one implementation of the present application, the apparatus further comprises:

[0047] a translation keyword preprocessing unit, configured to perform lemmatization and/or stop-word removal on the obtained keywords before the translation keyword statistics unit counts their occurrence frequency.

[0048] According to one implementation of the present application, the apparatus further comprises:

[0049] a search keyword preprocessing unit, configured to perform lemmatization and/or stop-word removal on the search keyword in the search request before the translation keyword statistics unit judges whether the occurrence frequency of the search keyword in the search request exceeds the preset threshold.

[0050] According to one implementation of the present application, the apparatus further comprises:

[0051] a translation result presentation unit, configured to present the translation result corresponding to the search request after the translation need identification unit determines that the search request has a translation need, the translation result presentation unit being specifically configured to:

[0052] present the translation result corresponding to the search request in the search box; or

[0053] present the translation result corresponding to the search request in the form of a search suggestion.

[0054] According to one implementation of the present application, the translation need identification unit is further configured to judge, after a search request is received and search suggestions are generated, whether the content of the search suggestions has a translation need.

[0055] The solution provided by the embodiments of the present application first obtains, from the historical behavior logs of a large number of users, the keywords that users used when performing translation-related operations, and counts the occurrence frequency of those keywords. In the statistical result, the more frequently a word appears, the stronger users' need to have that word translated. Consequently, if the search keyword a user enters during a search reaches a required occurrence frequency, that search can be judged to have a translation need.

[0056] With the solution provided by the embodiments of the present application, users need not enter expressions such as "翻译" ("translation") or "是什么意思" ("what does it mean") that explicitly signal a translation need. The system directly determines whether the entered content has a translation need and gives the translation result, which widens the range of queries for which translation needs can be identified and makes the service more convenient to use.

Brief Description of the Drawings

[0057] To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in the present application, and a person of ordinary skill in the art can derive other drawings from them.

[0058] FIG. 1 is a flowchart of a method for identifying search needs according to an embodiment of the present application;

[0059] FIG. 2 is a schematic diagram of a first way of presenting translation results according to an embodiment of the present application;

[0060] FIG. 3 is a schematic diagram of a second way of presenting translation results according to an embodiment of the present application;

[0061] FIG. 4 is a schematic diagram of a third way of presenting translation results according to an embodiment of the present application;

[0062] FIG. 5 is a schematic diagram of a first structure of an apparatus for identifying search needs according to an embodiment of the present application;

[0063] FIG. 6 is a schematic diagram of a second structure of an apparatus for identifying search needs according to an embodiment of the present application;

[0064] FIG. 7 is a schematic diagram of a third structure of an apparatus for identifying search needs according to an embodiment of the present application.

Detailed Description

[0065] In existing search engines, when a user types a piece of text into the search box, especially text in a foreign language, the user may want web pages or documents that contain that text, which is an ordinary search need. The user may instead want to see a translation of the text or bilingual example sentences, which is a translation need. If the search engine can correctly judge the user's current need, it can build and present search results that better match that need and are easier to browse.

[0066] An embodiment of the present application provides a method for identifying search needs, comprising the following steps:

[0067] obtaining, from users' historical behavior logs, the keywords that users used when performing translation operations;

[0068] counting the occurrence frequency of the obtained keywords;

[0069] after a search request is received, judging from the statistical result whether the occurrence frequency of the search keyword in the search request exceeds a preset threshold, and if so, determining that the search request has a translation need.

[0070] The above method first obtains, from the historical behavior logs of a large number of users, the keywords that users used when performing translation-related operations, and counts the occurrence frequency of those keywords. In the statistical result, the more frequently a word appears, the stronger users' need to have that word translated. Consequently, if the search keyword a user enters during a search reaches a required occurrence frequency, that search can be judged to have a translation need. With this solution, users need not enter expressions such as "翻译" ("translation") or "是什么意思" ("what does it mean") that explicitly signal a translation need. The system directly determines whether the entered content has a translation need and gives the translation result, which widens the range of queries for which translation needs can be identified and makes the service more convenient to use.

[0071] To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the present application shall fall within the scope of protection of the present application.

[0072] FIG. 1 is a flowchart of a method for identifying search needs according to an embodiment of the present application. The method may comprise the following steps:

[0073] S101: obtain, from users' historical behavior logs, the keywords that users used when performing translation operations.

[0074] The solution of this embodiment uses historical data on user behavior: it collects statistics on the keywords for which users have explicitly performed translation operations, and uses those statistics as the basis for identifying translation needs. For every user of the search engine, the system records the user's various actions in a user log. Common translation operations include the following:

[0075] 1) Among the search results given by the search engine, the user selects a search result that provides a translation service.

[0076] When a user enters a piece of text into the search engine, the engine returns the corresponding search results, some of which provide translation services, for example results from translation websites. If the user goes on to click a result of this kind, the text the user typed into the search box is recorded.

[0077] For example, a user enters the query "patent" and then, on the results page, clicks a link to a translation site (such as www.iciba.com or dict.youdao.com). The query can then be considered to have a translation need, so the query "patent" is recorded. If instead the user does not click a translation site after entering a query, for example the user enters "iphone" and then clicks a shopping site, the query is considered to have no translation need and is not recorded.

[0078] 2) It can be clearly determined from the search request entered by the user that the search has a translation need.

[0079] With existing techniques for identifying translation needs, when the query entered by the user contains an expression that plainly signals a translation need, the search can be considered to have a translation need. In that case, the part of the query that is to be translated is recorded.

[0080] For example, a user enters the query "patent翻译" ("patent translation"). From the expression "翻译", which plainly signals a translation need, the search engine can determine that this search has a translation need. It removes that expression from the query and records only the remaining part, "patent".

[0081] As another example, a user enters the query "patent是什么意思" ("what does patent mean"). From the expression "是什么意思", which plainly signals a translation need, the search engine can determine that this search has a translation need. It removes "是什么意思" from the query and records only the remaining part, "patent".

[0082] 3) The user uses a translation product other than the search engine.

[0083] Besides obtaining from the search engine the keywords that users used in translation operations, such keywords can also be obtained from other translation products. Baidu, for example, provides not only its basic search engine but also products that offer translation directly, such as Baidu Translate (fanyi.baidu.com) and Baidu Dictionary (dict.baidu.com). Text that users enter in these products clearly has a translation need. Therefore, as long as the content that users enter in other translation products can be obtained in some way, it can be recorded and used later as a basis for the search engine to identify translation needs.

[0084] When a user performs any of the translation operations above, the entered content can be considered to have a clear translation need and can therefore be recorded as a basis for the search engine to identify translation needs. The methods given above for obtaining the keywords that users used when they had a clear translation need may be used separately or in combination. Those skilled in the art may also obtain such keywords in other ways according to actual application requirements, and this does not affect the implementation of the solution of this embodiment.
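
The three keyword sources in step S101 can be sketched roughly as below. The log format (dictionaries with `query` and `clicked_url` keys), the host names in `TRANSLATION_HOSTS`, and all function names are hypothetical and chosen only for illustration; the two marker expressions come from the examples in [0080] and [0081].

```python
from urllib.parse import urlparse

# Hypothetical host names of translation sites; a real list would be curated.
TRANSLATION_HOSTS = {"dict.example.com", "translate.example.org"}

# Expressions that explicitly signal a translation need (from the examples above).
INTENT_MARKERS = ("是什么意思", "翻译")


def keyword_from_click(entry):
    """Source 1: keep the query if the clicked result is on a translation site."""
    host = urlparse(entry["clicked_url"]).hostname
    return entry["query"] if host in TRANSLATION_HOSTS else None


def keyword_from_marker(query):
    """Source 2: strip an explicit marker and keep what remains, if anything."""
    for marker in INTENT_MARKERS:
        if marker in query:
            remainder = query.replace(marker, "").strip()
            return remainder or None
    return None


def collect_keywords(click_log, raw_queries, translation_product_inputs):
    """Merge the three sources into one flat list of recorded keywords."""
    keywords = []
    for entry in click_log:
        kw = keyword_from_click(entry)
        if kw:
            keywords.append(kw)
    for query in raw_queries:
        kw = keyword_from_marker(query)
        if kw:
            keywords.append(kw)
    keywords.extend(translation_product_inputs)  # source 3: recorded as-is
    return keywords


click_log = [
    {"query": "patent", "clicked_url": "http://dict.example.com/patent"},
    {"query": "iphone", "clicked_url": "http://shop.example.com/iphone"},
]
raw_queries = ["patent翻译", "patent是什么意思", "iphone"]
product_inputs = ["prior art"]

keywords = collect_keywords(click_log, raw_queries, product_inputs)
print(keywords)
```

No user identifier is carried through, so the collected list is an aggregate over all users rather than a per-user record.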

[0085] It should also be noted that the solution of this embodiment records the keywords that a large number of users used when performing translation operations and uses them as the basis for identifying translation needs. In practice, therefore, the recorded content does not need to be tied to any particular user.

[0086] S102: count the occurrence frequency of the obtained keywords.

[0087] A large number of keywords were obtained in step S101. In this step, the frequency with which these keywords occur is counted.

[0088] In practice, if the query entered by the user is a word or phrase, the word or phrase can be taken directly as the unit, and the number of occurrences of identical words or phrases is recorded. If the query is a sentence, the sentence can first be segmented into words, and occurrences are then counted with each segmented word as the unit. Besides the raw number of occurrences, other measures, such as the ratio of occurrences to the total count or the tf-idf value, may also be used to represent the occurrence frequency of a keyword. The embodiments of the present application place no limitation on this.
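
As a small illustration of two of the frequency measures mentioned in [0088], the sketch below computes raw counts and the ratio of each keyword's count to the total. The function name and the sample keywords are assumptions made for the example; tf-idf is not shown.

```python
from collections import Counter


def keyword_frequencies(keywords):
    """Return (raw counts, relative frequencies) for a list of recorded keywords."""
    counts = Counter(keywords)
    total = sum(counts.values())
    ratios = {kw: c / total for kw, c in counts.items()}
    return counts, ratios


# Hypothetical keywords recorded in step S101; phrases are counted as whole units.
recorded = ["patent", "patent", "prior art", "claim"]
counts, ratios = keyword_frequencies(recorded)

print(counts["patent"], ratios["patent"])
```

Raw counts grow with the size of the log, so a threshold on counts has to be retuned as the log grows; a ratio is more stable in that respect.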

[0089] In a preferred implementation of the present application, the following preprocessing may be performed before the occurrences of these keywords are counted:

[0090] 1) Lemmatization:

[0091] Taking English as an example, each word may take several forms, such as the singular and plural of nouns, the different tenses of verbs, and the variations of adjectives and adverbs. In actual processing, users' translation needs for different forms of the same word can be treated as one class. Words can therefore first be reduced uniformly to their base form (for example, runs, running, and ran are all reduced to run) and then counted. In other words, any inflected form that appears in a search keyword is processed as its base form during counting.

[0092] Lemmatization can be implemented with existing techniques such as Porter Stemming, which are not described in detail here.

[0093] 2) Stop-word removal:

[0094] Stop words fall roughly into two categories. The first consists of words that are used very widely, even excessively, such as the English "i", "is", and "what". The second consists of words that occur very often in text but carry little actual meaning. This category mainly includes modal particles, adverbs, prepositions, conjunctions, and the like, which usually have no clear meaning on their own and play a role only within a complete sentence, such as the common "in", "on", and "and".

[0095] There is therefore no need to record the occurrence frequency of stop words separately. The keywords obtained in step S101 can first have stop words removed according to a preset stop-word list, and then be counted.

[0096] Depending on actual application requirements, the two preferred preprocessing methods above may be used separately or in combination. The embodiments of the present application place no limitation on this.
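
The two preprocessing steps can be combined as in the sketch below. It is deliberately simplified and is not the Porter algorithm: a pure suffix-stripping stemmer would not map an irregular form such as "ran" to "run", so the sketch uses a tiny lookup table (`IRREGULAR`) plus two suffix rules. The table, the rules, and the stop-word list are illustrative assumptions.

```python
# Hypothetical lookup for irregular forms that suffix rules cannot handle.
IRREGULAR = {"ran": "run"}

# Small illustrative stop-word list built from the examples in the text.
STOP_WORDS = {"i", "is", "what", "in", "on", "and"}


def restore_base_form(word):
    """Reduce a word to a base form using a lookup table and two suffix rules."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ing") and len(w) > 5:
        stem = w[:-3]
        # Undo consonant doubling: "running" -> "runn" -> "run".
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


def preprocess(keyword):
    """Restore base forms, then drop stop words; returns the remaining tokens."""
    tokens = [restore_base_form(t) for t in keyword.split()]
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("what is running"))
print(preprocess("ran in runs"))
```

The same `preprocess` function would be applied both to the logged keywords before counting and to an incoming query before scoring, so that both sides are normalized the same way.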

[0097] S103,接收到搜索请求后,根据统计结果判断该搜索请求中搜索关键词的出现频率是否超过预设的阈值,如果是,则确定该搜索请求具有翻译需求。 [0097] S103, after receiving the search request, the search request is determined whether the search keyword appearance frequency exceeds a preset threshold according to statistics, if so, determining that the search request has translation requirements.

[0098] In steps S101 and S102, a number of keywords with translation demand are obtained from the user's historical behavior. In this step, when the search engine receives a new search request, it determines whether the request has translation demand according to the occurrence frequency of the search keyword in the request.

[0099] The threshold can be set directly from experience. Alternatively, a batch of queries with translation demand can be selected by the method described above, together with another batch of queries without translation demand, preferably of similar size. Both batches are then scored, and a value that clearly separates the two classes of data is chosen as the threshold.
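One way to pick a separating value from two scored batches is sketched below. The selection rule (maximize the number of correctly classified samples) and the sample scores are assumptions made for illustration; the text only requires that the chosen value clearly separates the two classes.

```python
def choose_threshold(positive_scores, negative_scores):
    """Return the candidate threshold that classifies the most samples correctly.

    A query is treated as having translation demand when score > threshold.
    """
    candidates = sorted(set(positive_scores) | set(negative_scores) | {0})
    best_threshold, best_correct = candidates[0], -1
    for t in candidates:
        correct = (sum(s > t for s in positive_scores)
                   + sum(s <= t for s in negative_scores))
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


# Made-up scores for two similarly sized query sets.
with_demand = [5, 7, 4, 9, 6]
without_demand = [0, 1, 3, 0, 2]
print(choose_threshold(with_demand, without_demand))  # 3
```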

[0100] The simplest approach is to check whether the currently entered keyword is among the keywords with translation demand; if so, the current search request is determined to have translation demand. This is equivalent to setting the threshold to 0. The threshold can also be set to a value greater than 0, in which case the current search request is considered to have translation demand only if the entered keyword occurs more than a certain number of times in the statistical result. Those skilled in the art will appreciate that several threshold ranges can also be set as needed, so as to determine the strength of the translation demand of the current search request. Search requests with different demand strengths can be handled differently; for example, for a search request with stronger translation demand, the translation result can be ranked higher in the search results.
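The idea of several threshold ranges can be sketched as a simple tiering function. The tier names and the two threshold values are hypothetical; the text does not fix the number of ranges or their boundaries.

```python
def demand_strength(frequency, weak_threshold=0, strong_threshold=10):
    """Map a keyword frequency to a coarse translation-demand strength."""
    if frequency > strong_threshold:
        return "strong"  # e.g. rank the translation result near the top
    if frequency > weak_threshold:
        return "weak"    # e.g. show the translation result lower down
    return "none"


print(demand_strength(0), demand_strength(3), demand_strength(25))
# none weak strong
```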

[0101] As in S102, if the query entered by the user is a word or phrase, it can be compared with the statistical result directly, using the word or phrase as the unit. If the query is a sentence, the sentence can first be segmented into words, and each resulting token compared with the statistical result. In particular, when the current query yields several tokens, the statistical frequencies of the tokens can be summed and the sum compared with the preset threshold as the basis for recognizing translation demand.
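The summation over tokens can be sketched as follows. The counts, the threshold of 5, and the whitespace segmentation are assumptions for illustration; a real system would use its own segmenter and the statistics produced in S102.

```python
from collections import Counter

# Hypothetical statistics from S102: token -> number of occurrences.
keyword_counts = Counter({"patent": 12, "server": 4, "maintenance": 3})


def has_translation_demand(query, counts, threshold):
    """Sum the logged frequency of every token in the query and compare."""
    tokens = query.lower().split()          # naive whitespace segmentation
    total = sum(counts[t] for t in tokens)  # Counter yields 0 for unseen tokens
    return total > threshold


print(has_translation_demand("patent", keyword_counts, threshold=5))              # True
print(has_translation_demand("server maintenance", keyword_counts, threshold=5))  # True
print(has_translation_demand("iphone", keyword_counts, threshold=5))              # False
```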

[0102] Likewise, if lemmatization or stop-word removal was applied in S102 before the keyword occurrences were counted, the same lemmatization or stop-word removal should be applied in this step before the current query is compared with the statistical result.

[0103] In another embodiment of this application, S102 can also use an n-gram model to count the frequency of each n-gram that occurs in the acquired keywords.

[0104] The n-gram is a language model commonly used in large-vocabulary continuous recognition. It splits a sentence of l words into l-n+1 n-gram units. When n is 1, this is equivalent to the basic word segmentation described above. In practice, the value of n can be chosen according to the average length of the queries obtained in S101: a larger n if the average length is long (for example, 10 or more), and a smaller n if it is short. In general, n = 2, 3, or 4 works well.
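The split into l-n+1 units can be sketched in a few lines. The helper name `ngrams` and the choice to join tokens with a space are assumptions for illustration.

```python
def ngrams(tokens, n):
    """Return the l - n + 1 contiguous n-gram units of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["wrong", "number", "please", "check"]
print(ngrams(tokens, 2))  # ['wrong number', 'number please', 'please check']
print(ngrams(tokens, 1))  # ['wrong', 'number', 'please', 'check']
```

With n = 1 the function returns the tokens themselves, which matches the remark that the unigram case reduces to plain word segmentation.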

[0105] The embodiment is described below taking n = 2 as an example.

[0106] Suppose that the following set of queries with translation demand is obtained in step S101:

[0107] A1) The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

[0108] B1) This is a wrong number. Please check up and try again later.

[0109] S102a: first segment the two sentences into words and lemmatize them, which gives:

[0110] A2) the server be temporar unable to service your request due to maintenance downtime or capacity problem please try again lat

[0111] B2) this be a wrong number, please check up and try again lat

[0112] S102b: then remove stop words from the two sentences, which gives:

[0113] A3) server temporar unable service request due maintenance downtime capacity problem please try again lat

[0114] B3) wrong number please check up try again lat

[0115] S102c: compute the 2-gram frequency statistics:

[0116] All 2-gram units that occur in the two sentences above are listed below:

[0117] server temporar
[0118] temporar unable
[0119] unable service
[0120] service request
[0121] request due
[0122] due maintenance
[0123] maintenance downtime
[0124] downtime capacity
[0125] capacity problem
[0126] problem please
[0127] please try
[0128] try again
[0129] again lat
[0130] wrong number
[0131] number please
[0132] please check
[0133] check up
[0134] up try
[0135] try again
[0136] again lat

[0137] Counting the occurrences of the 2-grams above and using the count as the score of each 2-gram gives the following score lookup dictionary:

[0138]

2-gram                  Score
server temporar         1
again lat               2
capacity problem        1
check up                1
downtime capacity       1
due maintenance         1
maintenance downtime    1
number please           1
please check            1
please try              1
problem please          1
request due             1
service request         1
temporar unable         1
try again               2
unable service          1
up try                  1
wrong number            1
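The score dictionary can be reproduced from the two preprocessed sentences A3 and B3 with a short sketch. The use of `collections.Counter` and the helper name `bigrams` are implementation choices made for illustration, not part of the described method.

```python
from collections import Counter

a3 = ("server temporar unable service request due maintenance "
      "downtime capacity problem please try again lat")
b3 = "wrong number please check up try again lat"


def bigrams(text):
    """Return the contiguous 2-gram units of a whitespace-separated string."""
    tokens = text.split()
    return [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


score_dict = Counter()
for sentence in (a3, b3):
    score_dict.update(bigrams(sentence))

print(score_dict["try again"], score_dict["again lat"], score_dict["please try"])  # 2 2 1
print(len(score_dict))  # 18
```

Only "try again" and "again lat" occur in both sentences, so they are the only entries with a score of 2.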

[0139] In S103, suppose the user enters the new query "The page you are looking for is temporarily unavailable. Please try again later."

[0140] a) First, segment the query, lemmatize it, and remove stop words following S102a and S102b, which gives:

[0141] page look temporar unavailable please try again lat

[0142] For this sentence, look up the value of each 2-gram in the score dictionary and sum the values using the following formula:

[0143] $$\mathrm{Score} = \sum_{i=1}^{l-n+1} f(G_i)$$

[0144] where l is the length of the text after lemmatization and stop-word removal (l = 8 in this example), G_i is the i-th n-gram unit in the text, and f(G_i) is the score of G_i in the score dictionary. Substituting the scores into the formula gives:

[0145]-[0149]

$$\begin{aligned}
\mathrm{Score} &= f(\text{page look}) + f(\text{look temporar}) + f(\text{temporar unavailable}) + f(\text{unavailable please}) \\
&\quad + f(\text{please try}) + f(\text{try again}) + f(\text{again lat}) \\
&= 0 + 0 + 0 + 0 + 1 + 2 + 2 \\
&= 5
\end{aligned}$$

[0150] Suppose the preset threshold is 3. Since the Score of this query is 5, which exceeds the threshold, the query can be determined to have translation demand.
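The scoring of the new query can be checked with a small sketch. The dictionary below contains only the entries that the query's 2-grams can match; the function name `score` is hypothetical.

```python
# Entries of the score dictionary that are relevant to this query.
score_dict = {"please try": 1, "try again": 2, "again lat": 2}


def score(tokens, scores, n=2):
    """Sum f(G_i) over the l - n + 1 n-gram units of the preprocessed query."""
    units = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return sum(scores.get(u, 0) for u in units)  # unseen n-grams score 0


query = "page look temporar unavailable please try again lat".split()
s = score(query, score_dict)
print(s, s > 3)  # 5 True
```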

[0151] In one embodiment provided by this application, if the search engine is able to recognize and respond to the query in real time, then once the search request has been determined to have translation demand by the scheme above, the translation result for the search request can be presented directly on the search page. The user thus obtains the desired translation without entering the search results page.

[0152] Fig. 2 shows one way of presenting the translation result provided by an embodiment of this application, in which the translation result is presented in the search box.

[0153] Fig. 3 shows another way of presenting the translation result provided by an embodiment of this application, in which the translation result is presented in the form of a search suggestion.

[0154] In practice, the translation result can be presented as text in different fonts, colors, and so on, or through other media such as links and images. The presented content can include not only the direct translation result (such as a dictionary definition or a machine translation result) but also related content such as part of speech, usage, common collocations, usage context, example sentences, phonetic symbols, and a read-aloud function.

[0155] In one embodiment provided by this application, if the search engine can generate search suggestions in real time for the user's current input, then, system resources permitting, the search engine can further determine whether these search suggestions have translation demand. If so, the translation corresponding to a search suggestion can be presented in the search suggestion box, as shown in Fig. 4.

[0156] Corresponding to the method embodiment above, this application also provides a search demand recognition device which, as shown in Fig. 5, comprises:

[0157] a translation keyword acquisition unit 501, configured to obtain, from the user's historical behavior log, the keywords the user used when performing translation operations;

[0158] The scheme of this embodiment is based on historical data about user behavior: keywords for which the user has explicitly performed a translation operation are counted and used as the basis for recognizing translation demand. For every user of the search engine, the system records the user's actions in a user log. Based on common user translation operations, the translation keyword acquisition unit 501 can be configured in the following ways:

[0159] 1) Configured to obtain the keyword used in the current search when the user selects, from the search results returned by the search engine, a result that offers a translation service.

[0160] When a user enters a piece of text into the search engine, the search engine returns the corresponding search results, some of which offer translation services, for example translation websites. If the user goes on to click such a result, the text the user entered in the search box is recorded.

[0161] For example, the user enters the query "patent" into the search engine and then clicks a link to a translation site (such as www.iciba.com or dict.youdao.com) on the search results page. The query can then be considered to have translation demand, so the query "patent" is recorded. If the user does not click a translation site after entering the query, for example the user enters "iphone" and then clicks a shopping site, the query is considered to have no translation demand and is not recorded.

[0162] 2) Configured to obtain the keyword of the part of the current search that has translation demand, when the search request entered by the user makes it possible to determine clearly that the search has translation demand.

[0163] With existing translation demand recognition techniques, when the query entered by the user contains an expression that clearly signals translation demand, the current search can be considered to have translation demand, and the part of the query text that has translation demand is recorded.

[0164] For example, the user enters the query "patent翻译" ("patent translation") into the search engine. From the expression "翻译" ("translation"), which clearly signals translation demand, the search engine can determine that the current search has translation demand. The expression signalling translation demand is then removed from the query, and only the remaining part, "patent", is recorded.

[0165] As another example, the user enters the query "patent是什么意思" ("what does patent mean") into the search engine. From the expression "是什么意思" ("what does ... mean"), which clearly signals translation demand, the search engine can determine that the current search has translation demand. The expression "是什么意思" is then removed from the query, and only the remaining part, "patent", is recorded.

[0166] 3) Configured to obtain the keywords the user enters in translation products.

[0167] Besides the search engine, the keywords users use when performing translation operations can also be obtained from other translation products. The Baidu system, for example, provides not only the basic search engine but also products that offer translation services directly, such as Baidu Translate (fanyi.baidu.com) and Baidu Dictionary (dict.baidu.com), and the text users enter in these products clearly has translation demand. Therefore, as long as the content users enter in other translation products can be obtained in some way, it can be recorded and used as a basis for the search engine's subsequent recognition of translation demand.
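The three acquisition modes can be sketched as a single function over log records. Everything about the record layout is an assumption for illustration: the field names `query`, `clicked_host`, and `source`, the set of translation sites, and the list of demand markers are hypothetical and are not taken from any real log format.

```python
# Hypothetical log records; the field names and site list are assumptions.
TRANSLATION_SITES = {"www.iciba.com", "dict.youdao.com"}
DEMAND_MARKERS = ("翻译", "是什么意思")


def extract_keyword(record):
    """Return the keyword with translation demand, or None if the record shows none."""
    query = record["query"].strip()
    # Mode 3: text typed directly into a translation product.
    if record.get("source") == "translation_product":
        return query
    # Mode 2: the query carries an explicit demand marker; keep the rest.
    for marker in DEMAND_MARKERS:
        if query.endswith(marker):
            return query[: -len(marker)].strip()
    # Mode 1: the user clicked a result on a translation site.
    if record.get("clicked_host") in TRANSLATION_SITES:
        return query
    return None


log = [
    {"query": "patent", "clicked_host": "www.iciba.com"},
    {"query": "iphone", "clicked_host": "shop.example.com"},
    {"query": "patent翻译"},
    {"query": "patent是什么意思"},
]
print([extract_keyword(r) for r in log])  # ['patent', None, 'patent', 'patent']
```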

[0168] a translation keyword statistics unit 502, configured to count the occurrence frequency of the acquired keywords;

[0169] In practice, if the query entered by the user is a word or phrase, occurrences of the same word or phrase can be counted directly, using the word or phrase as the unit. If the query is a sentence, the sentence can first be segmented into words, and the occurrences of each resulting token counted. Besides the raw count, the occurrence frequency of a keyword can also be expressed in other forms, such as the ratio of its count to the total count or its tf-idf value; the embodiments of this application place no restriction on this.

[0170] a translation demand recognition unit 503, configured to determine, after a search request is received and according to the statistical result, whether the occurrence frequency of the search keyword in the search request exceeds a preset threshold, and if so, to determine that the search request has translation demand.

[0171] The threshold can be set directly from experience. Alternatively, a batch of queries with translation demand can be selected by the method described above, together with a batch of queries without translation demand, preferably of similar size. Both batches are then scored, and a value that clearly separates the two classes of data is chosen as the threshold.

[0172] The simplest approach is to check whether the currently entered keyword is among the keywords with translation demand; if so, the current search request is determined to have translation demand. This is equivalent to setting the threshold to 0. The threshold can also be set to a value greater than 0, in which case the current search request is considered to have translation demand only if the entered keyword occurs more than a certain number of times in the statistical result. Those skilled in the art will appreciate that several threshold ranges can also be set as needed, so as to determine the strength of the translation demand of the current search request. Search requests with different demand strengths can be handled differently; for example, for a search request with stronger translation demand, the translation result can be ranked higher in the search results.

[0173] As shown in Fig. 6, in one embodiment of this application the device may further comprise a translation keyword preprocessing unit 504 and a search keyword preprocessing unit 505:

[0174] the translation keyword preprocessing unit 504 is configured to apply lemmatization and/or stop-word removal to the acquired keywords before the translation keyword statistics unit counts their occurrence frequency.

[0175] the search keyword preprocessing unit 505 is configured to apply lemmatization and/or stop-word removal to the search keyword in the search request before the translation demand recognition unit determines whether the occurrence frequency of the search keyword in the search request exceeds the preset threshold.

[0176] In one embodiment of this application,

[0177] the translation keyword statistics unit 502 may be specifically configured to:

[0178] count, using an n-gram model, the frequency of each n-gram unit that occurs in the acquired keywords.

[0179] The translation demand recognition unit 503 is specifically configured to:

[0180] obtain, from the statistical result, the frequency of each n-gram unit in the search keyword;

[0181] determine whether the sum of the frequency values of the n-gram units in the search keyword exceeds the preset threshold.

[0182] As shown in Fig. 7, in one embodiment of this application the device may further comprise:

[0183] a translation result presentation unit 506, configured to present the translation result corresponding to the search request after the translation demand recognition unit determines that the search request has translation demand.

[0184] If the search engine is able to recognize and respond to the query in real time, then once the search request has been determined to have translation demand, the translation result presentation unit 506 can present the translation result for the search request directly on the search page, so that the user obtains the desired translation without entering the search results page.

[0185] The translation result presentation unit may be specifically configured to:

[0186] present the translation result corresponding to the search request in the search box; the result is shown in Fig. 2.

[0187] The translation result presentation unit may also be configured to:

[0188] present the translation result corresponding to the search request in the form of a search suggestion; the result is shown in Fig. 3.

[0189] In practice, the translation result can be presented as text in different fonts, colors, and so on, or through other media such as links and images. The presented content can include not only the direct translation result (such as a dictionary definition or a machine translation result) but also related content such as part of speech, usage, common collocations, usage context, example sentences, phonetic symbols, and a read-aloud function.

[0190] In addition, in another embodiment of this application, the translation demand recognition unit 503 can also be used to determine whether the content of a search suggestion has translation demand, after the search engine has received the search request and generated search suggestions. If translation demand is recognized, the translation result presentation unit 506 can present the translation corresponding to the search suggestion in the search suggestion box, as shown in Fig. 4.

[0191] For convenience of description, the device above is described as a set of units divided by function. When this application is implemented, the functions of the units can of course be realized in one or more pieces of software and/or hardware.
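As one possible software realization of this division into units, the statistics and recognition units could be written as two small classes. The class names, the constructor arguments, and the sample data are all hypothetical; the sketch only illustrates how the output of unit 502 feeds unit 503.

```python
from collections import Counter


class TranslationKeywordStatistics:      # plays the role of unit 502
    def __init__(self, keywords):
        # Count how often each acquired keyword occurs.
        self.counts = Counter(keywords)


class TranslationDemandRecognizer:       # plays the role of unit 503
    def __init__(self, statistics, threshold):
        self.statistics = statistics
        self.threshold = threshold

    def has_demand(self, search_keyword):
        # Translation demand is recognized when the frequency exceeds the threshold.
        return self.statistics.counts[search_keyword] > self.threshold


# Keywords as unit 501 might deliver them from the behavior log (made-up data).
acquired = ["patent", "patent", "server", "patent"]
recognizer = TranslationDemandRecognizer(TranslationKeywordStatistics(acquired), threshold=1)
print(recognizer.has_demand("patent"), recognizer.has_demand("server"))  # True False
```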

[0192] From the description of the embodiments above, those skilled in the art will clearly understand that this application can be implemented by software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of this application.

[0193] The embodiments in this specification are described progressively; for parts that are the same or similar across embodiments, the embodiments can be consulted against one another, and each embodiment focuses on its differences from the others. In particular, because the device and system embodiments are essentially similar to the method embodiments, they are described briefly, and the relevant parts of the method embodiments can be consulted. The device and system embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over several network units. Some or all of the modules can be selected as actually needed to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

[0194] This application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.

[0195] This application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.

[0196] The above are only specific embodiments of this application. It should be noted that those of ordinary skill in the art can make improvements and refinements without departing from the principles of this application, and such improvements and refinements should also be regarded as falling within the scope of protection of this application.

Claims (20)

1. A search demand recognition method, characterized by comprising: obtaining, from a user's historical behavior log, the keywords the user used when performing translation operations; counting the occurrence frequency of the acquired keywords; and, after a search request is received, determining from the statistical result whether the occurrence frequency of the search keyword in the search request exceeds a preset threshold, and if so, determining that the search request has translation demand.
2. The method according to claim 1, characterized in that obtaining the keywords the user used when performing translation operations comprises: if the user selects, from the search results returned by a search engine, a search result that offers a translation service, obtaining the keyword used in the current search.
3. The method according to claim 1, characterized in that obtaining the keywords the user used when performing translation operations comprises: if it can be clearly determined from the search request entered by the user that the current search has translation demand, obtaining the keyword of the part of the current search that has translation demand.
4. The method according to claim 1, characterized in that obtaining the keywords the user used when performing translation operations comprises: obtaining the keywords the user enters in translation products.
5. The method according to claim 1, characterized in that counting the occurrence frequency of the acquired keywords comprises: counting, using an n-gram model, the frequency of each n-gram unit that occurs in the acquired keywords.
6. The method according to claim 1, characterized in that, after the search request is received, determining from the statistical result whether the occurrence frequency of the search keyword in the search request exceeds the preset threshold comprises: obtaining, from the statistical result, the frequency of each n-gram unit in the search keyword; and determining whether the sum of the frequency values of the n-gram units in the search keyword exceeds the preset threshold.
7.根据权利要求1-6任一项所述的方法,其特征在于,在对所获取的关键词的出现频率进行统计之前,还包括: 对所获取的关键词进行词形还原处理和/或去除停用词处理。 The method according to any one of the claims 1-6, characterized in that, prior to the occurrence frequency of the keyword of the acquired statistics, further comprising: a keyword to be acquired and processed Lemmatization / or stop word removal process.
8.根据权利要求7所述的方法,其特征在于,在判断搜索请求中搜索关键词的出现频率是否超过预设的阈值之前,还包括: 对搜索请求中的搜索关键词进行词形还原处理和/或去除停用词处理。 8. The method according to claim 7, characterized in that, prior to keyword searches whether the appearance frequency exceeds a preset threshold value is determined in the search request, further comprising: search keyword search request for processing Lemmatization and / or stop word removal process.
9. The method according to any one of claims 1-6, further comprising, after determining that the search request has a translation need, presenting the translation result corresponding to the search request, wherein the translation result is presented by: presenting the translation result corresponding to the search request in the search box; or presenting the translation result corresponding to the search request in the form of a search suggestion.
10. The method according to any one of claims 1-6, further comprising, after receiving the search request and generating search suggestions: determining whether the content of the search suggestions has a translation need.
11. A search requirement identification device, comprising: a translation keyword obtaining unit, configured to obtain, according to the user's historical behavior log, the keywords used by the user when performing translation operations; a translation keyword statistics unit, configured to count the occurrence frequency of the obtained keywords; and a translation need identification unit, configured to, after a search request is received, determine according to the statistical result whether the occurrence frequency of the search keywords in the search request exceeds a preset threshold, and if so, determine that the search request has a translation need.
12. The device according to claim 11, wherein the translation keyword obtaining unit is specifically configured to: when the user selects, among the search results given by the search engine, a search result that can provide a translation service, obtain the keywords used by the user for the current search.
13. The device according to claim 11, wherein the translation keyword obtaining unit is specifically configured to: when it can be clearly determined from the search request entered by the user that the current search has a translation need, obtain the keywords of the part of the current search that has the translation need.
14. The device according to claim 11, wherein the translation keyword obtaining unit is specifically configured to: obtain the keywords entered by the user in translation products.
15. The device according to claim 11, wherein the translation keyword statistics unit is specifically configured to: using an n-gram model, count the frequency of each n-gram unit that appears in the obtained keywords.
16. The device according to claim 11, wherein the translation need identification unit is specifically configured to: obtain, according to the statistical result, the frequency of each n-gram unit in the search keywords; and determine whether the sum of the frequency values of the n-gram units in the search keywords exceeds the preset threshold.
17. The device according to any one of claims 11-16, further comprising: a translation keyword preprocessing unit, configured to perform lemmatization and/or stop-word removal on the obtained keywords before the translation keyword statistics unit counts the occurrence frequency of the obtained keywords.
18. The device according to claim 17, further comprising: a search keyword preprocessing unit, configured to perform lemmatization and/or stop-word removal on the search keywords in the search request before the translation keyword statistics unit determines whether the occurrence frequency of the search keywords in the search request exceeds the preset threshold.
19. The device according to any one of claims 11-16, further comprising: a translation result presentation unit, configured to present the translation result corresponding to the search request after the translation need identification unit determines that the search request has a translation need, wherein the translation result presentation unit is specifically configured to: present the translation result corresponding to the search request in the search box; or present the translation result corresponding to the search request in the form of a search suggestion.
20. The device according to any one of claims 11-16, wherein the translation need identification unit is further configured to determine, after a search request is received and search suggestions are generated, whether the content of the search suggestions has a translation need.
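
For illustration only, the n-gram statistics, preprocessing, and threshold comparison recited in the claims above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the stop-word list, the use of raw counts as "frequency", the maximum n-gram length of 2, and the threshold value are all hypothetical, and lemmatization is omitted for brevity.

```python
from collections import Counter

# Hypothetical stop-word list; the claims do not specify one.
STOP_WORDS = {"the", "a", "an", "of"}


def preprocess(keyword):
    """Lowercase, split on whitespace, and drop stop words.

    A simplified stand-in for the preprocessing step; lemmatization
    is not performed here.
    """
    return [t for t in keyword.lower().split() if t not in STOP_WORDS]


def ngrams(tokens, max_n=2):
    """Yield every n-gram unit of length 1..max_n as a tuple of tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def build_stats(translation_keywords, max_n=2):
    """Count n-gram units over keywords collected from translation operations.

    Raw occurrence counts are used as the frequency (an assumption).
    """
    stats = Counter()
    for keyword in translation_keywords:
        stats.update(ngrams(preprocess(keyword), max_n))
    return stats


def has_translation_need(query, stats, threshold, max_n=2):
    """Sum the stored counts of the query's n-gram units and compare the
    sum with the preset threshold."""
    score = sum(stats[unit] for unit in ngrams(preprocess(query), max_n))
    return score > threshold


# Hypothetical keywords taken from a user behavior log.
log_keywords = ["machine learning", "machine translation", "the learning curve"]
stats = build_stats(log_keywords)

print(has_translation_need("machine learning", stats, threshold=3))  # True (score 5)
print(has_translation_need("weather today", stats, threshold=3))     # False (score 0)
```

In this sketch the query "machine learning" scores 2 + 2 + 1 = 5 (two unigrams and one bigram), which exceeds the assumed threshold of 3, while a query whose n-gram units never appeared in translation operations scores 0.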
CN201110258835.3A 2011-09-02 2011-09-02 A method and apparatus for identifying search needs CN102982025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110258835.3A CN102982025B (en) 2011-09-02 2011-09-02 A method and apparatus for identifying search needs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110258835.3A CN102982025B (en) 2011-09-02 2011-09-02 A method and apparatus for identifying search needs

Publications (2)

Publication Number Publication Date
CN102982025A true CN102982025A (en) 2013-03-20
CN102982025B CN102982025B (en) 2016-05-11

Family

ID=47856064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110258835.3A CN102982025B (en) 2011-09-02 2011-09-02 A method and apparatus for identifying search needs

Country Status (1)

Country Link
CN (1) CN102982025B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103714054A (en) * 2013-12-30 2014-04-09 北京百度网讯科技有限公司 Translation method and translation device
CN103793364A (en) * 2014-01-23 2014-05-14 北京百度网讯科技有限公司 Method and device for conducting automatic phonetic notation processing and display on text
CN105677927A (en) * 2016-03-31 2016-06-15 百度在线网络技术(北京)有限公司 Method and device for providing searching result
CN105956038A (en) * 2016-04-26 2016-09-21 宇龙计算机通信科技(深圳)有限公司 Notification message management method and apparatus as well as terminal

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060064411A1 (en) * 2004-09-22 2006-03-23 William Gross Search engine using user intent
CN1761972A (en) * 2003-03-18 2006-04-19 Nhn株式会社 A method of determining an intention of internet user, and a method of advertising via internet by using the determining method and a system thereof
US20090043749A1 (en) * 2007-08-06 2009-02-12 Garg Priyank S Extracting query intent from query logs
US20090307198A1 (en) * 2008-06-10 2009-12-10 Yahoo! Inc. Identifying regional sensitive queries in web search
US20110035397A1 (en) * 2006-12-20 2011-02-10 Yahoo! Inc. Discovering query intent from search queries and concept networks
CN102012900A (en) * 2009-09-04 2011-04-13 阿里巴巴集团控股有限公司 An information retrieval method and system
CN102096717A (en) * 2011-02-15 2011-06-15 百度在线网络技术(北京)有限公司 Search method and search engine

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103714054A (en) * 2013-12-30 2014-04-09 北京百度网讯科技有限公司 Translation method and translation device
CN103714054B (en) * 2013-12-30 2017-03-15 北京百度网讯科技有限公司 Translating methods and apparatus
CN103793364A (en) * 2014-01-23 2014-05-14 北京百度网讯科技有限公司 Method and device for conducting automatic phonetic notation processing and display on text
CN103793364B (en) * 2014-01-23 2018-09-07 北京百度网讯科技有限公司 Method and apparatus for automatic text processing and display of phonetic
CN105677927A (en) * 2016-03-31 2016-06-15 百度在线网络技术(北京)有限公司 Method and device for providing searching result
CN105677927B (en) * 2016-03-31 2019-04-12 百度在线网络技术(北京)有限公司 For providing the method and apparatus of search result
CN105956038A (en) * 2016-04-26 2016-09-21 宇龙计算机通信科技(深圳)有限公司 Notification message management method and apparatus as well as terminal
WO2017185463A1 (en) * 2016-04-26 2017-11-02 宇龙计算机通信科技(深圳)有限公司 Management method and management device for notification message, and terminal

Also Published As

Publication number Publication date
CN102982025B (en) 2016-05-11

Similar Documents

Publication Publication Date Title
US8463593B2 (en) Natural language hypernym weighting for word sense disambiguation
Wang et al. Using Wikipedia knowledge to improve text classification
JP4664076B2 (en) Flashing of notes call out to highlight the inter-language search results
US8073877B2 (en) Scalable semi-structured named entity detection
EP1400901A2 (en) Method and system for retrieving confirming sentences
US8280882B2 (en) Automatic expert identification, ranking and literature search based on authorship in large document collections
US20110225155A1 (en) System and method for guiding entity-based searching
US20140006012A1 (en) Learning-Based Processing of Natural Language Questions
CA2536265C (en) System and method for processing a query
JP4726528B2 (en) Related term suggestion for multi-sense query
US10162885B2 (en) Automated self-service user support based on ontology analysis
US20060122997A1 (en) System and method for text searching using weighted keywords
Liu et al. Opinion target extraction using word-based translation model
JP5169816B2 (en) Question Answer apparatus, question-and-answer method, and a question-and-answer program
US8041697B2 (en) Semi-automatic example-based induction of semantic translation rules to support natural language search
WO2010107327A1 (en) Natural language processing method and system
JP2005535007A (en) The method of synthesis self-learning system for knowledge extraction for document retrieval system
JP2013502643A (en) Structured data translation device, system and method
CA2701171A1 (en) System and method for processing a query with a user feedback
CN1536483A (en) Method for extracting and processing network information and its system
CN101894102A (en) Method and device for analyzing emotion tendentiousness of subjective text
US8204874B2 (en) Abbreviation handling in web search
CN101042692B (en) translation obtaining method and apparatus based on semantic forecast
CN103902652A (en) Automatic question-answering system
CN102549572A (en) Systems and methods for providing advanced search result page content

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C14 Grant of patent or utility model