CN103838754B - Information search apparatus and method - Google Patents

Information search apparatus and method

Publication number
CN103838754B
Authority
CN
China
Application number
CN201210482905.8A
Other languages
Chinese (zh)
Other versions
CN103838754A (en)
Inventor
徐鹏程
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Priority to CN201210482905.8A
Publication of CN103838754A
Application granted
Publication of CN103838754B

Abstract

The present invention discloses an information search method comprising the following steps: finding related categories in a category library according to a search term; analyzing the weight of the search term in those categories and generating weight information; obtaining search statistics, which are derived from statistics on users' search history; generating a first search order according to the weight information, the first search order being the order in which the search for the search term is carried out across the different categories of the category library; adjusting the first search order according to the search statistics to produce a second search order; and searching according to the second search order and the search term. The present invention also discloses an information search apparatus. The invention can provide users with more accurate and more relevant search results, so that the results better match their needs and preferences.

Description

Information search apparatus and method

Technical Field

[0001] The present invention relates to the field of information search, and in particular to an information search apparatus and method.

Background

[0002] Conventional web pages, application software and the like provide users with an information search function, and their technical solutions usually search directly on the search term.

[0003] In practice, the inventor found that the existing technical solutions have the following drawbacks:

[0004] A search performed on the search term alone has low flexibility and poor accuracy, because its results are very limited in both breadth and depth. Breadth here means the size of the range covered by the results related to the search term, and depth means how well the results match the search term. As a result, users cannot obtain the results they want, or cannot obtain results that are related to the search term in other respects.

[0005] The Internet contains a large amount of information, and pieces of information are associated with one another in many ways. The above technical solutions make little use of these associations, so the search results they can provide are very limited.

[0006] It is therefore necessary to propose a new technical solution to solve the above technical problems.

Summary of the Invention

[0007] One object of the present invention is to provide an information search method that can give users more accurate search results as well as search results that are related to the search term in multiple respects.

[0008] To solve the above problems, the present invention provides an information search method comprising the following steps: finding related categories in a category library according to a search term; analyzing the weight of the search term in those categories and generating weight information; obtaining search statistics, the search statistics being derived from statistics on users' search history; generating a first search order according to the weight information, the first search order being the order in which the search for the search term is carried out across the different categories of the category library; adjusting the first search order according to the search statistics to produce a second search order; and searching according to the second search order and the search term.

[0009] Another object of the present invention is to provide an information search apparatus that can give users more accurate search results as well as search results that are related to the search term in multiple respects.

[0010] To solve the above problems, the present invention provides an information search apparatus comprising: a preprocessing module configured to find related categories in a category library according to a search term; a weight information generation module configured to analyze the weight of the search term in those categories and generate weight information; a search statistics acquisition module configured to obtain search statistics, the search statistics being derived from statistics on users' search history; an order adjustment module configured to generate a first search order according to the weight information, the first search order being the order in which the search for the search term is carried out across the different categories of the category library, and to adjust the first search order according to the search statistics to produce a second search order; and a search module configured to search according to the second search order and the search term.

[0011] Compared with the prior art, the present invention adjusts the order in which the categories are searched, so that categories more relevant to the search term are searched first; in essence, adjusting the search order helps schedule search resources sensibly and optimizes the search process. The first search order is adjusted according to the search statistics in order to optimize the search order further, so that the search for the search term satisfies the user's preferences as far as possible and yields results that match those preferences as closely as possible.

[0012] To make the above content of the present invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

[0013] FIG. 1 is a flowchart of a first embodiment of the information search method of the present invention.

[0014] FIG. 2 is a flowchart of the steps by which the search statistics acquisition module generates the search statistics in a second embodiment of the information search method of the present invention.

[0015] FIG. 3 is a flowchart of a third embodiment of the information search method of the present invention.

[0016] FIG. 4 is a flowchart of the steps by which the category library module mines data records and builds the category library in a fourth embodiment of the information search method of the present invention.

[0017] FIG. 5 is a block diagram of a first embodiment of the information search apparatus of the present invention.

[0018] FIG. 6 is a block diagram of the search statistics acquisition module in a second embodiment of the information search apparatus of the present invention.

[0019] FIG. 7 is a block diagram of a third embodiment of the information search apparatus of the present invention.

[0020] FIG. 8 is a block diagram of the category library module in a fourth embodiment of the information search apparatus of the present invention.

Detailed Description

[0021] The following embodiments are described with reference to the accompanying drawings to illustrate specific embodiments in which the present invention may be practiced.

[0022] To provide users with more accurate search results and with results that are related in multiple respects, so that the results presented better match both the user's needs and the user's preferences, the technical solution of the present invention is as follows:

[0023] Refer to FIG. 1, which is a flowchart of the first embodiment of the information search method of the present invention.

[0024] In step 101, related categories are looked up in the category library according to the search term.

[0025] In step 102, the weight of the search term in those categories is analyzed and weight information is generated. The weight information is the basis for the order in which the categories are searched; analyzing the weight of the search term in the different categories prepares for a search that concentrates on the categories that matter most.

[0026] In step 103, a first search order is generated according to the weight information. The first search order is the order in which the search for the search term is carried out across the different categories of the category library.

[0027] In step 104, search statistics are obtained; they are derived from statistics on users' search history. The search statistics may include web search ranking information or a record of the user's search habits. The web search ranking information is obtained from the Internet, and the search habit record is obtained by compiling statistics on the user's search behavior. The search statistics are the basis for further adjusting the order in which the categories are searched.

[0028] In step 105, the first search order is adjusted according to the search statistics to produce a second search order.

[0029] In step 106, a search is performed according to the second search order and the search term.

[0030] In this embodiment, adjusting the order in which the categories are searched allows categories more relevant to the search term to be searched first; in essence, adjusting the search order helps schedule search resources sensibly and optimizes the search process. The first search order is adjusted according to the search statistics in order to optimize the search order further, so that the search for the search term satisfies the user's preferences as far as possible and yields results that match those preferences as closely as possible.
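Steps 101 to 106 can be illustrated with a minimal Python sketch. The patent does not specify how weights are computed or how the statistics adjust the order, so everything concrete here is an assumption made for illustration: the category library is a plain dictionary, the weight is the fraction of a category's records that contain the term, and the adjustment is a linear blend controlled by a hypothetical parameter `alpha`.

```python
# Hypothetical category library: category name -> list of data records (plain strings).
CategoryLibrary = dict[str, list[str]]


def find_related_categories(term: str, library: CategoryLibrary) -> list[str]:
    """Step 101: keep the categories holding at least one record that contains the term."""
    needle = term.lower()
    return [name for name, records in library.items()
            if any(needle in r.lower() for r in records)]


def weight_info(term: str, categories: list[str], library: CategoryLibrary) -> dict[str, float]:
    """Step 102 (assumed formula): weight = share of the category's records containing the term."""
    needle = term.lower()
    return {name: sum(needle in r.lower() for r in library[name]) / len(library[name])
            for name in categories}


def first_search_order(weights: dict[str, float]) -> list[str]:
    """Step 103: categories sorted by descending weight (name breaks ties)."""
    return sorted(weights, key=lambda c: (-weights[c], c))


def second_search_order(first_order: list[str], weights: dict[str, float],
                        stats: dict[str, float], alpha: float = 0.5) -> list[str]:
    """Step 105 (assumed rule): re-rank by blending the weight with the search statistics.

    sorted() is stable, so categories with equal blended scores keep their first-order position.
    """
    score = {c: (1 - alpha) * weights[c] + alpha * stats.get(c, 0.0) for c in first_order}
    return sorted(first_order, key=lambda c: -score[c])


def search(term: str, order: list[str], library: CategoryLibrary) -> list[tuple[str, str]]:
    """Step 106: search the categories in the given order and collect matching records."""
    needle = term.lower()
    return [(c, r) for c in order for r in library[c] if needle in r.lower()]
```

In this sketch the statistics obtained in step 104 are simply passed in as a dictionary of per-category preference scores; a category that the user historically prefers can overtake a category with a higher raw weight.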

[0031] Refer to FIG. 2, which is a flowchart of the steps by which the search statistics acquisition module generates the search statistics in the second embodiment of the information search method of the present invention. This embodiment is similar to the first embodiment above, except for the following:

[0032] In this embodiment, the search statistics are generated in advance from the user's search behavior so that they are available when needed. For example, CTR (Click-Through Rate, the advertisement click-through rate) click-stream data can be used to analyze the user's search behavior dynamically and generate the search statistics. The search statistics may comprise two parts: the search behavior of the current user (for example, the user's search habit record) and the search behavior common among other users (for example, web search ranking information).

[0033] Further, where the search statistics are the user's search habit record, this embodiment generates them through the following steps:

[0034] In step 201, search records are generated from the user's search behavior.

[0035] In step 202, feature information is extracted from the search records. Extracting features related to the information the user searched for and selected helps determine the user's preferences accurately, because these features better reflect what the user likes.

[0036] In step 203, the user's search behavior is analyzed according to the feature information and the search statistics are generated. In this step, the search statistics can be generated by classifying, summarizing, counting, or fuzzy-matching the extracted feature information, among other operations.
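Steps 201 to 203 might look like the following sketch. The record layout and the choice of feature are assumptions, not taken from the patent: each hypothetical `SearchRecord` stores the term and the category of the result the user clicked, the extracted feature is that category, and the statistics are normalized click frequencies.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchRecord:
    """Step 201 (assumed layout): one search the user made and the result category they chose."""
    term: str
    clicked_category: str


def extract_features(records: list[SearchRecord]) -> list[str]:
    """Step 202 (assumed feature): the category of each selected result."""
    return [r.clicked_category for r in records]


def build_statistics(features: list[str]) -> dict[str, float]:
    """Step 203 (assumed analysis): counting, normalized to a per-category preference share."""
    counts = Counter(features)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()} if total else {}
```

The resulting dictionary can serve as the per-user part of the search statistics; a ranking built from many users' records would be computed the same way over a larger set of records.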

[0037] Refer to FIG. 3, which is a flowchart of the third embodiment of the information search method of the present invention. This embodiment is similar to the first or second embodiment above, except for the following:

[0038] Step 101 is preceded by step 301, in which the category library is provided; the category library includes at least one category. Each category contains multiple data records, and across the different categories of the library, the category library can record association information between different data records under different categories. This information can serve as a basis for searching on the search term, so that users receive results that are related in multiple respects: results that meet their needs and, at the same time, results that go beyond what they expected.
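One possible shape for such a category library is sketched below, assuming Python 3.9 or later. The class name, the `(category, record)` key, and the symmetric association table are all illustrative assumptions; the patent only states that categories hold data records and that associations between records in different categories are recorded.

```python
from collections import defaultdict
from dataclasses import dataclass, field

RecordKey = tuple[str, str]  # (category name, record identifier); an assumed key format


@dataclass
class CategoryLibrary:
    # category name -> records stored under that category
    categories: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
    # record -> records it is associated with (kept symmetric)
    associations: dict[RecordKey, set[RecordKey]] = field(default_factory=lambda: defaultdict(set))

    def add_record(self, category: str, record: str) -> None:
        self.categories[category].add(record)

    def associate(self, a: RecordKey, b: RecordKey) -> None:
        """Store an association between two records, registering both records if needed."""
        for category, record in (a, b):
            self.add_record(category, record)
        self.associations[a].add(b)
        self.associations[b].add(a)

    def related(self, key: RecordKey) -> list[RecordKey]:
        """Records associated with the given one, in sorted order; empty if none are known."""
        return sorted(self.associations.get(key, ()))
```

A search that hits one record can then follow `related` to surface records from other categories, which is one way to produce results that are related in multiple respects.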

[0039] Refer to FIG. 4, which is a flowchart of the steps by which the category library module mines data records and builds the category library in the fourth embodiment of the information search method of the present invention. This embodiment is similar to any one of the first, second, and third embodiments above, except for the following:

[0040] To record a wide variety of data records and build the category library, data on the Internet must be mined, analyzed, and processed. This is a supporting measure that enables the present invention to provide search results to users; a search basis therefore needs to be prepared for the searches on search terms that will be carried out later.

[0041] To this end, steps 401 to 404 are performed in advance.

[0042] In step 401, data is mined and data records are formed.

[0043] In step 402, the data records are analyzed and analysis results are generated.

[0044] In step 403, the data records are organized according to the analysis results. This generates the association information between different data records, which provides the basis for search results that are related in multiple respects.

[0045] In step 404, the category library is updated according to the data records.
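The four preparation steps (mine, analyze, organize, update) can be sketched as a small pipeline. The concrete operations are assumptions chosen only to make the flow runnable: mining normalizes raw `(category, text)` pairs, analysis tokenizes each record, and organizing links records from different categories that share a token.

```python
from itertools import combinations

Record = tuple[str, str]  # (category, text); an assumed record format


def mine(raw: list[tuple[str, str]]) -> list[Record]:
    """Mining step: turn raw (category, text) pairs into records, dropping empty texts."""
    return [(cat.strip(), text.strip()) for cat, text in raw if text.strip()]


def analyze(records: list[Record]) -> dict[Record, set[str]]:
    """Analysis step (assumed): the analysis result is the set of lowercase tokens per record."""
    return {rec: set(rec[1].lower().split()) for rec in records}


def organize(analysis: dict[Record, set[str]]) -> set[tuple[Record, Record]]:
    """Organizing step (assumed): associate records in different categories that share a token."""
    links = set()
    for a, b in combinations(sorted(analysis), 2):
        if a[0] != b[0] and analysis[a] & analysis[b]:
            links.add((a, b))
    return links


def update_library(library: dict, records: list[Record],
                   links: set[tuple[Record, Record]]) -> dict:
    """Update step: merge the new records and associations into the library dictionary."""
    for cat, text in records:
        library.setdefault("records", {}).setdefault(cat, []).append(text)
    library.setdefault("links", set()).update(links)
    return library
```

Because the pipeline runs ahead of any query, its cost does not affect search latency; only the lookups against the finished library happen at search time.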

[0046] In this embodiment, in step 101, data records matching the search term are looked up in the category library according to the search term, and the user's search intent is identified from those data records.

[0047] In step 102, the degree of matching between the search term and the data records is analyzed according to the search intent, and the weight information is generated.

[0048] For example, if the search term entered by the user is "ABC", related categories can be looked up among categories of the category library such as "People Search", "Broadcast", "Apps", and "Micro Group". Looking up the related categories makes it possible to judge how relevant the search term is to each of them and thus to identify the user's search intent; the intent is identified in order to provide the user with more accurate search results. A further, derived technical effect is that users can be given search results that are related in multiple respects, because the data stored in the category library has all been processed and organized: for example, how "ABC" is related across the "People Search", "Broadcast", "Apps", and "Micro Group" categories, or, given that "ABC" has a certain number of citation records, which other related items have the same number of citation records, and so on.
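The "ABC" example can be made concrete with a short sketch. The patent does not define how intent is identified or how the degree of matching is measured, so the rules below are assumptions: the intent is the category with the most matching records, the degree of matching is the ratio of term length to record length, and a hypothetical `boost` factor raises the weight of the intent category.

```python
from typing import Optional


def matching_records(term: str, library: dict[str, list[str]]) -> dict[str, list[str]]:
    """Find, per category, the data records that contain the search term (case-insensitive)."""
    needle = term.lower()
    found = {}
    for category, records in library.items():
        hits = [r for r in records if needle in r.lower()]
        if hits:
            found[category] = hits
    return found


def infer_intent(matches: dict[str, list[str]]) -> Optional[str]:
    """Assumed intent rule: the category with the most matching records (name breaks ties)."""
    if not matches:
        return None
    return min(matches, key=lambda c: (-len(matches[c]), c))


def match_degree(term: str, record: str) -> float:
    """Assumed matching degree: how much of the record the term covers, 0.0 if absent."""
    return len(term) / len(record) if term.lower() in record.lower() else 0.0


def weight_info(term: str, matches: dict[str, list[str]],
                intent: Optional[str], boost: float = 2.0) -> dict[str, float]:
    """Weight per category: best matching degree, multiplied by `boost` for the intent category."""
    weights = {}
    for category, records in matches.items():
        best = max(match_degree(term, r) for r in records)
        weights[category] = best * boost if category == intent else best
    return weights
```

With a library holding "ABC" under "People Search" and two longer records under "Apps", the exact match gives "People Search" a degree of 1.0, while the inferred intent ("Apps") has its lower degree doubled; how strongly intent should outweigh exactness is a tuning decision the patent leaves open.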

[0049] In the above embodiments, looking up the related categories and identifying the user's search intent makes it possible to provide more accurate search results; analyzing the weight of the search term in the different categories makes the search more focused; adjusting the order in which the categories are searched allows categories more relevant to the search term to be searched first, which helps schedule search resources sensibly and optimizes the search process; and adjusting the first search order according to the search statistics optimizes the search order further, so that the search satisfies the user's preferences as far as possible and yields results that match those preferences as closely as possible.

[0050] Refer to FIG. 5, which is a block diagram of the first embodiment of the information search apparatus of the present invention. The information search apparatus of this embodiment includes a preprocessing module 501, a weight information generation module 502, a search statistics acquisition module 503, an order adjustment module 504, and a search module 505.

[0051] The information search apparatus of this embodiment includes a preprocessing module, a weight information generation module, a search statistics acquisition module, an order adjustment module, and a search module.

[0052] The preprocessing module is configured to find related categories in the category library according to the search term.

[0053] The weight information generation module is configured to analyze the weight of the search term in those categories and generate weight information. The weight information is the basis for adjusting the order in which the search module 505 searches the categories; analyzing the weight of the search term in the different categories prepares the search module 505 for a search that concentrates on the categories that matter most.

[0054] The search statistics acquisition module is configured to obtain search statistics, which are derived from statistics on users' search history. The search statistics may include web search ranking information or a record of the user's search habits. The web search ranking information is obtained from the Internet, and the search habit record is obtained by compiling statistics on the user's search behavior. The search statistics are the basis for further adjusting the order in which the search module 505 searches the categories.

[0055] The order adjustment module is configured to generate a first search order according to the weight information, the first search order being the order in which the search for the search term is carried out across the different categories of the category library, and to adjust the first search order according to the search statistics to produce a second search order.

[0056] The search module is configured to search according to the second search order and the search term.

[0057] In this embodiment, adjusting the order in which the search module 505 searches the categories allows categories more relevant to the search term to be searched first; in essence, adjusting the search order of the search module 505 helps schedule search resources sensibly and optimizes the search process. The first search order of the search module 505 is adjusted according to the search statistics in order to optimize its search order further, so that the search module 505 searches for the search term while satisfying the user's preferences as far as possible and thus yields results that match those preferences as closely as possible.
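The cooperation of modules 501 to 505 can be pictured as a small composition of callables. This is only an illustrative wiring, not the patent's implementation: the class name, the field names, and the callable signatures are assumptions, and the single `adjust` callable stands in for both duties of the order adjustment module (building the first order and adjusting it into the second).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InformationSearchApparatus:
    preprocess: Callable[[str], list[str]]                    # stands in for module 501
    weigh: Callable[[str, list[str]], dict[str, float]]       # stands in for module 502
    get_statistics: Callable[[], dict[str, float]]            # stands in for module 503
    adjust: Callable[[dict[str, float], dict[str, float]], list[str]]  # module 504
    search: Callable[[str, list[str]], list[str]]             # stands in for module 505

    def run(self, term: str) -> list[str]:
        """Pass the search term through the modules in the order the description gives."""
        categories = self.preprocess(term)
        weights = self.weigh(term, categories)
        stats = self.get_statistics()
        order = self.adjust(weights, stats)
        return self.search(term, order)
```

Keeping each module behind a plain callable makes it easy to swap one strategy (for example, a different weighting rule) without touching the others.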

[0058] Refer to FIG. 6, which is a block diagram of the search statistics acquisition module in the second embodiment of the information search apparatus of the present invention. This embodiment is similar to the first embodiment above, except for the following:

[0059] In this embodiment, to provide the order adjustment module 504 with the search statistics, the search statistics acquisition module 503 is further configured to generate the search statistics in advance from the user's search behavior. For example, the search statistics acquisition module 503 can use CTR (Click-Through Rate, the advertisement click-through rate) click-stream data to analyze the user's search behavior dynamically and generate the search statistics. The search statistics may comprise two parts: the search behavior of the current user (for example, the user's search habit record) and the search behavior common among other users (for example, web search ranking information).

[0060] Further, where the search statistics are the user's search habit record, the search statistics acquisition module 503 may, in order to generate them, further include a recording module 5031, a feature extraction module 5032, and a statistical analysis module 5033. The recording module 5031 is configured to generate search records from the user's search behavior. The feature extraction module 5032 is configured to extract feature information from the search records. Extracting features related to the information the user searched for and selected helps determine the user's preferences accurately, because these features better reflect what the user likes. The statistical analysis module 5033 is configured to analyze the user's search behavior according to the feature information and generate the search statistics. The statistical analysis module 5033 can generate the search statistics by classifying, summarizing, counting, or fuzzy-matching the extracted feature information, among other operations.

[0061] Refer to FIG. 7, which is a block diagram of the third embodiment of the information search apparatus of the present invention. This embodiment is similar to the first or second embodiment above, except for the following:

[0062] The information search apparatus of this embodiment further includes a category library module 701. The category library module 701 is configured to provide the category library, which includes at least one category. Each category contains multiple data records, and across the different categories of the library, the category library module 701 can record association information between different data records under different categories. This information can serve as the basis on which the search module 505 searches on the search term, so that users receive results that are related in multiple respects: results that meet their needs and, at the same time, results that go beyond what they expected.

[0063] Refer to FIG. 8, which is a block diagram of the category library module in the fourth embodiment of the information search apparatus of the present invention. This embodiment is similar to any one of the first, second, and third embodiments above, except for the following:

[0064] To record a wide variety of data records and build the category library, data on the Internet must be mined, analyzed, and processed. This is a supporting measure that enables the present invention to provide search results to users; a search basis therefore needs to be prepared for the searches on search terms that will be carried out later. To this end, the category library module 701 includes a data mining module 7011, which is configured to mine data and form data records. The category library module 701 further includes a data analysis module 7012 and a data organizing module 7013. The data analysis module 7012 is configured to analyze the data records and generate analysis results. The data organizing module 7013 is configured to organize the data records according to the analysis results. This generates the association information between different data records, which gives the search module 505 the basis for search results that are related in multiple respects.

[0065] The category library module 701 further includes an update module 7014, which is configured to update the category library according to the data records.

[0066] Further, in this embodiment, the preprocessing module 501 is also configured to look up, in the category library, data records matching the search term and to identify the user's search intent from those data records. The weight information generation module 502 is also configured to analyze, according to the search intent, the degree of matching between the search term and the data records and to generate the weight information.

[0067] For example, if the search term entered by the user is "ABC", the preprocessing module 501 can look up related categories among categories of the category library such as "People Search", "Broadcast", "Apps", and "Micro Group". Looking up the related categories makes it possible to judge how relevant the search term is to each of them and thus to identify the user's search intent; the intent is identified in order to provide the user with more accurate search results. A further, derived technical effect is that users can be given search results that are related in multiple respects, because the data stored in the category library has all been processed and organized: for example, how "ABC" is related across the "People Search", "Broadcast", "Apps", and "Micro Group" categories, or, given that "ABC" has a certain number of citation records, which other related items have the same number of citation records, and so on.

[0068] In the above embodiments, looking up the related categories and identifying the user's search intent makes it possible to provide more accurate search results; analyzing the weight of the search term in the different categories makes the search more focused; adjusting the order in which the categories are searched allows categories more relevant to the search term to be searched first, which helps schedule search resources sensibly and optimizes the search process; and adjusting the first search order according to the search statistics optimizes the search order further, so that the search satisfies the user's preferences as far as possible and yields results that match those preferences as closely as possible.

[0069] In summary, although the present invention has been disclosed above by way of preferred embodiments, those embodiments are not intended to limit it. A person of ordinary skill in the art may make various changes and refinements without departing from the spirit and scope of the present invention, so the scope of protection of the present invention is defined by the claims.

Claims (12)

1. An information search method, characterized in that the method comprises the following steps: looking up, according to a search term, related categories in a category library and data records matching the search term, and identifying the user's search intent according to the data records, wherein the category library records association information between different data records under different categories; analyzing, according to the search intent, the degree of match between the search term and the data records, determining the weight of the search term in the related categories according to the degree of match, and generating weight information; obtaining search statistics, the search statistics being derived from statistics on the users' search history; generating a first search order according to the weight information, the first search order being the order in which the search term is searched in the different categories of the category library; adjusting the first search order according to the search statistics to produce a second search order; and searching according to the second search order and the search term.
2. The information search method according to claim 1, characterized in that the method further comprises the following steps: generating search records according to the user's search behavior; extracting feature information from the search records; and analyzing the user's search behavior according to the feature information and generating the search statistics.
3. The information search method according to claim 1, characterized in that the method further comprises the following step: providing the category library, the category library comprising at least one category.
4. The information search method according to claim 1, characterized in that the method further comprises the following step: mining data and forming data records.
5. The information search method according to claim 4, characterized in that the method further comprises the following steps: analyzing the data records and generating an analysis result; and organizing the data records according to the analysis result.
6. The information search method according to claim 5, characterized in that the method further comprises the following step: updating the category library according to the data records.
7. An information search apparatus, characterized in that the apparatus comprises: a preprocessing module, configured to look up, according to a search term, related categories in a category library and data records matching the search term, and to identify the user's search intent according to the data records, wherein the category library records association information between different data records under different categories; a weight information generation module, configured to analyze, according to the search intent, the degree of match between the search term and the data records, to determine the weight of the search term in the related categories according to the degree of match, and to generate weight information; a search statistics acquisition module, configured to obtain search statistics, the search statistics being derived from statistics on the users' search history; an order adjustment module, configured to generate a first search order according to the weight information, the first search order being the order in which the search term is searched in the different categories of the category library, and to adjust the first search order according to the search statistics to produce a second search order; and a search module, configured to search according to the second search order and the search term.
8. The information search apparatus according to claim 7, characterized in that the search statistics acquisition module comprises: a recording module, configured to generate search records according to the user's search behavior; a feature extraction module, configured to extract feature information from the search records; and a statistical analysis module, configured to analyze the user's search behavior according to the feature information and to generate the search statistics.
9. The information search apparatus according to claim 7, characterized in that the apparatus further comprises: a category library module, configured to provide the category library, the category library comprising at least one category.
10. The information search apparatus according to claim 9, characterized in that the category library module comprises: a data mining module, configured to mine data and form data records.
11. The information search apparatus according to claim 10, characterized in that the category library module further comprises: a data analysis module, configured to analyze the data records and generate an analysis result; and a data organizing module, configured to organize the data records according to the analysis result.
12. The information search apparatus according to claim 11, characterized in that the category library module further comprises: an update module, configured to update the category library according to the data records.
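The record, feature extraction, and statistical analysis chain of claims 2 and 8 can be sketched roughly as follows. The log format, the choice of "clicked category" as the only feature, and all function names are assumptions for illustration; the claims do not say which features are extracted or how the statistics are computed.

```python
from collections import Counter

# Hypothetical search records: (search term, category of the result the user chose).
search_log = [
    ("ABC", "application"),
    ("ABC", "application"),
    ("ABC", "people search"),
    ("XYZ", "broadcast"),
]


def extract_features(log):
    """Keep the single feature used in this sketch: the chosen category."""
    return [category for _term, category in log]


def build_search_statistics(features):
    """Count how often each category was chosen across the search history."""
    return Counter(features)


search_statistics = build_search_statistics(extract_features(search_log))
print(search_statistics.most_common())
```

A per-category count of this kind is one form the "search statistics" could take when they are later used to adjust the first search order.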
CN201210482905.8A 2012-11-23 2012-11-23 Information search apparatus and method CN103838754B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210482905.8A CN103838754B (en) 2012-11-23 2012-11-23 Information search apparatus and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210482905.8A CN103838754B (en) 2012-11-23 2012-11-23 Information search apparatus and method

Publications (2)

Publication Number Publication Date
CN103838754A CN103838754A (en) 2014-06-04
CN103838754B true CN103838754B (en) 2017-12-22

Family

ID=50802267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210482905.8A CN103838754B (en) 2012-11-23 2012-11-23 Information search apparatus and method

Country Status (1)

Country Link
CN (1) CN103838754B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104965839B (en) * 2014-09-25 2018-09-07 腾讯科技(深圳)有限公司 Similar kinds of information search method and apparatus
CN105095187A (en) * 2015-08-07 2015-11-25 广州神马移动信息科技有限公司 Search intention identification method and device
CN105224472B (en) * 2015-10-22 2018-08-28 上海新储集成电路有限公司 Kind of matching method and system to find frequently used content

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101019118A (en) * 2004-07-13 2007-08-15 谷歌股份有限公司 Personalization of placed content ordering in search results
CN101079033A (en) * 2006-06-30 2007-11-28 腾讯科技(深圳)有限公司 Integrative searching result sequencing system and method
CN101158971A (en) * 2007-11-15 2008-04-09 深圳市迅雷网络技术有限公司 Search result ordering method and device based on search engine
CN102622417A (en) * 2012-02-20 2012-08-01 北京搜狗信息服务有限公司 Method and device for ordering information records

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6895406B2 (en) * 2000-08-25 2005-05-17 Seaseer R&D, Llc Dynamic personalization method of creating personalized user profiles for searching a database of information

Also Published As

Publication number Publication date
CN103838754A (en) 2014-06-04

Similar Documents

Publication Publication Date Title
US20090006343A1 (en) Machine assisted query formulation
US20090006345A1 (en) Voice-based search processing
CN102117321B (en) Automatic discovery of aggregation and subject areas discussed in the organization
US20120198073A1 (en) Dynamically organizing cloud computing resources to facilitate discovery
US20130138636A1 (en) Image Searching
US9104979B2 (en) Entity recognition using probabilities for out-of-collection data
US20120143875A1 (en) Method and system for discovering dynamic relations among entities
JP5192475B2 (en) Object classification method and object classification system
JP5749279B2 (en) Binding buried for the item association
CN101990670B (en) Search results ranking using editing distance and document information
CN102004782A (en) Search result sequencing method and search result sequencer
WO2015175931A1 (en) Language modeling for conversational understanding domains using semantic web resources
CN102855268B (en) Image ranking method and system based on attribute correlation
JP2012533818A (en) Ranking of search results based on the weight of the word
US8694511B1 (en) Modifying search result ranking based on populations
CN104573028A (en) Intelligent question-answer implementing method and system
CN102473190B (en) Keyword assignment to a web page
CA2829735C (en) Method and system for information modeling and applications thereof
CN103116588A (en) Method and system for personalized recommendation
CN102902821B (en) Advanced network-based image semantic annotation hot topic retrieval method and apparatus
CN104143001B (en) Method and apparatus for the search term recommendation
JP6073345B2 (en) Results How to rank apparatus, and retrieval method and apparatus
CN103577423B (en) Keywords classification method and system
US8825661B2 (en) Systems and methods for two stream indexing of audio content
CN1996316A (en) Search engine searching method based on web page correlation

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
GR01