CN102750277A - Method and device for obtaining information - Google Patents

Publication number
CN102750277A
Authority
CN
Grant status
Application
Patent type
Prior art keywords
information
key
user
obtaining
words
Prior art date
Application number
CN 201110096463
Other languages
Chinese (zh)
Other versions
CN102750277B (en)
Inventor
李亚楠
杨月奎
焦峰
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

The invention discloses a method for obtaining information, comprising: obtaining a keyword input by a user; obtaining, according to a preset keyword matching condition, a first information set that matches the keyword content; judging whether the number of information items in the first information set is greater than a preset information number and whether the first information set includes at least two semantic classes and, if so, obtaining the preset information number of information items, the obtained information covering at least two semantic classes; and sending the information to the user. According to the embodiments of the invention, information of at least two semantic classes is obtained from the information matching the keyword input by the user, so that the user is provided with information of types related to the keyword. The user can therefore obtain related information without re-entering further keywords related to the original keyword, which reduces user operations and improves the user experience.

Description

Method and device for obtaining information

Technical Field

[0001] The present invention relates to the field of communication technology, and in particular to a method and a device for obtaining information.

Background

[0002] Question-answering systems, such as Baidu Zhidao (百度知道) and Soso Wenwen (搜搜问问), are a common tool with which Internet users obtain information. To meet users' browsing needs, a question-answering system retrieves and pushes other questions or answers related to the question currently being browsed, referred to here as "related questions". Related questions can further satisfy the user's browsing needs. However, because display space is limited, usually only about 5 related questions can be shown for a question, and often not all related questions can be displayed, so some method is needed to select the few most representative related questions.

[0003] Existing related-question retrieval systems select the few questions semantically closest to the question currently being browsed and show them to the user in order. The technique works as follows: first, the question Q clicked or entered by the user is obtained; then, using information retrieval or natural language processing techniques, the set R(Q) of questions related to Q is retrieved from a database of previously collected or recorded questions; next, the related questions in R(Q) are sorted by their semantic relevance to Q; finally, the N highest-ranked related questions in R(Q) are selected for display, where N is the maximum number of related questions that can be shown on the page.
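The prior-art ranking described in [0003] can be sketched as follows. This is a minimal illustration, not the implementation of any actual question-answering system: the function names `top_n_related` and `word_overlap` are hypothetical, and the word-overlap score is only a stand-in for whatever semantic relevance measure a real system uses.

```python
from typing import Callable, List


def word_overlap(a: str, b: str) -> float:
    """Toy relevance score (assumption): Jaccard overlap of lowercase word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def top_n_related(question: str, related: List[str],
                  relevance: Callable[[str, str], float], n: int) -> List[str]:
    """Sort R(Q) by relevance to Q, highest first, and keep the top N."""
    ranked = sorted(related, key=lambda r: relevance(question, r), reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    q = "popular decoration style"
    r_q = ["decoration style trends", "decoration price list",
           "popular decoration style ideas", "weather today"]
    print(top_n_related(q, r_q, word_overlap, 2))
```

Because the ranking uses only closeness to Q, the top N results tend to fall in the same semantic class, which is the limitation the following paragraphs describe.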

[0004] The prior art provides the user with information related in content to the question the user raised. However, the related-question results given by the prior art are all semantically identical or very close. When the user wants to browse broadly through related knowledge on other aspects of a topic, the prior art cannot meet that need: the user has to re-enter information about those other aspects and search again, which degrades the user experience.

[0005] For example, a user who wants to decorate his or her house may browse the question "What has been the most popular decoration style in recent years?" to obtain related content. Since generally only about 5 related questions can be displayed, with the prior art the user obtains 5 pieces of information about "decoration style". The user may, however, also want related questions and answers on other kinds of knowledge, such as decoration materials, decoration prices, and the reputation of nearby decoration contractors. To get them, the user has to re-enter keywords for the information needed, which increases user operations.

Summary

[0006] To simplify search operations and improve the user experience, an embodiment of the present invention provides a method for obtaining information, the method comprising:

[0007] obtaining a keyword input by a user;

[0008] obtaining, according to a preset keyword matching condition, a first information set that matches the keyword content;

[0009] judging whether the number of information items in the first information set is greater than a preset information number and whether the first information set includes at least two semantic classes and, if so, obtaining the preset information number of information items, the information covering at least two semantic classes;

[0010] sending the information to the user.

[0011] The judging whether the number of information items in the first information set is greater than the preset information number and whether the first information set includes at least two semantic classes specifically comprises:

[0012] obtaining the number of information items in the first information set and judging whether that number is greater than the preset information number;

[0013] performing text clustering on the information in the first information set by semantic class;

[0014] obtaining the number of semantic classes contained in the first information set;

[0015] judging whether the number of semantic classes is greater than or equal to two.
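The two-part check in [0012] to [0015] can be sketched as below. This is an illustrative sketch under stated assumptions: the name `needs_diverse_selection` is hypothetical, and the clustering step is abstracted into a caller-supplied `semantic_class` function that maps each item to a class label, since the text leaves the clustering algorithm open.

```python
from typing import Callable, Hashable, Sequence


def needs_diverse_selection(first_set: Sequence[str],
                            semantic_class: Callable[[str], Hashable],
                            preset_count: int) -> bool:
    """Return True when the first information set holds more items than the
    preset information number AND spans at least two semantic classes."""
    # [0012]: compare the item count with the preset information number.
    if len(first_set) <= preset_count:
        return False
    # [0013]-[0014]: group by semantic class and count the distinct classes.
    classes = {semantic_class(item) for item in first_set}
    # [0015]: at least two classes are required.
    return len(classes) >= 2


if __name__ == "__main__":
    # Stand-in classifier (assumption): the first word is the class label.
    label = lambda text: text.split()[0]
    items = ["style modern", "style classic", "price list",
             "price guide", "style retro"]
    print(needs_diverse_selection(items, label, 3))
```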

[0016] The obtaining the preset information number of information items, the information covering at least two semantic classes, specifically comprises:

[0017] when the number of semantic classes contained in the first information set is less than the preset information number, obtaining one information item from the information contained in each semantic class to obtain a first temporary information set;

[0018] calculating the difference between the preset information number and the number of semantic classes;

[0019] sorting the remaining information in the first information set in descending order of matching degree with the keyword;

[0020] obtaining the information whose position number after sorting is less than or equal to the difference to obtain a second temporary information set, and merging the first temporary information set and the second temporary information set to obtain the preset information number of information items;

[0021] when the number of semantic classes contained in the first information set is greater than the preset information number, obtaining one information item from the information contained in each semantic class to obtain a fourth temporary information set;

[0022] sorting the information in the fourth temporary information set in descending order of matching degree with the keyword;

[0023] obtaining the information in the sorted fourth temporary information set whose position number is less than or equal to the preset information number, to obtain the preset information number of information items.
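The selection procedure in [0017] to [0023] can be sketched as below. This is a sketch under assumptions, not a definitive implementation: the text does not say which item to take from each semantic class, so this example picks the best-matching one; it also treats the case where the class count equals the preset number like the "greater than" branch, which the text does not cover. The names `Item` and `select_diverse` are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

# (text, matching degree with the keyword, semantic class label)
Item = Tuple[str, float, Hashable]


def select_diverse(first_set: List[Item], preset_count: int) -> List[Item]:
    """Pick preset_count items that cover as many semantic classes as possible."""
    # One representative per class (assumption: the best-matching item).
    best_index: Dict[Hashable, int] = {}
    for i, (_, score, label) in enumerate(first_set):
        if label not in best_index or score > first_set[best_index[label]][1]:
            best_index[label] = i
    rep_indices = set(best_index.values())
    representatives = [first_set[i] for i in sorted(rep_indices)]

    if len(representatives) >= preset_count:
        # [0021]-[0023]: too many classes, keep the best-matching representatives.
        ranked = sorted(representatives, key=lambda it: it[1], reverse=True)
        return ranked[:preset_count]

    # [0017]-[0020]: fewer classes than slots, fill up from the remaining items.
    difference = preset_count - len(representatives)
    remaining = [it for i, it in enumerate(first_set) if i not in rep_indices]
    remaining.sort(key=lambda it: it[1], reverse=True)
    return representatives + remaining[:difference]


if __name__ == "__main__":
    items = [("a1", 0.9, "A"), ("a2", 0.8, "A"), ("a3", 0.7, "A"),
             ("b1", 0.6, "B"), ("b2", 0.5, "B"), ("c1", 0.4, "C")]
    print([text for text, _, _ in select_diverse(items, 4)])
```

With three classes and four slots, the sketch returns one item per class plus the single best remaining item; with two slots it returns the two best-matching class representatives.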

[0024] The obtaining the preset information number of information items, the information covering at least two semantic classes, specifically comprises:

[0025] sorting the information in the first information set in ascending order of matching degree with the keyword;

[0026] where the first information set is $SQ = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$ and $m$ is the number of information items in the first information set,

[0027] obtaining the preset information number of information items of at least two semantic classes according to $rq_x = sq_y$;

[0028] where $y$ is given by a floor expression in $a$ (only "⌊a" survives legibly in this text), $a = \log_N m$, $N$ is the preset information number, and $rq_x$ is the information item obtained from $SQ$ at the index $y$ so computed.

[0029] The sending the information to the user specifically comprises:

[0030] sorting the information in descending order of matching degree with the keyword;

[0031] sending the sorted information to the user in order.

[0032] An embodiment of the present invention provides a device for obtaining information, the device comprising:

[0033] a keyword obtaining module, configured to obtain a keyword input by a user;

[0034] a first information set obtaining module, configured to obtain, according to a preset keyword matching condition, a first information set that matches the keyword content;

[0035] an information obtaining module, configured to judge whether the number of information items in the first information set is greater than a preset information number and whether the first information set includes at least two semantic classes and, if so, to obtain the preset information number of information items, the information covering at least two semantic classes;

[0036] an information sending module, configured to send the information to the user.
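The four modules in [0033] to [0036] can be pictured as a simple composition. The sketch below is only one possible reading: the class name `InformationDevice`, its field names, and the use of plain callables for each module are assumptions made for illustration, not structure stated in the text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InformationDevice:
    """Wires the four modules together; each module is a plain callable."""
    obtain_keyword: Callable[[], str]                          # [0033]
    obtain_first_set: Callable[[str], List[str]]               # [0034]
    obtain_information: Callable[[str, List[str]], List[str]]  # [0035]
    send_information: Callable[[List[str]], None]              # [0036]

    def run(self) -> None:
        keyword = self.obtain_keyword()
        first_set = self.obtain_first_set(keyword)
        selected = self.obtain_information(keyword, first_set)
        self.send_information(selected)


if __name__ == "__main__":
    outbox: List[str] = []
    device = InformationDevice(
        obtain_keyword=lambda: "decoration",
        obtain_first_set=lambda k: [k + " style", k + " price", "weather"],
        # Stand-in for the selection logic: keep at most two matching items.
        obtain_information=lambda k, s: [x for x in s if k in x][:2],
        send_information=outbox.extend,
    )
    device.run()
    print(outbox)
```

Keeping each module behind a callable makes it easy to swap the selection strategy without touching keyword input or delivery.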

[0037] The information obtaining module specifically comprises:

[0038] an information number determining unit, configured to obtain the number of information items in the first information set and judge whether that number is greater than the preset information number;

[0039] a text clustering unit, configured to perform text clustering on the information in the first information set by semantic class;

[0040] a semantic class number obtaining unit, configured to obtain the number of semantic classes contained in the first information set;

[0041] a semantic class determining unit, configured to judge whether the number of semantic classes is greater than or equal to two;

[0042] an information obtaining unit, configured to obtain, when the number of information items in the first information set is greater than the preset information number and the first information set includes at least two semantic classes, the preset information number of information items, the information covering at least two semantic classes.

[0043] The information obtaining module specifically comprises:

[0044] a temporary information set generating unit, configured to obtain, when the number of semantic classes contained in the first information set is less than the preset information number, one information item from the information contained in each semantic class to obtain a first temporary information set;

[0045] a difference calculating unit, configured to calculate the difference between the preset information number and the number of semantic classes;

[0046] a preset information obtaining unit, configured to sort the remaining information in the first information set in descending order of matching degree with the keyword, obtain the information whose position number after sorting is less than or equal to the difference to obtain a second temporary information set, and merge the first temporary information set and the second temporary information set to obtain the preset information number of information items;

[0047] a first information obtaining unit, configured to obtain, when the number of semantic classes contained in the first information set is greater than the preset information number, one information item from the information contained in each semantic class to obtain a fourth temporary information set, sort the information in the fourth temporary information set in descending order of matching degree with the keyword, and obtain the information in the sorted fourth temporary information set whose position number is less than or equal to the preset information number, to obtain the preset information number of information items.

[0048] The information obtaining module specifically comprises:

[0049] a second information obtaining unit, configured to sort the information in the first information set in ascending order of matching degree with the keyword and, where the first information set is $SQ = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$ and $m$ is the number of information items in the first information set, to obtain the preset information number of information items of at least two semantic classes according to $rq_x = sq_y$; where $y$ is given by a floor expression in $a$ (illegible in this text), $a = \log_N m$, $N$ is the preset information number, and $rq_x$ is the information item obtained from $SQ$ at the index $y$ so computed.

[0050] The information sending module specifically comprises:

[0051] a keyword sorting unit, configured to sort the information in descending order of matching degree with the keyword;

[0052] an information sending unit, configured to send the sorted information to the user in order.

[0053] In the embodiments of the present invention, information of at least two semantic classes is obtained from the information matching the keyword input by the user, providing the user with information of types related to the keyword. The user can thus obtain related information without re-entering keywords related to that keyword, which reduces user operations and improves the user experience.

Brief Description of the Drawings

[0054] FIG. 1 is a flowchart of the method for obtaining information provided by Embodiment 1 of the present invention;

[0055] FIG. 2 is a flowchart of the method for obtaining information provided by Embodiment 2 of the present invention;

[0056] FIG. 3 is a flowchart of the method for obtaining information provided by Embodiment 3 of the present invention;

[0057] FIG. 4 is a flowchart of the method for obtaining information provided by Embodiment 4 of the present invention;

[0058] FIG. 5 is a schematic diagram of the device for obtaining information provided by Embodiment 5 of the present invention;

[0059] FIG. 6 is a schematic diagram of the device for obtaining information provided by Embodiment 6 of the present invention;

[0060] FIG. 7 is a schematic diagram of the device for obtaining information provided by Embodiment 7 of the present invention;

[0061] FIG. 8 is a schematic diagram of the device for obtaining information provided by Embodiment 8 of the present invention.

Detailed Description

[0062] To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

[0063] Embodiment 1

[0064] As shown in FIG. 1, an embodiment of the present invention provides a method for obtaining information, the method comprising:

[0065] S101: obtaining a keyword input by a user;

[0066] S102: obtaining, according to a preset keyword matching condition, a first information set that matches the keyword content;

[0067] S103: when the first information set includes at least two semantic classes and the number of information items in the first information set is greater than the preset information number, obtaining the preset information number of information items of at least two semantic classes, and sending the information to the user.

[0068] It should be noted that the steps of this embodiment may be executed by a search server or by any other entity capable of performing the functions of the steps.

[0069] In this embodiment, information of at least two semantic classes is obtained from the information matching the keyword input by the user, providing the user with information of types related to the keyword the user supplied. The user can thus obtain related information without re-entering keywords related to that keyword, which reduces user operations and improves the user experience.

[0070] Embodiment 2

[0071] As shown in FIG. 2, an embodiment of the present invention provides a method for obtaining information, the method comprising:

[0072] S201: obtaining a keyword input by a user;

[0073] The keyword input by the user may be a question the user enters when asking, a query the user enters when searching, or an existing question that the user is about to browse and that reflects the user's information need.

[0074] For example, by obtaining the question input by the user, the user's question $Q_i$ is obtained.

[0075] S202: obtaining, according to a preset keyword matching condition, a first information set that matches the keyword content;

[0076] Optionally, existing information retrieval techniques may be used to retrieve, from the question information database previously collected and/or recorded by an existing question-answering system, all questions semantically related to the user question $Q_i$.

[0077] For example, searching the database for the question $Q_i$ yields the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$.

[0078] S203: obtaining the number of information items in the first information set and judging whether that number is greater than the preset information number; if so, S204 is executed; if not, the information in the question candidate set is taken as the information to be returned to the user, that is, S206 is executed;

[0079] Optionally, when the related-question candidate set in S202 is $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$ with $m$ equal to 20 and the preset information number is 10, the number of information items in the first information set is greater than the preset information number, so S204 is executed.

[0080] S204: performing text clustering on the first information set;

[0081] Text clustering rests mainly on the clustering hypothesis: documents of the same class are highly similar to one another, while documents of different classes are less similar.

[0082] Preferably, the results returned by the search engine are clustered so that the user can quickly locate the information needed. Specifically, the user enters a search keyword, the retrieved documents are then clustered, and a brief description of each category is output. This narrows the scope of the search, so the user only needs to focus on the more promising topics. The approach can also provide clues for the user's second search.

[0083] Optionally, the algorithms for text clustering of the first information set include: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.

[0084] Partitioning methods: given a data set of N tuples or records, a partitioning method constructs K groups, each representing one cluster, with K < N. The K groups satisfy the following conditions: (1) each group contains at least one data record; (2) each data record belongs to exactly one group (this requirement can be relaxed in some fuzzy clustering algorithms). For a given K, the algorithm first produces an initial grouping and then changes the grouping by repeated iteration so that each new grouping is better than the previous one, where "better" means that records in the same group are as close together as possible and records in different groups are as far apart as possible. Algorithms based on this idea include K-MEANS, K-MEDOIDS, and CLARANS.

[0085] Hierarchical methods: the given data set is decomposed hierarchically until some condition is met. These methods divide into "bottom-up" and "top-down" schemes. In the bottom-up scheme, for example, each data record initially forms its own group; in the following iterations, groups that are close to one another are merged into one group, until all records form a single group or some condition is met. Representative algorithms include BIRCH, CURE, and CHAMELEON.
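The bottom-up scheme of the hierarchical methods can be sketched as below. This is a minimal illustration, not BIRCH, CURE, or CHAMELEON: it uses single-linkage merging with a word-overlap (Jaccard) similarity and stops when no two clusters are similar enough. The function names and the `min_similarity` stopping condition are assumptions chosen for the example.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-set overlap, used here as a stand-in for semantic similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerative_clusters(texts: List[str],
                           min_similarity: float) -> List[List[str]]:
    """Start with one cluster per text and repeatedly merge the two most
    similar clusters until no pair reaches min_similarity."""
    words = {t: set(t.lower().split()) for t in texts}
    clusters: List[List[str]] = [[t] for t in texts]

    def similarity(c1: List[str], c2: List[str]) -> float:
        # Single linkage: similarity of the closest pair of members.
        return max(jaccard(words[x], words[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        best_score, best_i, best_j = -1.0, 0, 1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = similarity(clusters[i], clusters[j])
                if score > best_score:
                    best_score, best_i, best_j = score, i, j
        if best_score < min_similarity:
            break  # stopping condition: nothing left that is close enough
        clusters[best_i] = clusters[best_i] + clusters[best_j]
        del clusters[best_j]
    return clusters


if __name__ == "__main__":
    questions = ["decoration style modern", "decoration style classic",
                 "paint price list", "paint price guide"]
    print(agglomerative_clusters(questions, 0.4))
```

Each resulting cluster can then be treated as one semantic class when counting classes in the later steps.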

[0086] Density-based methods: a fundamental difference between density-based methods and the other methods is that they are based not on distances of various kinds but on density. This overcomes the weakness of distance-based algorithms, which can only find "roughly circular" clusters. The guiding idea is that as long as the density of points in a region exceeds some threshold, the region is added to the nearby cluster. Representative algorithms include DBSCAN, OPTICS, and DENCLUE.
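The density idea can be illustrated with a compact DBSCAN-style sketch. It is a simplified teaching version under assumptions, not a tuned implementation: neighbourhoods are found by a full scan, and the parameter names `eps` (neighbourhood radius) and `min_pts` (density threshold) follow common usage rather than anything stated in the text.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
NOISE = -1


def dbscan(points: Sequence[T], distance: Callable[[T, T], float],
           eps: float, min_pts: int) -> List[int]:
    """Return a cluster label per point; NOISE (-1) marks low-density points."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited

    def neighbours(i: int) -> List[int]:
        return [j for j in range(len(points))
                if distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_pts:  # dense enough: keep growing the cluster
                queue.extend(nearby)
    return [label if label is not None else NOISE for label in labels]


if __name__ == "__main__":
    data = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 9.0]
    print(dbscan(data, lambda a, b: abs(a - b), eps=0.15, min_pts=2))
```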

[0087] Grid-based methods: the data space is first divided into a grid structure with a finite number of cells, and all processing operates on individual cells. A prominent advantage is speed: processing time is usually independent of the number of records in the target database and depends only on the number of cells into which the data space is divided. Representative algorithms include STING, CLIQUE, and WAVE-CLUSTER.

[0088] Model-based methods: a model is assumed for each cluster, and the data that fit the model well are then sought. Such a model may be a density distribution function of the data points in space, or something else. An underlying assumption is that the target data set is determined by a series of probability distributions. There are usually two directions: statistical approaches and neural network approaches.

[0089] In this step, the data in the first information set may also be clustered by other algorithms; this embodiment imposes no limitation.

[0090] S205: obtaining the number of semantic classes contained in the first information set;

[0091] For example, clustering the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$, with $m$ equal to 20, by semantic class yields 3 semantic classes.

[0092] S206: judging whether the number of semantic classes is greater than or equal to two; if so, obtaining the preset information number of information items, the information covering at least two semantic classes.

[0093] For example, as in the example of S205, the first information set has 3 semantic classes, which is more than two, so the preset information number of information items is obtained, covering at least two semantic classes.

[0094] S207: sorting the information in descending order of matching degree with the keyword;

[0095] S208: sending the sorted information to the user in order.

[0096] It should be noted that the steps of this embodiment may be executed by a search server or by any other entity capable of performing the functions of the steps.

[0097] In this embodiment, information of at least two semantic classes is obtained from the information matching the keyword input by the user, providing the user with information of types related to the keyword the user supplied. The user can thus obtain related information without re-entering keywords related to that keyword, which reduces user operations and improves the user experience.

[0098] Embodiment 3

[0099] As shown in FIG. 3, an embodiment of the present invention provides a method for obtaining information, the method comprising steps S301 to S310, where S301 to S305 are the same as S201 to S205 in Embodiment 2 and are not described again here. Unlike Embodiment 2, this embodiment further comprises the following steps:

[0100] S306: judging whether the number of semantic classes contained in the first information set is less than the preset information number; if so, obtaining one information item from the information contained in each semantic class to obtain a first temporary information set;

[0101] For example, the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$, with $m$ equal to 20, contains 3 semantic classes, and the preset information number is 10. One information item is then obtained from each semantic class, giving 3 information items of different semantic classes, which form the first temporary information set $LQ1 = \{lq1_0, lq1_1, lq1_2\}$.

[0102] S307: calculating the difference between the preset information number and the number of semantic classes;

[0103] For example, after the 3 information items are obtained in S306, the difference between the preset information number and the number of semantic classes is calculated: the preset information number 10 minus 3 gives a difference of 7.

[0104] S308: sorting the remaining information in the first information set in descending order of matching degree with the keyword;

[0105] For example, in the related-question candidate set $SQ_i$ with $m$ equal to 20, apart from the 3 information items already obtained, 17 information items remain, and these 17 items are sorted in descending order of matching degree with the keyword.

[0106] S309: Obtain the information whose position number after sorting is less than or equal to the difference, to obtain a second temporary information set, and merge the first temporary information set with the second temporary information set;

[0107] For example, the remaining information after sorting is numbered 1 to 17. The information whose position number is less than or equal to the difference 7, that is, the information numbered 1 to 7, is obtained, giving the second temporary information set LQ2 = {lq20, lq21, lq22, lq23, lq24, lq25}, and the first temporary information set is merged with the second temporary information set.

[0108] S310: Send the merged information to the user.

[0109] For example, merging the first temporary information set with the second temporary information set gives the information lq10, lq11, lq12, lq20, lq21, lq22, lq23, lq24, lq25, which is sent to the user.

[0110] Preferably, the information may also be sorted in descending order of its degree of matching with the keyword, and the sorted information sent to the user in order.
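
Steps S306 to S310 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `select_diverse` and the callables `score` (degree of matching with the keyword) and `semantic_category` (category lookup) are hypothetical, and choosing the best-matching item as the one piece taken from each category is an assumption, since the text only says that one piece is taken.

```python
def select_diverse(items, preset_n, score, semantic_category):
    """Pick up to preset_n items that cover every semantic category.

    items: the first information set (any sequence).
    score: hypothetical callable giving an item's match degree with the keyword.
    semantic_category: hypothetical callable giving an item's category.
    """
    # S306: keep one representative per semantic category.
    # Assumption: the best-matching item of each category is the one taken.
    representative = {}
    for idx, item in enumerate(items):
        cat = semantic_category(item)
        best = representative.get(cat)
        if best is None or score(item) > score(items[best]):
            representative[cat] = idx
    first_temp = [items[i] for i in representative.values()]

    # S307: difference between the preset quantity and the category count.
    difference = preset_n - len(first_temp)

    # S308: sort the remaining items by match degree, high to low.
    taken = set(representative.values())
    remaining = [item for i, item in enumerate(items) if i not in taken]
    remaining.sort(key=score, reverse=True)

    # S309: take the top `difference` remaining items and merge both sets.
    second_temp = remaining[:max(difference, 0)]
    merged = first_temp + second_temp

    # [0110] (optional): present the merged result in descending match order.
    merged.sort(key=score, reverse=True)
    return merged
```

With 20 candidates in 3 categories and a preset quantity of 10, as in the example above, the sketch returns 3 representatives plus the 7 best-matching remaining items.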

[0111] It should be noted that this embodiment is only one method of obtaining information of different semantic categories. Such information may also be obtained in many other ways, and any method adopted for the purpose of making the obtained information belong to different semantic categories falls within the protection scope of this embodiment; details are not repeated here.

[0112] It should be noted that the steps of this embodiment of the present invention may be executed by a search server, or by another executing entity capable of performing the functions of the steps.

[0113] In this embodiment of the present invention, information of at least two semantic categories is obtained from the information matching the keyword input by the user, so that the user is provided with the types of information related to that keyword. The user can therefore obtain related information without re-entering keywords related to it, which reduces user operations and improves the user experience.

[0114] Embodiment 4

[0115] As shown in FIG. 4, an embodiment of the present invention provides a method for obtaining information, the method comprising:

[0116] S401: Obtain the keyword input by the user;

[0117] The keyword input by the user may be a question entered when the user asks a question, a query entered when the user searches, or an existing question that the user is about to browse and that reflects the user's information need.

[0118] For example, the question qi input by the user is obtained.

[0119] S402: According to a preset keyword matching condition, obtain a first information set matching the content of the keyword;

[0120] Optionally, existing information retrieval techniques may be used to retrieve, from the database of questions previously collected and/or recorded by an existing question answering system, all questions semantically related to the user question qi.

[0121] S403: Obtain the quantity of information in the first information set and determine whether it is greater than the preset information quantity; if it is greater, perform S404; if it is less, perform S405;

[0122] In this embodiment, optionally, when the quantity of information is greater than the preset information quantity, the information in the first information set may be sorted in ascending order of its degree of matching with the keyword before S404 is performed.

[0123] For example, after existing information retrieval techniques have been used to retrieve, from the database of questions previously collected and/or recorded by an existing question answering system, all questions semantically related to the user question qi, the questions are sorted by their similarity to qi to obtain the candidate set of related questions SQi = {sq0, sq1, sq2, …, sqm}.

[0124] S404: When the first information set is SQ = {sq0, sq1, sq2, …, sqm}, where m is the quantity of information in the first information set, obtain the preset information quantity of information covering at least two semantic categories according to rqx = sqy.

[0125] Here $y = \lfloor x^{a} \rfloor$, $a = \log_N m$, N is the preset information quantity, and rqx is the information obtained from SQ according to $y = \lfloor x^{a} \rfloor$.

[0126] Specifically, N related questions whose semantics gradually diverge are taken from SQi. Let $N^{a} = m$, that is, $a = \log_N m$, and take the function $y = \lfloor x^{a} \rfloor$; then sqy is the x-th related question rqx, which gives the obtained information set RQi = {rq1, rq2, …}. The mapping from x to y is a gradually diverging nonlinear mapping. It ensures that the queries in the sequence SQi most relevant to qi are output first, and also that the semantically related but more divergent questions later in SQi can appear among the related questions.

[0127] Optionally, after SQi is sorted, a mapping function $y = f(x)$ with $f(N) \le m$ may be constructed and rqx = sqy set, thereby obtaining the key questions RQi = {rq1, rq2, …, rqN}. Any suitable mapping function f(x), such as a power function or an exponential function, can be used for this purpose.
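
The power mapping of S404 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patent's implementation: the function names are hypothetical, positions are treated as 1-based indices into a list of m candidates ordered with the closest match first (as [0126] describes), and a small epsilon is added before flooring to absorb floating-point error.

```python
import math


def diverging_positions(n, m):
    """Return the 1-based positions y = floor(x**a) for x = 1..n, a = log_n(m)."""
    if not 1 < n <= m:
        raise ValueError("expected 1 < n <= m")
    a = math.log(m) / math.log(n)
    # The epsilon keeps a value such as 3.9999999999 from flooring to 3
    # when the exact result is 4; min() keeps the last position within m.
    return [min(m, math.floor(x ** a + 1e-9)) for x in range(1, n + 1)]


def pick_related(ranked, n):
    """Select n candidates from `ranked` (closest match first) at diverging positions."""
    return [ranked[y - 1] for y in diverging_positions(n, len(ranked))]
```

For N = 4 and m = 16 the exponent is a = 2, so the selected positions are 1, 4, 9 and 16: the closest match is always kept, and the gap between selected positions widens toward the tail of the list.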

[0128] S405: Send the information to the user.

[0129] Optionally, the related questions RQi = {rq1, rq2, …} of question qi are output, and each related question is displayed to the user in turn on the question browsing page.

[0130] It should be noted that the steps of this embodiment of the present invention may be executed by a search server, or by another executing entity capable of performing the functions of the steps.

[0131] In this embodiment of the present invention, information of at least two semantic categories is obtained from the information matching the keyword input by the user, so that the user is provided with the types of information related to that keyword. The user can therefore obtain related information without re-entering keywords related to it, which reduces user operations and improves the user experience.

[0132] Embodiment 5

[0133] As shown in FIG. 5, an embodiment of the present invention provides an apparatus for obtaining information, the apparatus comprising a keyword obtaining module 501, a first information set obtaining module 502, an information obtaining module 503 and an information sending module 504, wherein:

[0134] the keyword obtaining module 501 is configured to obtain the keyword input by the user;

[0135] the first information set obtaining module 502 is configured to obtain, according to a preset keyword matching condition, a first information set matching the content of the keyword;

[0136] the information obtaining module 503 is configured to determine whether the quantity of information in the first information set is greater than the preset information quantity and whether the first information set includes at least two semantic categories, and if so, to obtain the preset information quantity of information, the information covering at least two semantic categories;

[0137] the information sending module 504 is configured to send the information to the user.

[0138] In this embodiment of the present invention, information of at least two semantic categories is obtained from the information matching the keyword input by the user, so that the user is provided with the types of information related to that keyword. The user can therefore obtain related information without re-entering keywords related to it, which reduces user operations and improves the user experience.

[0139] Embodiment 6

[0140] As shown in FIG. 6, an embodiment of the present invention provides an apparatus for obtaining information. Similar to Embodiment 5, the apparatus comprises a keyword obtaining module 501, a first information set obtaining module 502, an information obtaining module 503 and an information sending module 504.

[0141] Further, the information obtaining module 503 specifically comprises:

[0142] an information quantity determining unit 5031, configured to obtain the quantity of information in the first information set and determine whether it is greater than the preset information quantity; if so, the quantity of information in the first information set is determined to be greater than the preset information quantity;

[0143] a text clustering unit 5032, configured to perform text clustering on the information in the first information set by semantic category;

[0144] a semantic category quantity obtaining unit 5033, configured to obtain the number of semantic categories contained in the first information set;

[0145] a semantic category determining unit 5034, configured to determine whether the number of semantic categories is greater than or equal to two; if so, the first information set includes at least two semantic categories;

[0146] an information obtaining unit 5035, configured to obtain the preset information quantity of information, the information covering at least two semantic categories, when the quantity of information in the first information set is greater than the preset information quantity and the first information set includes at least two semantic categories.

[0147] The information sending module 504 specifically comprises:

[0148] a keyword sorting unit 5041, configured to sort the information in descending order of its degree of matching with the keyword;

[0149] an information sending unit 5042, configured to send the sorted information to the user in order.
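
The decision made jointly by units 5031 to 5034 can be sketched in Python as follows. The function name and the `semantic_category` callable are hypothetical; the callable stands in for the text clustering of unit 5032, whose algorithm the text does not specify.

```python
def should_diversify(first_set, preset_n, semantic_category):
    """Return True when the information obtaining unit 5035 should run."""
    # Unit 5031: is the first information set larger than the preset quantity?
    enough_items = len(first_set) > preset_n
    # Units 5032/5033: group by semantic category and count the categories.
    # Assumption: semantic_category replaces an unspecified clustering step.
    categories = {semantic_category(item) for item in first_set}
    # Unit 5034: are there at least two semantic categories?
    return enough_items and len(categories) >= 2
```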

[0150] In this embodiment of the present invention, information of at least two semantic categories is obtained from the information matching the keyword input by the user, so that the user is provided with the types of information related to that keyword. The user can therefore obtain related information without re-entering keywords related to it, which reduces user operations and improves the user experience.

[0151] Embodiment 7

[0152] As shown in FIG. 7, an embodiment of the present invention provides an apparatus for obtaining information. Similar to Embodiment 6, the apparatus comprises a keyword obtaining module 501, a first information set obtaining module 502, an information obtaining module 503 and an information sending module 504, where the information sending module 504 comprises a keyword sorting unit 5041 and an information sending unit 5042. Unlike Embodiment 6, in this embodiment the information obtaining module 503 specifically comprises:

[0153] a temporary information set generating unit 5036, configured to obtain, when the number of semantic categories contained in the first information set is less than the preset information quantity, one piece of information from the information contained in each semantic category, to obtain a first temporary information set;

[0154] a difference calculating unit 5037, configured to calculate the difference between the preset information quantity and the number of semantic categories;

[0155] a preset information obtaining unit 5038, configured to sort the remaining information in the first information set in descending order of its degree of matching with the keyword, obtain the information whose position number after sorting is less than or equal to the difference to obtain a second temporary information set, and merge the first temporary information set with the second temporary information set to obtain the preset information quantity of information;

[0156] a first information obtaining unit 5039, configured to obtain, when the number of semantic categories contained in the first information set is greater than the preset information quantity, one piece of information from the information contained in each semantic category to obtain a fourth temporary information set, sort the information in the fourth temporary information set in descending order of its degree of matching with the keyword, and obtain the information in the sorted fourth temporary information set whose position number is less than or equal to the preset information quantity, to obtain the preset information quantity of information.
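
The branch handled by the first information obtaining unit 5039, where there are more semantic categories than the preset information quantity, can be sketched in Python as follows. The function name and the `score` and `semantic_category` callables are hypothetical, and taking the best-matching item of each category is an assumption, since the text only says that one piece is taken per category.

```python
def select_when_categories_exceed(items, preset_n, score, semantic_category):
    """Pick preset_n items, each from a different semantic category."""
    # One item per category forms the fourth temporary information set.
    # Assumption: the best-matching item of each category is the one taken.
    fourth_temp = {}
    for item in items:
        cat = semantic_category(item)
        if cat not in fourth_temp or score(item) > score(fourth_temp[cat]):
            fourth_temp[cat] = item
    # Sort by match degree, high to low, and keep positions 1..preset_n.
    ranked = sorted(fourth_temp.values(), key=score, reverse=True)
    return ranked[:preset_n]
```

Because every retained item comes from a different category, the result covers preset_n distinct semantic categories whenever the first information set contains at least that many.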

[0157] In this embodiment of the present invention, information of at least two semantic categories is obtained from the information matching the keyword input by the user, so that the user is provided with the types of information related to that keyword. The user can therefore obtain related information without re-entering keywords related to it, which reduces user operations and improves the user experience.

[0158] Embodiment 8

[0159] As shown in FIG. 8, an embodiment of the present invention provides an apparatus for obtaining information. Similar to Embodiment 6, the apparatus comprises a keyword obtaining module 501, a first information set obtaining module 502, an information obtaining module 503 and an information sending module 504, where the information sending module 504 comprises a keyword sorting unit 5041 and an information sending unit 5042. Unlike Embodiment 6, in this embodiment the information obtaining module 503 specifically comprises:

[0160] a second information obtaining unit 50310, configured to sort the information in the first information set in ascending order of its degree of matching with the keyword and, when the first information set is SQ = {sq0, sq1, sq2, …, sqm}, where m is the quantity of information in the first information set, obtain the preset information quantity of information covering at least two semantic categories according to rqx = sqy; where $y = \lfloor x^{a} \rfloor$, $a = \log_N m$, N is the preset information quantity, and rqx is the information obtained from SQ according to $y = \lfloor x^{a} \rfloor$.

[0161] In this embodiment of the present invention, information of at least two semantic categories is obtained from the information matching the keyword input by the user, so that the user is provided with the types of information related to that keyword. The user can therefore obtain related information without re-entering keywords related to it, which reduces user operations and improves the user experience.

[0162] All or part of the technical solutions provided by the above embodiments may be implemented by software programming, with the software program stored in a readable storage medium, for example a hard disk, optical disc or floppy disk in a computer.

[0163] The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (10)

  1. A method for obtaining information, characterized in that the method comprises: obtaining a keyword input by a user; obtaining, according to a preset keyword matching condition, a first information set matching the content of the keyword; determining whether the quantity of information in the first information set is greater than a preset information quantity and whether the first information set includes at least two semantic categories, and if so, obtaining the preset information quantity of information, the information covering at least two semantic categories; and sending the information to the user.
  2. The method according to claim 1, characterized in that determining whether the quantity of information in the first information set is greater than the preset information quantity and whether the first information set includes at least two semantic categories specifically comprises: obtaining the quantity of information in the first information set and determining whether it is greater than the preset information quantity; performing text clustering on the information in the first information set by semantic category; obtaining the number of semantic categories contained in the first information set; and determining whether the number of semantic categories is greater than or equal to two.
  3. The method according to claim 1, characterized in that obtaining the preset information quantity of information, the information covering at least two semantic categories, specifically comprises: when the number of semantic categories contained in the first information set is less than the preset information quantity, obtaining one piece of information from the information contained in each semantic category to obtain a first temporary information set; calculating the difference between the preset information quantity and the number of semantic categories; sorting the remaining information in the first information set in descending order of its degree of matching with the keyword; obtaining the information whose position number after sorting is less than or equal to the difference to obtain a second temporary information set, and merging the first temporary information set with the second temporary information set to obtain the preset information quantity of information; when the number of semantic categories contained in the first information set is greater than the preset information quantity, obtaining one piece of information from the information contained in each semantic category to obtain a fourth temporary information set; sorting the information in the fourth temporary information set in descending order of its degree of matching with the keyword; and obtaining the information in the sorted fourth temporary information set whose position number is less than or equal to the preset information quantity, to obtain the preset information quantity of information.
  4. The method according to claim 1, characterized in that obtaining the preset information quantity of information, the information covering at least two semantic categories, specifically comprises: sorting the information in the first information set in ascending order of its degree of matching with the keyword; and, when the first information set is SQ = {sq0, sq1, sq2, …, sqm}, where m is the quantity of information in the first information set, obtaining the preset information quantity of information covering at least two semantic categories according to rqx = sqy; where $y = \lfloor x^{a} \rfloor$, $a = \log_N m$, N is the preset information quantity, and rqx is the information obtained from SQ according to $y = \lfloor x^{a} \rfloor$.
  5. The method according to claim 1, characterized in that sending the information to the user specifically comprises: sorting the information in descending order of its degree of matching with the keyword; and sending the sorted information to the user in order.
  6. An apparatus for obtaining information, characterized in that the apparatus comprises: a keyword obtaining module, configured to obtain a keyword input by a user; a first information set obtaining module, configured to obtain, according to a preset keyword matching condition, a first information set matching the content of the keyword; an information obtaining module, configured to determine whether the quantity of information in the first information set is greater than a preset information quantity and whether the first information set includes at least two semantic categories, and if so, to obtain the preset information quantity of information, the information covering at least two semantic categories; and an information sending module, configured to send the information to the user.
  7. The apparatus according to claim 6, characterized in that the information obtaining module specifically comprises: an information quantity determining unit, configured to obtain the quantity of information in the first information set and determine whether it is greater than the preset information quantity; a text clustering unit, configured to perform text clustering on the information in the first information set by semantic category; a semantic category quantity obtaining unit, configured to obtain the number of semantic categories contained in the first information set; a semantic category determining unit, configured to determine whether the number of semantic categories is greater than or equal to two; and an information obtaining unit, configured to obtain the preset information quantity of information, the information covering at least two semantic categories, when the quantity of information in the first information set is greater than the preset information quantity and the first information set includes at least two semantic categories.
  8. The apparatus according to claim 6, characterized in that the information obtaining module specifically comprises: a temporary information set generating unit, configured to obtain, when the number of semantic categories contained in the first information set is less than the preset information quantity, one piece of information from the information contained in each semantic category, to obtain a first temporary information set; a difference calculating unit, configured to calculate the difference between the preset information quantity and the number of semantic categories; a preset information obtaining unit, configured to sort the remaining information in the first information set in descending order of its degree of matching with the keyword, obtain the information whose position number after sorting is less than or equal to the difference to obtain a second temporary information set, and merge the first temporary information set with the second temporary information set to obtain the preset information quantity of information; and a first information obtaining unit, configured to obtain, when the number of semantic categories contained in the first information set is greater than the preset information quantity, one piece of information from the information contained in each semantic category to obtain a fourth temporary information set, sort the information in the fourth temporary information set in descending order of its degree of matching with the keyword, and obtain the information in the sorted fourth temporary information set whose position number is less than or equal to the preset information quantity, to obtain the preset information quantity of information.
  9. The apparatus according to claim 6, characterized in that the information obtaining module specifically comprises: a second information obtaining unit, configured to sort the information in the first information set in ascending order of its degree of matching with the keyword and, when the first information set is SQ = {sq0, sq1, sq2, …, sqm}, where m is the quantity of information in the first information set, obtain the preset information quantity of information covering at least two semantic categories according to rqx = sqy; where $y = \lfloor x^{a} \rfloor$, $a = \log_N m$, N is the preset information quantity, and rqx is the information obtained from SQ according to $y = \lfloor x^{a} \rfloor$.
  10. The apparatus according to claim 6, characterized in that the information sending module specifically comprises: a keyword sorting unit, configured to sort the information in descending order of its degree of matching with the keyword; and an information sending unit, configured to send the sorted information to the user in order.
CN 201110096463 2011-04-18 Information acquisition method and apparatus CN102750277B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110096463 CN102750277B (en) 2011-04-18 Information acquisition method and apparatus

Publications (2)

Publication Number Publication Date
CN102750277A (en) 2012-10-24
CN102750277B (en) 2016-12-14

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104765726A (en) * 2015-04-27 2015-07-08 湘潭大学 Data classification method based on information density

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1839386A (en) * 2003-08-21 2006-09-27 伊迪利亚公司 Internet searching using semantic disambiguation and expansion
CN101025753A (en) * 2007-03-28 2007-08-29 上海汉光知识产权数据科技有限公司 Patent search method
US20070294200A1 (en) * 1998-05-28 2007-12-20 Q-Phrase Llc Automatic data categorization with optimally spaced semantic seed terms
CN101169780A (en) * 2006-10-25 2008-04-30 华为技术有限公司 Semantic ontology retrieval system and method


Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518000 SHENZHEN, GUANGDONG PROVINCE TO: 518057 SHENZHEN, GUANGDONG PROVINCE

ASS Succession or assignment of patent right

Owner name: SHENZHEN SHIJI LIGHT SPEED INFORMATION TECHNOLOGY

Free format text: FORMER OWNER: TENGXUN SCI-TECH (SHENZHEN) CO., LTD.

Effective date: 20131121

C14 Grant of patent or utility model