CN101145153B - Method and system for searching information - Google Patents

Info

Publication number: CN101145153B
Authority: CN (China)
Application number: CN 200610154148
Other languages: Chinese (zh)
Other versions: CN101145153A (en)
Inventors: 余斯恒, 孔维青, 张立中, 王磊
Original assignee: 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Prior art keywords: information, keyword, words, search, word
Family has litigation

Application filed by 阿里巴巴集团控股有限公司
Priority to CN 200610154148
Publication of CN101145153A
Application granted
Publication of CN101145153B

Abstract

The present invention discloses a method and system for searching information in the field of communications. It addresses two problems in the prior art: search results that stray far from the topic, and a search engine that is overloaded and slow. The invention matches the keywords entered by a user against the words in a topic keyword library to determine the topic keywords, retrieves related information that is closer to the user's intent, and sorts that information so that the user sees the closest matches first; the topic keyword library is updated on a regular or irregular basis. The search system comprises a user browser, a search device, a topic keyword library, an information index library, and an information delivery device; the search device comprises a communication interface, a word segmentation module, a filtering module, and a search engine.

Description

A method and system for searching for information

Technical Field

[0001] The present invention relates to the fields of computers and communications, and in particular to a method and system for searching for information.

Background

[0002] With the development of the Internet, the network has become one of people's main sources of information. Most users rely on search engines to find the information they are looking for.

[0003] In the prior art, referring to Figure 1, a user enters keywords for the desired information in a browser, for example "我送什么生日礼物给好朋友" ("What birthday gift should I give my good friend"), and the browser sends them to a search engine.

[0004] The search engine segments the user's input into words; the example above becomes 我/送/什么/生日/礼物/给/好/朋友 (I / give / what / birthday / gift / to / good / friend). It then removes a small number of common filter words, such as "我" (I) and "好" (good), and searches with the words that remain; in the example these are "生日" (birthday), "礼物" (gift), and "朋友" (friend).

[0005] The remaining words are combined under an OR relationship. Possible combinations include "生日/礼物" (birthday/gift), "生日/朋友" (birthday/friend), and "生日/礼物/朋友" (birthday/gift/friend), and the information index library is searched according to these combinations. Results found for "生日/礼物" are clearly close to the topic, whereas results found for "生日/朋友" are far from it.

[0006] Operators publish information through an information delivery device. So that more users can find a given piece of information, the operator has to list a large number of possible keywords and send them to the information index library, and many of these keywords have nothing to do with the topic of the information.

[0007] With this approach, a search therefore returns a large amount of off-topic information, which may be ranked at the front of the result list while the information the user cares about most is ranked further back, causing the user considerable inconvenience. Retrieving this information also places a heavy load on the search engine, slows the search, and consumes substantial network resources. Because current technology presents the results directly to the user as web pages, off-topic results are a real nuisance for the user. In addition, operators must list many words that users might enter but that are unrelated to the topic of the information, and each word incurs a fee, which raises the operators' costs.

Summary

[0008] The present invention provides a search method and system to solve the prior-art problems that much of the retrieved information is off-topic and that the search engine is overloaded and slow.

[0009] The present invention provides the following technical solutions:

[0010] A method for searching for information, comprising the steps of:

[0011] segmenting the information entered by a user into words according to part of speech;

[0012] matching each word obtained from segmentation against the topic keywords predefined in a topic keyword library, determining the successfully matched words to be topic keywords, and deleting the words that fail to match, wherein a predefined topic keyword is a word that is topical;

[0013] after a successful match, further determining the synonyms of the topic keywords and adding the synonyms to the topic keywords;

[0014] searching for information according to the topic keywords, and outputting the search results.

[0015] Before matching, words unrelated to the topic are filtered out of the segmented words according to their part of speech, and the retained words are then matched against the words in the topic keyword library.

[0016] Further, some or all of the words that fail to match the predefined topic keywords in the topic keyword library are added to the topic keyword library.

[0017] When searching for information, the topic keywords are combined under an OR relationship.

[0018] When searching for information, the topic keywords are matched against the keywords in the information library, and the information corresponding to every successfully matched keyword is retrieved.

[0019] After the information is retrieved, the retrieved items are sorted by topic relevance, and the items that contain all of the topic keywords are placed at the front of the result sequence.

[0020] A device for searching for information, comprising:

[0021] a word segmentation module, configured to segment the information entered by a user into words according to part of speech;

[0022] a filtering module, configured to match each word obtained by the word segmentation module against the topic keywords predefined in a topic keyword library, determine the successfully matched words to be topic keywords, and delete the words that fail to match, wherein a predefined topic keyword is a word that is topical; and further to determine the synonyms of the topic keywords and add the synonyms to the topic keywords;

[0023] a search engine, configured to search for information according to the topic keywords determined by the filtering module and to output the search results.

[0024] The filtering module filters words unrelated to the topic out of the segmented words according to their part of speech, and then matches the retained words against the topic keywords predefined in the topic keyword library.

[0025] When searching for information, the search engine combines the topic keywords under an OR relationship.

[0026] A system for searching for information, characterized by comprising:

[0027] a topic keyword library, configured to store topic keywords;

[0028] a browser, configured to provide the user with a search interface and information display, to send the information entered by the user to the search device, and to obtain the search results from the search device;

[0029] a search device, configured to segment the received information into words, match each segmented word against the topic keywords predefined in the topic keyword library, determine the successfully matched words to be topic keywords, and delete the words that fail to match, wherein a predefined topic keyword is a word that is topical; further to determine the synonyms of the topic keywords and add the synonyms to the topic keywords; and to search for information according to the topic keywords.

[0030] The system further comprises:

[0031] an information delivery device, configured to deliver information content and the corresponding keywords;

[0032] an information library, configured to store the information content and the corresponding keywords, to pass the keywords to the topic keyword library, and to provide the search device with information resources and a search interface.

[0033] The search device comprises:

[0034] a word segmentation module, configured to segment the information entered by a user into words according to part of speech;

[0035] a filtering module, configured to match each word obtained from segmentation against the topic keywords predefined in the topic keyword library, and to determine the successfully matched words to be topic keywords;

[0036] a search engine, configured to search for information according to the topic keywords determined by the filtering module and to output the search results.

[0037] The beneficial effects of the present invention are as follows:

[0038] The invention uses the topic keyword library to preprocess the keywords entered by the user and select the topic keywords, so that the search returns information close to the topic the user has in mind. This avoids retrieving much off-topic information, reduces the distraction to the user, lightens the load on the search engine, and thereby speeds up the search.

[0039] The invention further sorts the retrieved information by topic relevance, so that the user can clearly see the information closest to the topic, which improves the user experience. Correspondingly, operators no longer need to list large numbers of topic-irrelevant words to attract user searches, which lowers their operating costs and saves considerable space in the information index library. The invention updates the topic keyword library periodically, making it easier for users to find the information they care about.

Brief Description of the Drawings

[0040] Figure 1 is a structural diagram of a prior-art search system;

[0041] Figure 2A is a structural diagram of the search system in an embodiment of the present invention;

[0042] Figure 2B is a basic flowchart of the search method in an embodiment of the present invention;

[0043] Figure 3 is a schematic structural diagram of the search device in an embodiment of the present invention;

[0044] Figure 4 is a detailed flowchart of the search method in an embodiment of the present invention;

[0045] Figure 5 is a flowchart of the method for updating the topic keyword library in an embodiment of the present invention.

Detailed Description

[0046] The invention segments the keywords entered by the user into words, filters out the words unrelated to the topic, and matches the remaining words against the words in the topic keyword library. It then searches for information using the topic keywords selected in this way, so that the keywords used for the search are closer to the topic and less irrelevant information is returned.

[0047] Referring to Figure 2A, the system for searching for information in this embodiment comprises a user browser 21, a search device 22, a topic keyword library 23, an information index library 24, and an information delivery device 25.

[0048] The topic keyword library 23 stores topic keywords and is updated regularly. The user browser 21 provides the user with a search interface and information display, and sends the keywords entered by the user to the search device 22. The search device 22 segments the received keywords into words, matches the resulting words against the topic keywords in the topic keyword library 23, combines the matched topic keywords under an OR relationship, and searches the information index library 24 according to the result; on a regular or irregular basis it also adds words that failed to match to the topic keyword library 23 as topic keywords. The information index library 24 provides the search device 22 with resources and a search interface, and receives the information content and corresponding information keywords sent by the information delivery device 25. The information index library 24 matches these information keywords against the topic keywords in the topic keyword library 23, keeps the information keywords that match, and links each retained information keyword to its information. The information index library 24 also segments the information keywords sent by the information delivery device 25 and extracts new topic keywords from them, which it adds to the topic keyword library 23 on a regular or irregular basis. The information delivery device 25 gives operators a platform for publishing information; it sends the information content published by an operator, together with the information keywords set for that content, to the information index library 24.
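The way the information index library 24 handles incoming information keywords might be sketched as follows. This is only an illustration under stated assumptions: the function name `register_information`, the in-memory `dict` index, and the item id `"ad-1"` are hypothetical and do not come from the patent, and returning the unmatched keywords as candidates for later extraction is one possible reading of the text.

```python
from collections import defaultdict


def register_information(index, topic_library, info_id, info_keywords):
    """Link an item to those of its information keywords that are topic keywords.

    Returns the keywords that did not match; the text suggests such keywords
    can later be analyzed for new topic keywords (hypothetical handling).
    """
    unmatched = []
    for keyword in info_keywords:
        if keyword in topic_library:
            index[keyword].add(info_id)  # keep the keyword and link it to the item
        else:
            unmatched.append(keyword)    # not linked; only a candidate for review
    return unmatched


topic_library = {"生日", "礼物"}           # birthday, gift
index = defaultdict(set)
leftover = register_information(
    index, topic_library, "ad-1", ["生日", "礼物", "便宜"]  # 便宜 = cheap
)
print(dict(index))  # {'生日': {'ad-1'}, '礼物': {'ad-1'}}
print(leftover)     # ['便宜']
```

Only keywords that are already topic keywords become searchable entries, which is how the index stays small even when an operator submits many loosely related keywords.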

[0049] Referring to Figure 2B, the basic flow for searching for information in this embodiment is as follows:

[0050] Step 210: Segment the keywords entered by the user into words according to part of speech.

[0051] Step 220: Filter out of the segmented words those that are clearly unrelated to the topic of the search.

[0052] Step 230: Match the retained words against the words in the topic keyword library 23, and determine the successfully matched words to be topic keywords.

[0053] Step 240: Look up the synonyms of the topic keywords in the topic keyword library 23 and add them to the topic keywords.

[0054] Step 250: Search the information index library 24 using all of the determined topic keywords, and output the search results to the user browser 21.
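Steps 230 and 240 can be illustrated with a short sketch. It assumes the words have already been segmented and filtered, and it models the topic keyword library as a plain set plus a synonym dictionary; the function name `determine_topic_keywords` and both data structures are hypothetical choices made for this example, not the patent's implementation.

```python
def determine_topic_keywords(retained_words, topic_library, synonyms):
    """Step 230: keep only words found in the library.
    Step 240: add the synonyms of the matched words."""
    matched = [w for w in retained_words if w in topic_library]
    expanded = list(matched)
    for word in matched:
        for syn in synonyms.get(word, []):
            if syn not in expanded:      # avoid duplicate keywords
                expanded.append(syn)
    return expanded


topic_library = {"生日", "礼物", "礼品"}   # birthday, gift, present
synonyms = {"礼物": ["礼品"]}              # 礼品 is a synonym of 礼物
keywords = determine_topic_keywords(["生日", "礼物", "朋友"], topic_library, synonyms)
print(keywords)  # ['生日', '礼物', '礼品']
```

The word "朋友" (friend) is dropped because it is not in the library, so it never reaches the search engine.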

[0055] Referring to Figure 3, the search device 22 in this embodiment comprises a communication interface 301, a word segmentation module 302, a filtering module 303, and a search engine 304.

[0056] The communication interface 301 receives the keywords that the user sends through the user browser 21, forwards them to the word segmentation module 302, and sends the search results back to the user. The word segmentation module 302 segments the keywords entered by the user according to part of speech (for example, using an existing segmentation tool such as YWS, the Yahoo Word Segmentation system). The filtering module 303 analyzes the segmented words one by one, based on the output of the word segmentation module 302. It first filters out the topic-irrelevant words, then matches the retained words against the topic keywords in the topic keyword library 23, looks up the synonyms of the successfully matched topic keywords and treats them as topic keywords as well, and finally sends all of the topic keywords, combined under an OR relationship, to the search engine 304. The search engine 304 takes the output of the filtering module 303, searches the information index library 24 for information keywords that match it, follows the links to the associated information, combines that information under an AND relationship, and then sorts the results by topic relevance, so that the information containing the most topic keywords comes first. Finally, the user browser 21 obtains the sorted information through the communication interface 301.

[0057] In this embodiment, topic-irrelevant words include verbs, adjectives, adverbs, and the like; that is, words whose removal neither affects the scope of the search nor causes it to drift from the user's search topic.
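A part-of-speech filter of this kind might look like the sketch below. The tag names and the tagged input are hypothetical; a real segmentation tool would supply its own tag set. Treating pronouns and prepositions as topic-irrelevant is an assumption made here to cover the "and the like" in the text.

```python
# Hypothetical tag set. Verbs, adjectives, and adverbs come from the text;
# pronouns and prepositions are an assumption added for this example.
TOPIC_IRRELEVANT_POS = {"verb", "adjective", "adverb", "pronoun", "preposition"}


def filter_topic_irrelevant(tagged_words):
    """Keep only the words whose part of speech may carry the topic."""
    return [word for word, pos in tagged_words if pos not in TOPIC_IRRELEVANT_POS]


tagged = [
    ("我", "pronoun"), ("送", "verb"), ("什么", "pronoun"), ("生日", "noun"),
    ("礼物", "noun"), ("给", "preposition"), ("好", "adjective"), ("朋友", "noun"),
]
print(filter_topic_irrelevant(tagged))  # ['生日', '礼物', '朋友']
```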

[0058] Referring to Figure 4, taking the user input "我送什么生日礼物给好朋友" ("What birthday gift should I give my good friend") as an example, the detailed flow for searching for related information is as follows:

[0059] Step 401: Using a segmentation tool, the word segmentation module 302 segments the user input received by the communication interface 301, "我送什么生日礼物给好朋友", according to part of speech. The result is: 我/送/什么/生日/礼物/给/好/朋友 (I / give / what / birthday / gift / to / good / friend).

[0060] Step 402: The filtering module 303 filters out the topic-irrelevant words. In this example, the words filtered out are "我" (I), "送" (give), "什么" (what), "给" (to), and "好" (good).

[0061] Step 403: Take one of the remaining unprocessed words and match it against the topic keywords in the topic keyword library 23. For example, the remaining words are "生日" (birthday), "礼物" (gift), and "朋友" (friend), and the word "朋友" is taken for matching.

[0062] Step 404: Determine whether the word is a topic keyword. If so, go to step 405; otherwise go to step 406.

[0063] Step 405: Determine the successfully matched word to be a topic keyword, and continue with step 407.

[0064] Step 406: Delete the word that is not a topic keyword (here, the deleted word is "朋友"), and continue with step 407.

[0065] Step 407: Determine whether any remaining words have not yet been matched against the topic keywords in the topic keyword library 23. If so, go to step 404; otherwise go to step 408.

[0066] For example, the words "生日" and "礼物" have not yet been matched, so step 404 is executed and the next word, "生日", is taken.

[0067] Step 408: Look up the synonyms of the determined topic keywords in the topic keyword library 23 and add them to the topic keywords. For example, "礼品" (present) is a synonym of "礼物" (gift), so "礼品" also becomes a topic keyword.

[0068] Step 409: Combine the selected topic keywords under an OR relationship to obtain the processing result. In this example the topic keywords are "生日" and "礼物", plus the synonym "礼品", and the processing result is "生日/礼物" and "生日/礼品". The search engine 304 searches the information index library 24 for information keywords that match this result, such as "生日礼物" (birthday gift), "生日礼品" (birthday present), "生日", "礼物", and "礼品". Following the links between the matched information keywords and the information, it retrieves all information that contains those information keywords and combines it under an AND relationship; that is, all of the retrieved information is placed on the same page.

[0069] Step 410: Sort the retrieved information according to how relevant its information keywords are to the topic. For example, "生日礼物" is closest to the user's intent, so the information found through "生日礼物" is placed first, and so on.

[0070] Step 411: The user browser 21 obtains the sorted information from the search device 22 and presents it to the user, including the title of the user information, a summary, and a link to the user's web page.
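The search and ranking in steps 409 and 410 can be sketched roughly as follows. The index is modeled as a hypothetical dictionary from information keyword to item ids, relevance is approximated by the number of topic keywords an item matches (in line with the statement that information containing the most topic keywords comes first), and ties are broken by item id purely to make the output deterministic. None of these names or choices come from the patent.

```python
def search(index, topic_keywords):
    """Return item ids, ranked by how many topic keywords each item matches."""
    hits = {}
    for keyword in topic_keywords:              # OR: any single keyword may match
        for item in index.get(keyword, ()):
            hits.setdefault(item, set()).add(keyword)
    # All matched items are merged into one result list, most matches first.
    return sorted(hits, key=lambda item: (-len(hits[item]), item))


index = {
    "生日": {"card", "cake"},   # birthday
    "礼物": {"card"},           # gift
    "礼品": {"box"},            # present
}
print(search(index, ["生日", "礼物", "礼品"]))  # ['card', 'box', 'cake']
```

Here "card" matches two topic keywords and is ranked first; "box" and "cake" each match one and follow in id order.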

[0071] One way to produce new topic entries to extend the topic keyword library 23 is for the search device 22, each time it filters the keywords entered by a user, to retain the words that are topical but are not yet topic keywords (that is, to retain the words that step 406 would otherwise delete). The retained words are then reviewed periodically, topic entries are generated from them, and the entries are added to the topic keyword library 23. This makes it possible to pick up newly coined vocabulary, such as "超女" (Super Girl). Another way is for the information index library 24 to analyze the information keywords received from the information delivery device 25, extract new topic keywords, generate topic entries, and add them to the topic keyword library 23; an example is the brand name "美的" (Midea).
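The first update path, retaining unmatched words for periodic review, might be sketched like this. The class `TopicKeywordLibrary`, its `pending` set, and the `is_topical` review callback are all hypothetical; the patent does not say how the review is carried out.

```python
class TopicKeywordLibrary:
    """Toy library that remembers unmatched words for a later review."""

    def __init__(self, keywords):
        self.keywords = set(keywords)
        self.pending = set()              # unmatched words awaiting review

    def match(self, words):
        matched = []
        for word in words:
            if word in self.keywords:
                matched.append(word)
            else:
                self.pending.add(word)    # retained instead of simply discarded
        return matched

    def review(self, is_topical):
        """Promote the pending words approved by is_topical, then clear the queue."""
        approved = {w for w in self.pending if is_topical(w)}
        self.keywords |= approved
        self.pending.clear()
        return approved


library = TopicKeywordLibrary({"生日"})
print(library.match(["生日", "超女"]))           # ['生日']
print(library.review(lambda w: w == "超女"))     # {'超女'}
print(library.match(["超女"]))                   # ['超女']
```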

[0072] Referring to Figure 5, the specific steps in this embodiment for generating topic entries from keywords and adding them to the topic keyword library are as follows:

[0073] Step 501: Select a suitable segmentation tool and add a certain amount of specialized vocabulary to its base dictionary, so that when the tool encounters these character combinations it treats them as a single word. For example, the word "生日" (birthday) should not be split into "生" and "日".

[0074] Step 502: Use the segmentation tool to segment the keywords that are to be added to the library, producing base entries.

[0075] Step 503: Screen the base entries to remove words with no real meaning, such as "我" (I).

[0076] Step 504: Analyze the part of speech of the remaining base entries, review adverbs, adjectives, verbs, and the like, and filter out the words that are not topical, such as "送" (give) and "好" (good).

[0077] Step 505: Designate the remaining topical words as topic keywords, retain them, and generate topic entries; these are generally nouns of various kinds, such as place names.

[0078] Step 506: Store the words that are synonymous with a topic keyword in that topic entry.

[0079] Step 507: Save the topic entries to the topic keyword library. The topic keyword library may use a structure similar to a multi-level inverted index.
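The effect of extending the segmenter's dictionary in step 501 can be shown with a small dictionary-based segmenter. Forward maximum matching is used here only as a simple stand-in; the patent does not specify the segmentation algorithm (it mentions an existing tool such as YWS), and the function name `segment` and its `max_len` parameter are hypothetical.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word,
    falling back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


base = {"礼物"}                               # dictionary without "生日"
print(segment("生日礼物", base))              # ['生', '日', '礼物']
print(segment("生日礼物", base | {"生日"}))   # ['生日', '礼物']
```

Without the added entry, "生日" falls apart into single characters; once it is in the dictionary, the segmenter keeps it as one word.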

[0080] The invention uses the topic keyword library to preprocess the keywords entered by the user and select the topic keywords, so that the search returns information close to the topic the user has in mind. This avoids retrieving much off-topic information, reduces the distraction to the user, lightens the load on the search engine, and thereby speeds up the search. The invention further sorts the retrieved information by topic relevance, so that the user can clearly see the information closest to the topic, which gives the user a better experience. Correspondingly, operators no longer need to list large numbers of topic-irrelevant words to attract user searches, which lowers their operating costs and saves considerable space in the information index library. The invention updates the topic keyword library periodically, making it easier for users to find the information they care about.

[0081] Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to cover them as well.

Claims (12)

1.一种搜索信息的方法,其特征在于,包括以下步骤:根据词性对用户输入的信息进行分词;将分词后得到的各词语与主题关键词库中预定义的主题关键词进行匹配,并将所述分词后得到的词语中匹配成功的词语确定为主题关键词,并删除未匹配成功的词语;其中,预定义的主题关键词为具有主题性的词;在匹配成功后,进一步确定主题关键词的同义词,并将该同义词加入到主题关键词中;根据所述主题关键词搜索信息,并输出搜索结果。 1. A method of searching for information, characterized by comprising the steps of: speech inputted user information word; and relating to each of the words in the keyword database after a predefined word match keywords relating obtained, and the words are words obtained after the segment is determined to be successfully matched keyword theme, and delete words not successfully matched; wherein the predefined keywords relating to words having thematic; after a successful match, a further determination topic synonymous keywords, synonyms added to the topic and the keyword; according to the information relating to the search keyword, and outputs the search result.
2.如权利要求1所述的搜索信息的方法,其特征在于,在进行匹配前,根据词语的词性从分词后得到的词语中过滤掉与主题无关的词语,再将保留的各词语与所述主题关键词库中的词语进行匹配。 2. The method of searching for information according to claim 1, characterized in that, before performing matched filtering off-topic words from the words in the word obtained according to part of speech of words, and each word is then retained said words topic keyword library to match.
3.如权利要求2所述的搜索信息的方法,其特征在于,进一步将部分或全部未能与所述主题关键词库中预定义的主题关键词匹配成功的词语补充到该主题关键词库中。 3. The method of searching for information according to claim 2, characterized in that the further part or all of the failure relating to a predefined keyword library successfully matched keyword relating to the topic words added keyword database in.
4.如权利要求1至3任一项所述的搜索信息的方法,其特征在于,在搜索信息时,对各主题关键词按“或”运算关系进行处理。 The method of searching for information according to any one of claims 1 to 3 as claimed in claim 4, wherein, when searching for information relating to each processed by keywords "or" calculating relationships.
5.如权利要求4所述的搜索信息的方法,其特征在于,搜索信息时,将主题关键词与信息库中的关键词匹配,获取所有匹配成功的关键词所对应的信息。 5. The method of searching for information according to claim 4, characterized in that, when searching for information relating to the keyword information matches the keyword library, all the information acquired successfully matched keyword corresponds.
6.如权利要求5所述的搜索信息的方法,其特征在于,在搜索到信息后,根据主题相关性对所述搜索到的各信息排序,将包含全部所述主题关键词的信息排在信息序列的前面。 6. The method of searching for information according to claim 5, wherein, after searching the information, each information topic relevance to the search sorted according to contain all the information relating to the keyword in row the foregoing information sequence.
7.一种用于搜索信息的装置,其特征在于,包括:分词模块,用于根据词性对用户输入的信息进行分词;过滤模块,用于将所述分词模块分词后得到的各词语与主题关键词库中预定义的主题关键词进行匹配,并将所述分词后得到的词语中匹配成功的词语确定为主题关键词,并删除未匹配成功的词语;其中,预定义的主题关键词为具有主题性的词;以及进一步确定所述主题关键词的同义词,将同义词加入到主题关键词中;搜索引擎,用于根据所述过滤模块确定的主题关键词搜索信息,并输出搜索结果。 An apparatus for searching for information, characterized by comprising: a segmentation module configured to perform word speech based on the information input by the user; filtering module for each of the words in the word with the subject obtained after segmentation module keywords library of predefined keywords match the theme and the words after the word obtained matching words to determine the success of the theme keywords, and delete unmatched success words; which predefined themes for the keyword words having theme; and further determines the topic keyword synonyms, the synonym is added to the topic keyword; a search engine, according to information relating to the keyword search module determines the filter, and outputs the search result.
8.如权利要求7所述的用于搜索信息的装置,其特征在于,所述过滤模块根据词语的词性从分词后得到的词语中过滤掉与主题无关的词语,再将保留的各词语与所述主题关键词库中预定义的主题关键词进行匹配。 8. The apparatus searches for information according to claim 7, characterized in that the filter module filtering based on the words from the words in the speech words in the word obtained regardless of the subject matter off, then the respective words and reserved the topic keyword library of predefined keywords match the theme.
9. The apparatus for searching information according to claim 7 or 8, characterized in that, when searching for information, the search engine combines the topic keywords using a logical "OR" relationship.
10. A system for searching information, characterized by comprising: a topic keyword library, configured to store topic keywords; a browser, configured to provide the user with a search interface and information display, send the information input by the user to a search device, and obtain search results from the search device; and the search device, configured to segment the received information into words, match each resulting word against the topic keywords predefined in the topic keyword library, determine the successfully matched words as topic keywords, and delete the words that were not matched, wherein the predefined topic keywords are words that carry topical meaning; further determine synonyms of the topic keywords and add the synonyms to the topic keywords; and search for information according to the topic keywords.
11. The system for searching information according to claim 10, characterized by further comprising: an information placement device, configured to place information content and the corresponding keywords; and an information library, configured to store the information content and the corresponding keywords, transmit the keywords to the topic keyword library, and provide information resources and a search interface to the search device.
12. The system for searching information according to claim 10 or 11, characterized in that the search device comprises: a word segmentation module, configured to segment the information input by the user into words according to part of speech; a filtering module, configured to match each word obtained from segmentation against the topic keywords predefined in the topic keyword library and determine the successfully matched words as topic keywords; and a search engine, configured to search for information according to the topic keywords determined by the filtering module and to output the search results.
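The flow recited in the claims above (segment the input, keep only words found in the topic keyword library, add synonyms, search with an OR relationship, and list items containing all topic keywords first) can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `TOPIC_KEYWORDS`, `SYNONYMS`, `Entry`, `segment`, `filter_topic_keywords`, and `search` are hypothetical, the sample data is invented for illustration, and whitespace splitting merely stands in for part-of-speech-based word segmentation.

```python
from dataclasses import dataclass

# Illustrative data only; none of these values come from the patent.
TOPIC_KEYWORDS = {"mp3", "player"}   # hypothetical topic keyword library
SYNONYMS = {"player": ["walkman"]}   # hypothetical synonym table


@dataclass(frozen=True)
class Entry:
    """One item in a hypothetical information library."""
    title: str
    keywords: frozenset


def segment(text):
    """Stand-in for part-of-speech-based segmentation: lowercase, split on whitespace."""
    return text.lower().split()


def filter_topic_keywords(words):
    """Keep words found in the topic keyword library, then append their synonyms."""
    matched = list(dict.fromkeys(w for w in words if w in TOPIC_KEYWORDS))
    expanded = list(matched)
    for word in matched:
        for synonym in SYNONYMS.get(word, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


def search(topic_keywords, library):
    """OR-match entries, then list entries containing every topic keyword first."""
    wanted = set(topic_keywords)
    hits = [e for e in library if wanted & e.keywords]
    # sorted() is stable, so entries with the same rank keep their library order.
    return sorted(hits, key=lambda e: not wanted <= e.keywords)


library = [
    Entry("A", frozenset({"mp3"})),
    Entry("B", frozenset({"mp3", "player", "walkman"})),
    Entry("C", frozenset({"camera"})),
]
keywords = filter_topic_keywords(segment("buy a cheap mp3 player"))
results = [e.title for e in search(keywords, library)]
```

In this sketch the unmatched words ("buy", "a", "cheap") are dropped before the search, so only the topic keywords reach the search step; entry "C" shares no keyword with the query and is never returned.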
CN 200610154148 2006-09-13 2006-09-13 Method and system for searching information CN101145153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200610154148 CN101145153B (en) 2006-09-13 2006-09-13 Method and system for searching information

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN 200610154148 CN101145153B (en) 2006-09-13 2006-09-13 Method and system for searching information
HK08107736A HK1113836A1 (en) 2006-09-13 2008-07-14 Method and system for searching information

Publications (2)

Publication Number Publication Date
CN101145153A CN101145153A (en) 2008-03-19
CN101145153B CN101145153B (en) 2011-03-30

Family

ID=39207681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200610154148 CN101145153B (en) 2006-09-13 2006-09-13 Method and system for searching information

Country Status (2)

Country Link
CN (1) CN101145153B (en)
HK (1) HK1113836A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106209508A (en) * 2016-07-05 2016-12-07 马岩 Local area network mail data-based capture method and system
CN106209507A (en) * 2016-07-04 2016-12-07 马岩 Network mail data-based capture method and system

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102339294B (en) * 2010-07-27 2013-09-11 卓望数码技术(深圳)有限公司 Searching method and system for preprocessing keywords
CN102479193B (en) * 2010-11-22 2015-04-01 百度在线网络技术(北京)有限公司 Method and equipment for match search popularization based on match bid coefficient
CN102043845B (en) * 2010-12-08 2013-08-21 百度在线网络技术(北京)有限公司 Method and equipment for extracting core keywords based on query sequence cluster
CN102591880B (en) * 2011-01-14 2015-02-18 阿里巴巴集团控股有限公司 Information providing method and device
CN102722498B (en) * 2011-03-31 2015-06-03 北京百度网讯科技有限公司 Search engine and implementation method thereof
CN102169503B (en) * 2011-04-29 2013-04-24 北京百度网讯科技有限公司 Method and device for obtaining searching result corresponding with user query sequence
CN102929873B (en) * 2011-08-08 2017-03-22 腾讯科技(深圳)有限公司 Method and apparatus for context-based search to extract the value of a search word
CN103064838B (en) * 2011-10-19 2016-03-30 阿里巴巴集团控股有限公司 Data relevant to a method and apparatus
CN103106220B (en) * 2011-11-15 2016-08-03 阿里巴巴集团控股有限公司 Method of searching, and searching means for searching engine system
CN103310343A (en) * 2012-03-15 2013-09-18 阿里巴巴集团控股有限公司 Commodity information issuing method and device
CN102880633A (en) * 2012-07-27 2013-01-16 四川长虹电器股份有限公司 Content pushing method based on characteristic word
CN103678292B (en) * 2012-08-29 2018-06-05 百度在线网络技术(北京)有限公司 Method and apparatus for sorting types of information based on location
CN102902722B (en) * 2012-09-04 2015-09-02 北京奇虎科技有限公司 An information processing method and a security system
CN102929925A (en) * 2012-09-20 2013-02-13 百度在线网络技术(北京)有限公司 Search method and device based on browsing content
CN103077218B (en) * 2012-12-28 2016-08-24 百度在线网络技术(北京)有限公司 A method for determining the device query request to query requirement information sequence
CN103914492B (en) * 2013-01-09 2018-02-27 阿里巴巴集团控股有限公司 Query word fusion, merchandise information distribution methods and search methods and systems
CN103488763A (en) * 2013-09-26 2014-01-01 乐视致新电子科技(天津)有限公司 Search method and search device
CN103488762A (en) * 2013-09-26 2014-01-01 乐视致新电子科技(天津)有限公司 Search method and search device
CN104715066B (en) * 2015-03-31 2017-04-12 北京奇付通科技有限公司 Optimization for searching method, apparatus and system for
CN104899308A (en) * 2015-06-12 2015-09-09 北京奇虎科技有限公司 Method and device for information recommendation under fault in browser page
CN104881503A (en) * 2015-06-24 2015-09-02 郑州悉知信息技术有限公司 Data processing method and device
CN106488310A (en) * 2015-08-31 2017-03-08 晨星半导体股份有限公司 Television program smart playing method and control device thereof
CN106547762A (en) * 2015-09-17 2017-03-29 深圳市世强先进科技有限公司 Keyword defining method and system
CN106815262A (en) * 2015-12-01 2017-06-09 北京国双科技有限公司 Searching method and device of judgment document
CN106815263B (en) * 2015-12-01 2019-04-12 北京国双科技有限公司 The searching method and device of legal provision
CN106815265A (en) * 2015-12-01 2017-06-09 北京国双科技有限公司 Searching method and device of judgment document
WO2018027342A1 (en) * 2016-08-06 2018-02-15 马岩 Application method and system for synonym in big data search
CN106250531A (en) * 2016-08-06 2016-12-21 马岩 Method and system of applying synonym to big data search
CN106503251A (en) * 2016-11-11 2017-03-15 广州市万表科技股份有限公司 Searching method and searching device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1335574A (en) 2001-09-05 2002-02-13 罗笑南 Intelligent semantic searching method
US20040199498A1 (en) 2003-04-04 2004-10-07 Yahoo! Inc. Systems and methods for generating concept units from search queries

Also Published As

Publication number Publication date
HK1113836A1 (en) 2011-08-05
CN101145153A (en) 2008-03-19

Similar Documents

Publication Publication Date Title
Callan et al. TREC and TIPSTER experiments with INQUERY
US7526425B2 (en) Method and system for extending keyword searching to syntactically and semantically annotated data
US7444348B2 (en) System for enhancing a query interface
US6088692A (en) Natural language method and system for searching for and ranking relevant documents from a computer database
EP1678639B1 (en) Systems and methods for search processing using superunits
CN102289462B (en) Phrase-based searching in an information retrieval system
US8290956B2 (en) Methods and systems for searching and associating information resources such as web pages
JP4574356B2 (en) Electronic document repository management and access system
CN101116072B (en) Method and system for categorized presentation of search results
CN103500198B (en) The method of inputting information by combining the user to search and system
US7499940B1 (en) Method and system for URL autocompletion using ranked results
US6321228B1 (en) Internet search system for retrieving selected results from a previous search
US7027975B1 (en) Guided natural language interface system and method
JP4274689B2 (en) Method and system for selecting a data set
JP4805929B2 (en) Search systems and methods using inline context query
US7680778B2 (en) Support for reverse and stemmed hit-highlighting
US7617193B2 (en) Interactive user-controlled relevance ranking retrieved information in an information search system
US7487145B1 (en) Method and system for autocompletion using ranked results
US6236991B1 (en) Method and system for providing access for categorized information from online internet and intranet sources
CN1310175C (en) Search engine management system and method
US7111237B2 (en) Blinking annotation callouts highlighting cross language search results
JP4838529B2 (en) Reinforced clustering of multi-type data objects for search term suggestion
US7562069B1 (en) Query disambiguation
US7349896B2 (en) Query routing
US6199067B1 (en) System and method for generating personalized user profiles and for utilizing the generated user profiles to perform adaptive internet searches

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1113836

Country of ref document: HK

C14 Granted
REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1113836

Country of ref document: HK