CN101398856A - Method for acquiring navigation enquiry words, device and method for displaying searching result - Google Patents

Info

Publication number
CN101398856A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2008102263006A
Other languages
Chinese (zh)
Inventor
王勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CNA2008102263006A
Publication of CN101398856A

Abstract

The invention discloses a method for acquiring navigation query words, comprising: acquiring a relevant data source that contains text; and analyzing the text in the relevant data source based on preset navigation keywords to obtain navigation query words. The invention also discloses a device for acquiring navigation query words, a method for displaying search results, and a search engine system. Because any data source that contains text can be used, the invention can draw on a broader range of data, and analyzing the text in each relevant data source with the preset navigation keywords yields navigation query words as comprehensively as possible.

Description

Method and device for acquiring navigation query words, and method for displaying search results
Technical Field
The present invention relates to the field of search engine technology, and in particular to a method and a device for acquiring navigation query words and a method for displaying search results.
Background
As the number of websites on the internet grows rapidly, users need a convenient way to reach the website they want as quickly as possible. Queries submitted to a search engine can be roughly divided, by the user's purpose, into navigation queries and information queries. In a navigation query, the user's direct purpose is to visit a known website. For this kind of query, the user wants the target website to appear near the top of the search results, ideally in first place, so that it can be found and entered as quickly as possible. For example:
Input "Ningxia People's Government"; the target is www.nx.gov.cn/;
Input "Motorola homepage"; the target is www.motorola.com.cn/;
Input "Mengyuan Bookstore"; the target is www.my285.com/.
Usually, a search engine server takes the query word entered by the user, sorts all related web pages by relevance, and returns them for the user to choose from. For a navigation query, however, exactly one web page satisfies the user's need, and if that page is not ranked near the top, the user experience suffers badly. Because internet data is abundant and varied, a search engine cannot guarantee that the target page of a navigation query always ranks among the first few results. To address this, some search engines maintain a navigation query vocabulary together with the corresponding targets, as shown in Table 1. When the user's query hits a navigation query word in this vocabulary, the target page corresponding to that word is placed first in the search results.
Table 1
Navigation query word                    Target
Aimin Weight-Loss Hospital               www.aimin.com.cn/
Weicheng District Bureau of Education    www.wcedu.net/
Hebei Netcom                             www.he.chinaunicom.com/
The method that prior art is set up the navigation enquiry vocabulary is, search engine logs by the user is found navigation enquiry words, promptly analyze the search log information, obtain under certain information inquiry speech, user's click frequency meets the network address of prerequisite, obtains descriptor at described network address according to the information inquiry speech, and is last according to network address that is obtained and corresponding descriptor, find navigation enquiry, and generate the navigation enquiry vocabulary.
But when utilizing the prior art to carry out the discovery of navigation enquiry words, some navigation enquiry words may be found, for example, the inquiry times of the navigation enquiry words that has is less, even is not inquired about as yet, then can't be found, also just can't be identified as navigation enquiry words by this method.
Summary of the invention
In view of this, the object of the present invention is to provide the method for the method, device and the displaying searching result that obtain navigation enquiry words, can't find the navigation enquiry word problem all sidedly to solve prior art.
To achieve the above object, the invention provides the following schemes:
A method for acquiring navigation query words, comprising:
acquiring a relevant data source that contains text;
analyzing the text in the relevant data source based on preset navigation keywords to obtain navigation query words.
Preferably, the relevant data source comprises web pages on the internet, and the text in the relevant data source comprises:
the text presented in web page titles, body-text summaries and link text.
Preferably, the relevant data source further comprises search engine logs, and the text in the relevant data source comprises the query words in the search engine logs.
Preferably, the navigation keywords are organized in advance into a regular expression according to a preset rule, and analyzing the text in the relevant data source based on the preset navigation keywords to obtain navigation query words comprises:
splitting the text in the relevant data source into short sentences at preset sentence-break markers;
searching each short sentence for a character string that matches the regular expression;
determining the matching character string to be a navigation query word.
Preferably, analyzing the text in the relevant data source based on the preset navigation keywords to obtain navigation query words comprises:
judging whether the text in the relevant data source contains a navigation keyword;
if so, determining the character string between the navigation keyword and the first information separator preceding it to be a navigation query word.
Preferably, the method further comprises:
filtering the acquired navigation query words.
Preferably, filtering the acquired navigation query words comprises:
judging whether the number of occurrences of an acquired navigation query word is less than a preset threshold;
if so, filtering out that navigation query word.
Preferably, filtering the acquired navigation query words comprises:
judging whether an acquired navigation query word is a preset filter keyword;
if so, filtering out that navigation query word.
Preferably, the method further comprises:
sending an acquired navigation query word, as a search keyword, to at least two search engines for verification;
if the first results returned by the search engines are inconsistent, filtering out that navigation query word.
Preferably, if the first results returned by the search engines are consistent, the navigation query word passes verification, and the method further comprises:
determining the web address of the first result of the search engines to be the target web address corresponding to that navigation query word.
A device for acquiring navigation query words, comprising:
a data source acquiring unit, configured to acquire a relevant data source that contains text;
a navigation query word acquiring unit, configured to analyze the text in the relevant data source based on preset navigation keywords to obtain navigation query words.
Preferably, the relevant data source comprises web pages on the internet, and the text in the relevant data source comprises:
the text presented in web page titles, body-text summaries and link text.
Preferably, the relevant data source further comprises search engine logs, and the text in the relevant data source comprises:
the query words in the search engine logs.
Preferably, the navigation keywords are organized in advance into a regular expression according to a preset rule, and the navigation query word acquiring unit comprises:
a sentence-splitting subunit, configured to split the text in the relevant data source into short sentences at preset sentence-break markers;
a matching subunit, configured to search each short sentence for a character string that matches the regular expression;
a first determining subunit, configured to determine the matching character string to be a navigation query word.
Preferably, the navigation query word acquiring unit comprises:
a judging subunit, configured to judge whether the text in the relevant data source contains a navigation keyword;
a second determining subunit, configured to determine the character string between the navigation keyword and the first information separator preceding it to be a navigation query word.
Preferably, the device further comprises:
a filtering unit, configured to filter the acquired navigation query words.
Preferably, the filtering unit comprises:
a first judging subunit, configured to judge whether the number of occurrences of an acquired navigation query word is less than a preset threshold;
a first removal subunit, configured to filter out navigation query words whose number of occurrences is less than the preset threshold.
Preferably, the filtering unit comprises:
a second judging subunit, configured to judge whether an acquired navigation query word is a preset filter keyword;
a second removal subunit, configured to filter out navigation query words that are preset filter keywords.
Preferably, the device further comprises:
a verification unit, configured to send an acquired navigation query word, as a search keyword, to at least two search engines for verification;
a removal unit, configured to filter out the navigation query word when the first results returned by the search engines are inconsistent.
Preferably, if the first results returned by the search engines are consistent, the navigation query word passes verification, and the device further comprises:
a web address determining unit, configured to determine the web address of the first result of the search engines to be the target web address corresponding to that navigation query word.
A method for displaying search results, comprising:
acquiring a relevant data source that contains text;
analyzing the text in the relevant data source based on preset navigation keywords to obtain navigation query words;
sending the obtained navigation query words to at least two search engines for searching, to obtain the target web address corresponding to each navigation query word;
saving the navigation query words and their corresponding target web addresses to form a navigation query database;
receiving the search content entered by a user;
querying the navigation query database to judge whether it contains a navigation query word that matches the search content;
if so, displaying the target web address corresponding to the matching navigation query word in first place in the search results.
A search engine system, comprising:
a navigation query database, configured to store navigation query words and their corresponding target web addresses, the navigation query database being built as follows: acquiring a relevant data source that contains text; analyzing the text in the relevant data source based on preset navigation keywords to obtain navigation query words; sending the obtained navigation query words to at least two search engines for searching, to obtain the target web address corresponding to each navigation query word; and saving the navigation query words and their corresponding target web addresses to form the navigation query database;
an interface module, configured to receive the search content entered by a user;
a query module, configured to query the navigation query database and judge whether it contains a navigation query word that matches the search content;
a presentation module, configured to display the target web address corresponding to the matching navigation query word in first place in the search results.
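The query-time behaviour described above can be sketched in a few lines. This is only an illustration under stated assumptions: the navigation query database is modelled as an in-memory dictionary, "matches" is taken to mean exact equality after trimming whitespace, and the function name present_results and the example.com addresses are hypothetical rather than taken from the patent.

```python
def present_results(query, nav_db, ranked_results):
    """Return the result list with the navigation target (if any) in first place.

    nav_db maps navigation query words to target web addresses (assumed to be
    a plain dict here); ranked_results is the ordinary relevance-ordered list.
    """
    target = nav_db.get(query.strip())
    if target is None:
        # Not a known navigation query: leave the ordinary ranking untouched.
        return list(ranked_results)
    # Promote the target and drop any duplicate of it further down the list.
    return [target] + [url for url in ranked_results if url != target]


nav_db = {"Sohu": "www.sohu.com"}
ordinary = ["news.example.com", "www.sohu.com", "blog.example.com"]
print(present_results("Sohu", nav_db, ordinary))
```

A real system would likely need a looser notion of matching (case, spacing, synonyms); the text leaves that open, so the sketch keeps it to exact lookup.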
According to the specific embodiments provided by the invention, the invention achieves the following technical effects:
The invention acquires a relevant data source that contains text and analyzes the text in the relevant data source based on preset navigation keywords to obtain navigation query words. Because the relevant data source only needs to contain text, it is not limited to users' search logs, which makes a much wider range of data available. Analyzing the text on each web page with the preset navigation keywords then makes it possible to acquire navigation query words as comprehensively as possible.
In addition, navigation query words are obtained simply by analyzing the text in the relevant data source based on the navigation keywords, so the method is simple to implement. The obtained navigation query words can also be verified with search engines, which helps ensure that they are correct.
Brief Description of the Drawings
Fig. 1 is a flowchart of the method provided by an embodiment of the invention;
Fig. 2 is a schematic diagram of a first device provided by an embodiment of the invention;
Fig. 3 is a schematic diagram of a second device provided by an embodiment of the invention;
Fig. 4 is a schematic diagram of a third device provided by an embodiment of the invention;
Fig. 5 is a schematic diagram of a fourth device provided by an embodiment of the invention;
Fig. 6 is a flowchart of the method for displaying search results provided by an embodiment of the invention;
Fig. 7 is a schematic diagram of the search engine system provided by an embodiment of the invention.
Detailed Description
To make the above objects, features and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, the method for acquiring navigation query words provided by an embodiment of the invention comprises the following steps:
S101: acquiring a relevant data source that contains text;
S102: analyzing the text in the relevant data source based on preset navigation keywords to obtain navigation query words.
Here, the relevant data source only needs to contain text, so a very wide range of data can be used. For example, the data source can be the web pages on the internet, in which case the text in the relevant data source is the text presented on each web page. This text can be obtained by crawling the pages and analyzing their front-end page elements, and includes but is not limited to the page title, body-text summary and link text. The relevant data source can of course also include users' search engine logs, in which case the text in the relevant data source also includes the query words in those logs.
The core of the invention is to address the defect of the prior art by drawing on more data sources, so that navigation query words are acquired more comprehensively. For convenience, the method is described in detail below using web pages on the internet as the example data source.
The invention observes that some of the text presented on web pages carries suffixes such as "website" or "homepage", for example "Rockets Chinese website" or "Sohu homepage". A word that ends in such a suffix is likely to be a navigation query word. The invention calls such a suffix a navigation suffix; by mining words that end in a navigation suffix, a large number of navigation query words can be obtained. The invention therefore analyzes the text presented on each web page based on preset navigation keywords. The navigation keywords can be the navigation suffixes, including but not limited to "homepage", "main page", "portal", "website" and "official site".
There are many ways to analyze the text presented on the web pages based on the preset navigation keywords and obtain navigation query words. They are described in detail below.
Embodiment one: the analysis can use regular expressions. First, a series of navigation suffixes is defined, including but not limited to the "homepage", "main page", "portal", "website" and "official site" mentioned above, and these are organized into a regular expression. Note that a regular expression is a formula that matches a class of character strings with a pattern, and regular expressions are widely supported by text editors, class libraries (such as Rogue Wave's tools.h++) and script tools (such as awk/grep/sed).
Some navigation suffixes are compound, such as "official website", "Chinese net" or "personal website", as in "ASUS official website". In the embodiment of the invention such suffixes are treated as compound navigation suffixes; that is, the suffix of a navigation query word may be a combination of two or more navigation suffixes. The regular expression used in the embodiment can therefore be:
"(.+?)(?:official|Chinese|personal)*(?:homepage|main page|website|net|portal|official site)+"
This regular expression matches a character string that begins with any text, contains zero or more occurrences of "official", "Chinese" or "personal" in the middle, and ends with a word such as "homepage", "main page", "website", "net", "portal" or "official site". The form is flexible and adaptable: it matches compound cases with several navigation suffixes, such as "ASUS official website" and "Dongfang Shenqi (TVXQ) Chinese net".
Matching with this regular expression has to be done on each independent sentence, whereas the text presented on a typical web page may be whole paragraphs. Before the regular expression is applied, the text presented on the page therefore has to be split into sentences. The split is made at preset sentence-break markers, which can be spaces, punctuation marks, paragraph marks and so on. First, the whole text presented on the page is scanned and broken wherever one of these markers occurs, which yields a series of short sentences that no longer contain any sentence-break marker; these are the independent sentences. Each short sentence is then searched for a character string that matches the regular expression, and the matching character string is determined to be a navigation query word.
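A minimal sketch of this split-then-match procedure follows. It is an English-language stand-in, not the patent's own implementation: the modifier and suffix lists, the optional whitespace in the pattern, and the choice of punctuation and newlines (but not spaces) as sentence-break markers are all assumptions made so that the example works on space-delimited English text.

```python
import re

# Assumed English stand-ins for the navigation modifiers and suffixes.
MODIFIERS = ["official", "Chinese", "personal"]
SUFFIXES = ["homepage", "website", "portal"]

# Leading text (captured lazily), optional modifiers, then a navigation suffix.
NAV_PATTERN = re.compile(
    rf"(.+?)\s*(?:(?:{'|'.join(MODIFIERS)})\s*)*(?:{'|'.join(SUFFIXES)})"
)

# Assumed sentence-break markers: punctuation and newlines only, because
# splitting English text at spaces would break it into single words.
BREAKS = re.compile(r"[,.;:!?\n]+")


def split_short_sentences(text):
    """Split text into short sentences that contain no break marker."""
    return [part.strip() for part in BREAKS.split(text) if part.strip()]


def extract_candidates(text):
    """Return the text preceding a navigation suffix in each short sentence."""
    candidates = []
    for sentence in split_short_sentences(text):
        match = NAV_PATTERN.search(sentence)
        if match:
            candidates.append(match.group(1).strip())
    return candidates


sample = ("After opening Sohu today, Sohu homepage has added a scroll box. "
          "ASUS official website is online!")
print(extract_candidates(sample))
```

Because the captured group is everything from the start of the short sentence up to the suffix, a sentence such as "How to find the Sogou website" would also produce a (noisy) candidate, which is why later filtering matters.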
Embodiment two: the text presented on each web page can be analyzed directly with the navigation keywords. As before, the navigation keywords can include but are not limited to "homepage", "main page", "portal", "website", "official site", "official website", "personal website" and "Chinese net". It is first judged whether the text presented on a web page contains one of these navigation keywords. If it does, the character string between the navigation keyword and the first information separator preceding it is determined to be a navigation query word.
The information separator can be a space character, a paragraph mark, or certain punctuation marks, as follows:
,。;—:、/.!-
For example, suppose the original text presented on a web page is "... After opening Sohu today, I found that the Sohu homepage has added a scroll box showing the latest Olympic news ...". The navigation keyword that appears in this passage is "homepage", so "Sohu", the string between "homepage" and the "," preceding it, is determined to be a navigation query word.
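The keyword-and-separator approach can be sketched as follows. This is an illustrative sketch under assumptions: the keyword list is an English stand-in, the separator set reuses the punctuation marks listed above plus the newline, and the space is deliberately left out of the separators because the example text is English, where a space always sits directly in front of the keyword.

```python
# Assumed English stand-ins for the navigation keywords.
NAV_KEYWORDS = ["homepage", "website", "portal"]

# Separators taken from the list above (written as escapes where needed),
# plus newline as a paragraph mark. Space is excluded for English text.
SEPARATORS = set(",;:/.!-\n\u3002\u2014\u3001")


def extract_by_keyword(text):
    """Return the string between each navigation keyword and the nearest
    separator before it (or the start of the text)."""
    results = []
    for keyword in NAV_KEYWORDS:
        start = 0
        while True:
            idx = text.find(keyword, start)
            if idx == -1:
                break
            # Walk left from the keyword until a separator or the text start.
            left = idx
            while left > 0 and text[left - 1] not in SEPARATORS:
                left -= 1
            candidate = text[left:idx].strip()
            if candidate:
                results.append(candidate)
            start = idx + len(keyword)
    return results


print(extract_by_keyword(
    "After opening Sohu today, Sohu homepage has added a scroll box"))
```

Compared with the regular-expression approach, this variant needs no sentence splitting up front: the backward scan to the nearest separator plays the same role.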
Processing the text presented on all web pages on the internet in this way yields a large number of navigation query words, but the result may contain some noise. For example, a string such as "how on Sogou" may be determined to be a navigation query word, and so may words such as "have" or "other". In a preferred embodiment of the invention, the method therefore also includes a step of filtering the acquired navigation query words. Filtering can be done in many ways; the embodiment of the invention can use the following two preferred ways:
(1) When navigation query words are acquired from the text presented on all web pages on the internet, the same navigation query word is inevitably acquired many times, and this property can serve as a basis for filtering. The number of times each acquired navigation query word occurs is recorded; if a navigation query word occurs only rarely (for example, fewer times than a preset threshold), it is regarded as noise that occurred by chance and is filtered out.
To do this, a counter can be maintained for each newly acquired navigation query word to store its number of occurrences. Each time the navigation query word is acquired again, the counter is incremented by one, which finally gives the total number of occurrences of each navigation query word.
(2) In practice, some strings that are determined to be navigation query words do occur many times but still cannot be regarded as navigation query words, for example "have", "other", "some" and "my company". For this case, such words can be set in advance as filter keywords. If an acquired navigation query word is one of these filter keywords, it is regarded as noise and filtered out.
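Both filters can be combined in one short pass. The sketch below is an assumption-laden illustration: the function name filter_candidates, the threshold value and the sample counts are hypothetical, and the filter keywords are a plain Python set.

```python
from collections import Counter


def filter_candidates(occurrences, min_count, filter_keywords):
    """Count how often each candidate was acquired, then drop candidates that
    occur fewer than min_count times or that are preset filter keywords."""
    counts = Counter(occurrences)  # one counter per candidate, incremented per sighting
    return {
        word: count
        for word, count in counts.items()
        if count >= min_count and word not in filter_keywords
    }


# Hypothetical sightings collected while scanning pages.
sightings = ["Sohu"] * 3 + ["how on Sogou"] + ["other"] * 4
print(filter_candidates(sightings, min_count=2, filter_keywords={"have", "other", "some"}))
```

Here "how on Sogou" falls below the threshold and "other" is removed by the keyword list even though it is frequent, which mirrors the two cases described above.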
As the above analysis shows, the method for acquiring navigation query words provided by the embodiment of the invention is based on text analysis: with the preset navigation keywords, only the text needs to be analyzed. Compared with the prior-art method of obtaining navigation query words from search engine logs, this lowers the requirements on the data source, makes a wide range of data available, and helps acquire navigation query words more comprehensively. Moreover, unlike the prior art, there is no need to consider the correspondence between text and web addresses (in the prior art, after obtaining a user's query word from the search engine log, it is also necessary to know which web address the user finally selected after entering that query word before it can be judged whether the query word is a navigation word), so the method is simple to implement.
The filtering described above improves the accuracy of the acquired navigation query words, but some noise may still remain after filtering. In a preferred embodiment of the invention, the method can therefore also include a step of further verifying the acquired navigation query words: each acquired navigation query word is sent as a search keyword to several search engines, and it is judged whether the first results returned by the search engines are consistent. If they are consistent, the word is confirmed to be a navigation query word, and the web address of the first result of the search engines can be determined to be the web address corresponding to that navigation query word. If the first results are inconsistent, the navigation query word is regarded as noise and filtered out.
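A sketch of this verification step is given below. The search engines are modelled as plain callables that take a query and return a ranked list of addresses; these callables, the function names verify and normalize, and the address normalization (dropping the scheme and a trailing slash) are assumptions for illustration, since the text does not say how "consistent" is decided.

```python
def normalize(url):
    """Assumed normalization so trivially different spellings compare equal."""
    url = url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url.rstrip("/")


def verify(word, engines):
    """Return the agreed first-place address, or None if the engines disagree.

    engines is a sequence of at least two callables: query -> ranked addresses.
    """
    if len(engines) < 2:
        raise ValueError("verification needs at least two search engines")
    top_results = []
    for search in engines:
        results = search(word)
        if not results:
            return None  # an engine with no results cannot confirm the word
        top_results.append(normalize(results[0]))
    if all(url == top_results[0] for url in top_results):
        return top_results[0]
    return None


# Stub engines standing in for real search back ends (hypothetical data).
engine_a = lambda q: {"Sohu": ["http://www.sohu.com/"]}.get(q, [])
engine_b = lambda q: {"Sohu": ["www.sohu.com"]}.get(q, [])
print(verify("Sohu", [engine_a, engine_b]))
```

The returned address doubles as the target web address for the verified word, so the same call can be used to populate a table of navigation query words and targets.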
To make the method for acquiring navigation query words provided by the embodiment of the invention easier to understand, a concrete example is described in detail below.
Suppose the following passages appear on internet web pages:
"After opening Sohu today, I found that the Sohu homepage has added a scroll box showing the latest Olympic news, so netizens can learn the latest Olympic news right away."
"Chen Guanxi website reopens Shu Qi blog trendy shop new products snapped up"
"How to download song lyrics on the Sogou website? Experts please advise! Other websites do not work."
The regular expression described earlier can be used to analyze this text. First, the original text is split at punctuation marks and spaces, which gives the following short sentences:
"After opening Sohu today I found that"
"the Sohu homepage has added a scroll box showing the latest Olympic news"
"netizens can learn the latest Olympic news right away"
"Chen Guanxi website reopens Shu Qi blog"
"trendy shop new products snapped up"
"How to download song lyrics on the Sogou website"
"Experts please advise"
"Other websites do not work"
Matching each short sentence against the regular expression gives the following navigation query words:
"Sohu", "Chen Guanxi", "how on Sogou", "other".
Each of the navigation query words above occurred once. The text presented on all web pages on the internet is processed in the same way, and the number of occurrences of each navigation query word is recorded. For example:
"Sohu" occurs 19824 times; "Chen Guanxi" occurs 5724 times; "how on Sogou" occurs 2 times; "other" occurs 24586 times.
Here, "how on Sogou" occurs too rarely, so it is regarded as noise and filtered out; "other" is among the preset filter keywords, so it is also regarded as noise and filtered out.
Then "Sohu" is searched on two search engines, Sogou and Baidu. The first result on both is www.sohu.com, so "Sohu" is considered a navigation query word, and its corresponding target web address is www.sohu.com.
"Chen Guanxi" is likewise searched on Sogou and Baidu. The first result on Sogou is ent.sina.com.cn/s/h/f/chengx.html; the first result on Baidu is yule.baidu.com/zt/star/yanzhanmen/. The two are inconsistent, so "Chen Guanxi" is considered not to be a navigation query word and is filtered out.
Corresponding to the method for acquiring navigation query words provided by the embodiment of the invention, the embodiment of the invention also provides a device for acquiring navigation query words. Referring to Fig. 2, the device comprises the following units:
a data source acquiring unit U201, configured to acquire a relevant data source that contains text;
a navigation query word acquiring unit U202, configured to analyze the text in the relevant data source based on preset navigation keywords to obtain navigation query words.
The data source acquiring unit U201 acquires a relevant data source that contains text; this data source can be the web pages on the internet and can also include users' search engine logs. The navigation query word acquiring unit U202 analyzes the text in the relevant data source based on the preset navigation keywords to obtain navigation query words. In this way, with the text in the relevant data source as input and the preset navigation keywords as the basis for analysis, navigation query words can be acquired as comprehensively as possible.
The navigation query word acquiring unit U202 can analyze the text in the relevant data source in different ways. For example, the preset navigation keywords can be organized in advance into a regular expression; in this case, referring to Fig. 3, the navigation query word acquiring unit U302 can comprise the following subunits:
a sentence-splitting subunit U3021, configured to split the text in the relevant data source into short sentences at preset sentence-break markers;
a matching subunit U3022, configured to search each short sentence for a character string that matches the regular expression;
a first determining subunit U3023, configured to determine the matching character string to be a navigation query word.
Alternatively, the text in the relevant data source can be analyzed directly with the preset navigation keywords; in this case, referring to Fig. 4, the navigation query word acquiring unit U402 can comprise the following subunits:
a judging subunit U4021, configured to judge whether the text in the relevant data source contains a navigation keyword;
a second determining subunit U4022, configured to determine the character string between the navigation keyword and the first information separator preceding it to be a navigation query word.
The data source acquiring unit U301 in Fig. 3 and the data source acquiring unit U401 in Fig. 4 are identical to the data source acquiring unit U201 in Fig. 2.
To ensure the correctness of the obtained navigation query words, the obtained navigation query words can be filtered. Therefore, referring to Fig. 5, the device may further comprise:
A filtering unit U503, configured to filter the obtained navigation query words.
The filtering unit U503 can filter the navigation query words by different methods. For example, filtering can be based on frequency, in which case the filtering unit U503 may comprise the following subunits:
A first judging subunit U5031, configured to judge whether the number of occurrences of an obtained navigation query word is less than a preset threshold;
A first removal subunit U5032, configured to filter out navigation query words whose number of occurrences is less than the preset threshold.
Filtering can also be based on preset filter keywords, in which case the filtering unit U503 may comprise the following subunits:
A second judging subunit U5033, configured to judge whether an obtained navigation query word is a preset filter keyword;
A second removal subunit U5034, configured to filter out navigation query words that are preset filter keywords.
Because frequency-based filtering alone may fail to remove some noise, a preferred embodiment of the present invention uses both filtering methods together. Accordingly, as shown in Fig. 5, the filtering unit U503 may comprise all four of the above subunits.
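The combined use of both filters can be sketched as below. The threshold value, the filter keyword set, and the function name are assumptions for illustration only; this description does not give concrete values.

```python
from collections import Counter


def filter_nav_words(candidates, min_count=2, filter_keywords=frozenset({"welcome", "click here"})):
    """Keep candidates that occur at least min_count times and are not filter keywords."""
    counts = Counter(candidates)
    kept = []
    for word, count in counts.items():
        if count < min_count:
            continue  # frequency-based filter: too rare to trust
        if word in filter_keywords:
            continue  # keyword-based filter: frequent but known noise
        kept.append(word)
    return kept


candidates = ["example", "example", "welcome", "welcome", "welcome", "rare"]
print(filter_nav_words(candidates))
```

In this sample, "welcome" survives the frequency filter because it is common, which is the kind of noise the keyword filter is meant to catch.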
In practical applications, the obtained navigation query words can also be verified in search engines. Therefore, the device may further comprise:
A verification unit U504, configured to send an obtained navigation query word as a search keyword to at least two search engines for verification;
A removal unit U505, configured to filter out the navigation query word when the first results returned by the search engines are inconsistent.
If the first results returned by the search engines are consistent, the navigation query word passes verification and is considered correct. In this case, the device further comprises:
A network address determining unit U506, configured to determine the network address of the first result returned by the search engines as the target network address corresponding to the navigation query word.
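The verification logic can be sketched as follows. Search engines are modeled here as plain callables that return a ranked list of addresses, which is an assumption for the example and not a real search API. The `normalize` helper encodes one assumed notion of "consistent" (ignoring case and a trailing slash); this description does not define how consistency is compared.

```python
def normalize(url):
    # Assumed notion of "consistent": ignore case and a trailing slash.
    return url.rstrip("/").lower()


def verify_nav_word(word, engines):
    """Return the target network address if every engine ranks the same
    result first; return None if the word should be filtered out."""
    if len(engines) < 2:
        raise ValueError("at least two search engines are required")
    first_results = []
    for search in engines:
        results = search(word)
        if not results:
            return None  # no result to compare, treat as not verified
        first_results.append(normalize(results[0]))
    return first_results[0] if len(set(first_results)) == 1 else None


# Stub engines standing in for real search back ends (hypothetical data).
engine_a = lambda q: ["http://www.example.com/", "http://a.example"]
engine_b = lambda q: ["http://WWW.example.com", "http://b.example"]
print(verify_nav_word("example", [engine_a, engine_b]))
```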
The data source acquiring unit U501 and the navigation query word acquiring unit U502 in Fig. 5 are identical to the data source acquiring unit U201 and the navigation query word acquiring unit U202 in Fig. 2, respectively.
The above embodiments describe the method and device for obtaining navigation query words. In practical applications, while a search engine is displaying search results, it can also judge whether the search content entered by the user is a navigation query word and, if so, obtain the target network address corresponding to that navigation query word and present it in the first position of the search results. If this judgment were made in real time, however, then after receiving the user's search content the engine would first have to obtain the relevant data source containing text, analyze the text in it, and obtain the target network address of the navigation query word, which may take too long. An embodiment of the present invention therefore provides a method for displaying search results in which the obtained navigation query words and their corresponding target network addresses are saved to form a navigation query database. The search engine can then decide whether the user's search content is a navigation query simply by querying this database, which saves time. Referring to Fig. 6, the method comprises the following steps:
S601: Obtain a relevant data source containing text;
S602: Analyze the text in the relevant data source based on preset navigation keywords to obtain navigation query words;
S603: Send the obtained navigation query words to at least two search engines for searching, and obtain the target network address corresponding to each navigation query word;
S604: Save the navigation query words and their corresponding target network addresses to form a navigation query database;
S605: Receive the search content entered by the user;
S606: Query the navigation query database and judge whether a navigation query word matching the search content exists;
S607: If so, present the target network address corresponding to the matching navigation query word in the first position of the search results; otherwise, handle the search content entered by the user as an ordinary informational query.
Compared with the previously described method for obtaining navigation query words, this method adds steps S603 to S607. The other parts are the same, as are the specific techniques each step can adopt, so the corresponding content is not repeated here.
In this method for displaying search results, after the navigation query words are obtained, they are sent to at least two search engines for searching. If the first results returned by the search engines are consistent, that first result can be taken as the target network address corresponding to the navigation query word, and the navigation query word and its target network address are saved to form the navigation query database. The search engine only needs to load this navigation query database. When a user then enters search content, the engine queries the database directly to judge whether the search content is a navigation query word and, if it is, presents the corresponding target network address in the first position of the search results.
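The offline build (S603 to S604) and the online lookup (S605 to S607) can be sketched together as below. The database is modeled as an in-memory dictionary and the search engines as callables returning ranked address lists. Both are assumptions for illustration, as are the function names and the choice to match queries after trimming and lowercasing.

```python
def build_nav_database(nav_words, engines):
    """S603-S604: keep each word whose first result agrees across all engines."""
    database = {}
    for word in nav_words:
        firsts = {tuple(engine(word)[:1]) for engine in engines}
        if len(firsts) == 1:
            (first,) = firsts
            if first:  # skip words for which the engines returned nothing
                database[word] = first[0]
    return database


def present_results(query, database, ordinary_search):
    """S605-S607: on a database hit, put the stored target address first;
    otherwise return the ordinary results unchanged."""
    results = ordinary_search(query)
    target = database.get(query.strip().lower())
    if target is None:
        return results
    return [target] + [r for r in results if r != target]


# Stub engines with hypothetical data.
engine_a = lambda q: {"example": ["http://www.example.com"], "news": ["http://a.example"]}.get(q, [])
engine_b = lambda q: {"example": ["http://www.example.com"], "news": ["http://b.example"]}.get(q, [])

db = build_nav_database(["example", "news"], [engine_a, engine_b])
print(present_results(" Example ", db, lambda q: ["http://other.example", "http://www.example.com"]))
```

Because the expensive analysis and verification happen before any user query arrives, the per-query cost is reduced to a single dictionary lookup.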
Corresponding to this method for displaying search results, an embodiment of the present invention also provides a search engine system. Referring to Fig. 7, the search engine system comprises the following modules:
A navigation query database U701, configured to store the navigation query words and their corresponding target network addresses. The navigation query database is built as follows: obtain a relevant data source containing text; analyze the text in the relevant data source based on preset navigation keywords to obtain navigation query words; send the obtained navigation query words to at least two search engines for searching, and obtain the target network address corresponding to each navigation query word; save the navigation query words and their corresponding target network addresses to form the navigation query database;
An interface module U702, configured to receive the search content entered by the user;
A query module U703, configured to query the navigation query database and judge whether a navigation query word matching the search content exists;
A presentation module U704, configured to present the target network address corresponding to the matching navigation query word in the first position of the search results.
The method and device for obtaining navigation query words and the method for displaying search results provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this description should not be construed as limiting the present invention.

Claims (22)

1. A method for obtaining navigation query words, characterized by comprising:
Obtaining a relevant data source containing text;
Analyzing the text in the relevant data source using preset navigation keywords to obtain navigation query words.
2. The method according to claim 1, characterized in that the relevant data source comprises web pages on the Internet, and the text in the relevant data source comprises:
Text presented in web page titles, text summaries, and link text.
3. The method according to claim 2, characterized in that the relevant data source further comprises search engine logs, and the text in the relevant data source comprises the query words in the search engine logs.
4. The method according to claim 1, characterized in that the navigation keywords are organized in advance into a regular expression according to preset rules, and analyzing the text in the relevant data source based on the preset navigation keywords to obtain navigation query words comprises:
Splitting the text in the relevant data source into short sentences using preset sentence-break identifiers;
Searching each short sentence for a character string that matches the regular expression;
Determining the matched character string as a navigation query word.
5. The method according to claim 1, characterized in that analyzing the text in the relevant data source based on the preset navigation keywords to obtain navigation query words comprises:
Judging whether the text in the relevant data source contains the navigation keyword;
If so, determining the character string between the navigation keyword and the first information separator preceding it as a navigation query word.
6. The method according to claim 1, characterized by further comprising:
Filtering the obtained navigation query words.
7. The method according to claim 6, characterized in that filtering the obtained navigation query words comprises:
Judging whether the number of occurrences of an obtained navigation query word is less than a preset threshold;
If so, filtering out the navigation query word.
8. The method according to claim 6, characterized in that filtering the obtained navigation query words comprises:
Judging whether an obtained navigation query word is a preset filter keyword;
If so, filtering out the navigation query word.
9. The method according to any one of claims 1 to 8, characterized by further comprising:
Sending an obtained navigation query word as a search keyword to at least two search engines for verification;
If the first results returned by the search engines are inconsistent, filtering out the navigation query word.
10. The method according to claim 9, characterized in that, if the first results returned by the search engines are consistent, the navigation query word passes verification, and the method further comprises:
Determining the network address of the first result returned by the search engines as the target network address corresponding to the navigation query word.
11. A device for obtaining navigation query words, characterized by comprising:
A data source acquiring unit, configured to obtain a relevant data source containing text;
A navigation query word acquiring unit, configured to analyze the text in the relevant data source using preset navigation keywords to obtain navigation query words.
12. The device according to claim 11, characterized in that the relevant data source comprises web pages on the Internet, and the text in the relevant data source comprises:
Text presented in web page titles, text summaries, and link text.
13. The device according to claim 12, characterized in that the relevant data source further comprises search engine logs, and the text in the relevant data source comprises:
The query words in the search engine logs.
14. The device according to claim 11, characterized in that the navigation keywords are organized in advance into a regular expression according to preset rules, and the navigation query word acquiring unit comprises:
A sentence segmentation subunit, configured to split the text in the relevant data source into short sentences using preset sentence-break identifiers;
A matching subunit, configured to search each short sentence for a character string that matches the regular expression;
A first determining subunit, configured to determine the matched character string as a navigation query word.
15. The device according to claim 11, characterized in that the navigation query word acquiring unit comprises:
A judging subunit, configured to judge whether the text in the relevant data source contains the navigation keyword;
A second determining subunit, configured to determine the character string between the navigation keyword and the first information separator preceding it as a navigation query word.
16. The device according to claim 11, characterized by further comprising:
A filtering unit, configured to filter the obtained navigation query words.
17. The device according to claim 16, characterized in that the filtering unit comprises:
A first judging subunit, configured to judge whether the number of occurrences of an obtained navigation query word is less than a preset threshold;
A first removal subunit, configured to filter out navigation query words whose number of occurrences is less than the preset threshold.
18. The device according to claim 16, characterized in that the filtering unit comprises:
A second judging subunit, configured to judge whether an obtained navigation query word is a preset filter keyword;
A second removal subunit, configured to filter out navigation query words that are preset filter keywords.
19. The device according to any one of claims 11 to 18, characterized by further comprising:
A verification unit, configured to send an obtained navigation query word as a search keyword to at least two search engines for verification;
A removal unit, configured to filter out the navigation query word when the first results returned by the search engines are inconsistent.
20. The device according to claim 19, characterized in that, if the first results returned by the search engines are consistent, the navigation query word passes verification, and the device further comprises:
A network address determining unit, configured to determine the network address of the first result returned by the search engines as the target network address corresponding to the navigation query word.
21. A method for displaying search results, characterized by comprising:
Obtaining a relevant data source containing text;
Analyzing the text in the relevant data source based on preset navigation keywords to obtain navigation query words;
Sending the obtained navigation query words to at least two search engines for searching, and obtaining the target network address corresponding to each navigation query word;
Saving the navigation query words and their corresponding target network addresses to form a navigation query database;
Receiving the search content entered by the user;
Querying the navigation query database and judging whether a navigation query word matching the search content exists;
If so, presenting the target network address corresponding to the matching navigation query word in the first position of the search results.
22. A search engine system, characterized by comprising:
A navigation query database, configured to store the navigation query words and their corresponding target network addresses, the navigation query database being built as follows: obtain a data source containing text; analyze the text in the relevant data source based on preset navigation keywords to obtain navigation query words; send the obtained navigation query words to at least two search engines for searching, and obtain the target network address corresponding to each navigation query word; save the navigation query words and their corresponding target network addresses to form the navigation query database;
An interface module, configured to receive the search content entered by the user;
A query module, configured to query the navigation query database and judge whether a navigation query word matching the search content exists;
A presentation module, configured to present the target network address corresponding to the matching navigation query word in the first position of the search results.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2008102263006A CN101398856A (en) 2008-11-12 2008-11-12 Method for acquiring navigation enquiry words, device and method for displaying searching result

Publications (1)

Publication Number Publication Date
CN101398856A true CN101398856A (en) 2009-04-01

Family

ID=40517408

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2008102263006A Pending CN101398856A (en) 2008-11-12 2008-11-12 Method for acquiring navigation enquiry words, device and method for displaying searching result

Country Status (1)

Country Link
CN (1) CN101398856A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136219A (en) * 2011-11-24 2013-06-05 北京百度网讯科技有限公司 Method and device for requirement mining and based on timeliness
CN103136219B (en) * 2011-11-24 2016-08-17 北京百度网讯科技有限公司 A kind of based on ageing demand method for digging and device
CN103425742A (en) * 2013-07-16 2013-12-04 北京中科汇联信息技术有限公司 Method and device for searching website
CN105183905A (en) * 2015-09-30 2015-12-23 北京奇虎科技有限公司 Method and device for excavating query terms of official website
CN109885548A (en) * 2019-02-22 2019-06-14 网易(杭州)网络有限公司 Log inquiring method, device, storage medium and electronic device
CN112417248A (en) * 2020-11-24 2021-02-26 百度在线网络技术(北京)有限公司 Recommendation method, device, model, equipment and storage medium for addressing keywords

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20090401