CN101661490B - Search engine, client thereof and method for searching page - Google Patents

Publication number
CN101661490B
Authority
CN
China
Prior art keywords
page
abstract
user
word set
inquiry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 200810213931
Other languages
Chinese (zh)
Other versions
CN101661490A (en
Inventor
张小洵
郭志立
郭宏蕾
祝慧佳
苏中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to CN 200810213931
Publication of CN101661490A
Application granted
Publication of CN101661490B
Expired - Fee Related
Anticipated expiration

Abstract

The invention relates to a search engine, a client thereof, and a method for searching web pages. The search engine comprises a query device, a page abstract extraction device, and a page abstract selection device. The query device is configured to retrieve a page sequence that satisfies a query; the page abstract extraction device is configured to extract candidate page abstracts of at least one page in the page sequence; and the page abstract selection device is configured to select one of the candidate page abstracts, according to a word set related to the user who sent the query, as the page abstract supplied to the user. Rather than simply selecting text segments that contain the query keywords to form the page abstract, the invention selects the final page abstract from the candidate page abstracts according to personalized keywords that reflect the user's information needs, thereby meeting the user's personalized information needs to a certain degree.

Description

Search engine, client thereof, and method for searching web pages
Technical field
The present invention relates to search engine technology, and in particular to the retrieval of the page abstracts (snippets) associated with the web pages in the query results of a search engine.
Background art
With the development of Internet services, search engines such as Google, Yahoo, and MSN have become almost indispensable tools for finding Internet resources of interest (for example, web pages). A search engine typically works as follows: once a user submits a query through a client, the search engine returns the web pages it has found to the user in a search results page. The retrieved web pages are relevant to the query. In addition to the title and Uniform Resource Locator (URL) of each web page, the search results page also includes a short text description related to the web page.
This short text description is commonly called a page abstract. A search engine usually produces a page abstract by extracting from the web page, and combining, text segments that contain the keywords involved in the query. In the search results page, the search engine can display the query keywords in the page abstract differently from the other text, by means such as highlighting, underlining, or a different font, to attract the user's attention and help the user decide whether to click through to the web page.
Beyond the query keywords themselves, users may differ in their information needs, for example in personal interests, search intentions, and goals. Although a page abstract can reflect, to some extent, the relevance of a web page to the query, current page abstracts are composed of text segments containing the query keywords, and the selection of those text segments does not take into account the content of the segments other than the keywords.
Search technology therefore needs further improvement in order to satisfy, at least to some extent, the different information needs of different people.
Summary of the invention
An object of the present invention is to provide a search engine, a client of a search engine, and a method for searching web pages that provide the user with personalized page abstracts.
In one embodiment of the invention, a search engine comprises: a query device configured to retrieve a page sequence that satisfies a query; a page abstract extraction device configured to extract candidate page abstracts of at least one web page in the page sequence; a page abstract selection device configured to select one of the candidate page abstracts, according to a word set related to the user who sent the query, as the page abstract provided to the user; and a result generation device configured to produce a query result comprising the page sequence and the page abstract.
In an optional embodiment, the search engine can comprise a word set generation device configured to obtain at least one personalized keyword of the user, according to information reflecting the user's information needs, to form the word set.
In one embodiment of the invention, a client of a search engine comprises: a receiving device configured to receive, from the search engine, a retrieved page sequence and candidate page abstracts of at least one web page in the page sequence; a page abstract selection device configured to select one of the candidate page abstracts, according to a word set related to the user of the client, as the page abstract provided to the user; and a result generation device configured to produce a query result comprising the page sequence and the page abstract.
In an optional embodiment, the client can comprise a word set generation device configured to obtain at least one personalized keyword of the user, according to information reflecting the user's information needs, to form the word set.
In one embodiment of the invention, a method for searching web pages comprises: retrieving a page sequence that satisfies a query; extracting candidate page abstracts of at least one web page in the page sequence; selecting one of the candidate page abstracts, according to a word set related to the user who sent the query, as the page abstract provided to the user; and producing a query result comprising the page sequence and the page abstract.
In one embodiment of the invention, a method for searching web pages comprises: receiving, from a search engine, a page sequence retrieved in response to a query and candidate page abstracts of at least one web page in the page sequence; selecting one of the candidate page abstracts, according to a word set related to the user who sent the query, as the page abstract provided to the user; and producing a query result comprising the page sequence and the page abstract.
In embodiments of the present invention, the page abstract is not formed simply by selecting and combining text segments that contain the query keywords. Instead, the final page abstract is selected from the candidate page abstracts according to personalized keywords that reflect the user's information needs, so the user's personalized information needs can be satisfied to a certain extent.
Description of drawings
The above and other objects, features, and advantages of the present invention can be understood more easily with reference to the following description of embodiments of the invention in conjunction with the accompanying drawings. In the drawings, identical or corresponding technical features or components are denoted by identical or corresponding reference numerals.
Fig. 1 is a block diagram showing the general structure of a search engine.
Fig. 2 is a block diagram showing the general structure of a client of a search engine.
Fig. 3 is a block diagram showing the structure of a search engine according to an embodiment of the present invention.
Fig. 4 is an exemplary flowchart showing the method for searching web pages carried out in the search engine shown in Fig. 3.
Fig. 5A and Fig. 5B show an example of a query processed by the method shown in Fig. 4.
Fig. 6 is a block diagram showing the structure of a search engine according to another embodiment of the present invention.
Fig. 7 is a block diagram showing the structure of a client according to another embodiment of the present invention.
Fig. 8 is an exemplary flowchart showing the method for searching web pages carried out in the client shown in Fig. 7.
Fig. 9 is a block diagram showing an example structure of a computer in which the present invention is implemented.
Embodiments
Embodiments of the invention are described below with reference to the accompanying drawings. It should be noted that, for the sake of clarity, the drawings and the description omit the representation and description of components and processes that are unrelated to the present invention or that are known to those of ordinary skill in the art.
Before the embodiments of the invention are described, a description of the general structure of a search engine and of a client will help in understanding the present invention.
Fig. 1 is a block diagram schematically showing the general structure of a search engine. As shown in Fig. 1, a search engine generally includes an information collector 101, a database 102, an indexing unit 103, and a requestor 104. The information collector 101 is responsible for roaming the Internet, discovering and collecting web page information, and storing the web page information in the database 102. The indexing unit 103 is responsible for interpreting the web page information collected by the information collector 101, analyzing the web page content, indexing the web page content, and storing the index in the database 102. The requestor 104 receives a query from a user, retrieves from the database 102 the web pages that satisfy the query, and returns the query result to the user.
The requestor 104 generally comprises a query device 110, a page abstract extraction device 111, and a search engine result generation device 112. The query device 110 searches the database according to the query submitted by the user and sorts the retrieval results by relevance, to obtain a sequence of web pages (that is, of web page addresses) satisfying the query. For each web page in the page sequence, the page abstract extraction device 111 extracts one page abstract from the corresponding web page content in the database 102. The search engine result generation device 112 organizes the page sequence and related content such as the page abstracts into a query result, for example by producing a HyperText Markup Language (HTML) page, to be fed back to the querying user.
For each keyword in the query (other than keywords the query excludes), the page abstract extraction device 111 extracts from the web page content the text within a neighborhood of the position where the keyword appears, and combines the text segments extracted for the keywords into a page abstract. If a keyword appears more than once in the web page content, the page abstract extraction device 111 can select the text segments that form the page abstract using heuristic rules, summarization techniques, or simple random sampling. A heuristic rule is a rule for selecting text segments that is determined from experience or inference. For example, based on the experience or assumption that the first text segment containing a keyword in the web page content is often the more important one, a heuristic rule can be to select the first text segment in the web page content that contains the keyword. Summarization is a technique for extracting the main content of a document. Using this technique to extract text segments means selecting the text segments that best reflect the main content of the web page, that is, the most important text segments containing the keywords. Various techniques are known for determining the importance of a text segment relative to the web page content. For example, the weight of each word of a text segment relative to the web page content can be obtained by the term frequency-inverse document frequency (TF-IDF) method (described in detail later), and the sum of the weights of all the words in the text segment is then the weight of the text segment. The larger the weight of a text segment, the higher its importance.
Fig. 2 is a block diagram schematically showing the general structure of a client 200 of a search engine. As shown in Fig. 2, the client 200 comprises a user interface 201, a client sending device 202, and a client receiving device 203. The user interface 201 receives the query entered by the user, presents the query result to the user, and handles the user's browsing operations (for example page turning and scrolling). The client sending device 202 sends the query entered through the user interface 201 to the search engine. The client receiving device 203 receives the query result from the search engine so that it can be presented to the user through the user interface 201.
Fig. 3 is a block diagram showing the structure of a search engine according to an embodiment of the present invention. As shown in Fig. 3, the search engine includes an information collector 301, a database 302, an indexing unit 303, and a requestor 304. The requestor 304 comprises a query device 310, a page abstract extraction device 311, a search engine result generation device 312, and a search engine page abstract selection device 313. The information collector 301, database 302, indexing unit 303, and query device 310 can be identical to the information collector 101, database 102, indexing unit 103, and query device 110 shown in Fig. 1, so their description is not repeated here.
Unlike the page abstract extraction device 111 shown in Fig. 1, for each web page in the page sequence the page abstract extraction device 311 does not extract a single page abstract from the corresponding web page content in the database 302, but extracts all candidate page abstracts. The way in which the page abstract extraction device 311 extracts the text adjacent to the keywords from the web page content, and combines that text, can be the same as in a prior-art page abstract extraction device. For example, all occurrences in the web page content of the keywords involved in the query (other than keywords the query excludes) can be located. For each occurrence, a text segment containing the occurrence can be extracted from the web page content, starting from the position of the occurrence, according to preset extraction rules. The preset extraction rules can include, for example:
● Length constraint: the length of a text segment does not exceed a predetermined upper limit;
● Integrity constraint: a text segment should remain a complete sentence as far as possible;
● Total length constraint: the extraction of text segments should ensure that the length after combination does not exceed a predetermined upper limit;
● Avoiding repetition: when text segments for at least two different keywords would be taken from the same part of the content, the same text segment should be used for them as far as possible. For example, for the sentence "The D300 is designated by Nikon as the ultimate in DX format performance", extracting text segments separately for the keywords "D300" and "format" might yield the segments "The D300 is designated by Nikon as the ultimate" and "by Nikon as the ultimate in DX format performance". Under the principle of avoiding repetition, the single segment "The D300 is designated by Nikon as the ultimate in DX format performance" should be extracted instead. If the length is limited, the irrelevant parts can be replaced with ellipses.
● Other rules, and any combination of these rules.
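The extraction rules listed above can be sketched roughly as follows. This is a minimal illustration, not the patented extraction procedure: the function name `extract_segments`, the naive sentence splitter, and the `max_len` default are all assumptions made for the example.

```python
import re

def extract_segments(page_text, query_keywords, max_len=200):
    """Return one text segment per sentence that contains a query keyword."""
    # Integrity constraint: work on whole sentences. The split on ., ! or ?
    # followed by whitespace is a naive assumption, not a real sentence parser.
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    lowered = [k.lower() for k in query_keywords]
    segments = []
    for sentence in sentences:
        words = set(re.findall(r"\w+", sentence.lower()))
        if not any(k in words for k in lowered):
            continue
        # Length constraint: truncate with an ellipsis if the sentence is too long.
        if len(sentence) > max_len:
            sentence = sentence[: max_len - 3].rstrip() + "..."
        # Avoiding repetition: a sentence matching several keywords is kept once.
        if sentence not in segments:
            segments.append(sentence)
    return segments
```

Because each sentence is tested once against all keywords, a sentence that contains both "D300" and "format" yields a single segment, which is the behaviour the avoid-repetition rule asks for. The total length constraint is left to the combination step.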
The page abstract extraction device 311 obtains all combinations of these text segments according to predetermined combination rules. The predetermined combination rules can include: the more kinds of keywords a combination contains, the more it is preferred; combinations containing repeated or overlapping text segments are avoided; and so on. Alternatively, the page abstract extraction device 311 can obtain only some of the combinations according to a predetermined policy, for example the combinations containing more kinds of keywords, a predetermined proportion of all combinations, or a random subset of the combinations. Rather than selecting just one of the obtained combinations as the page abstract, the page abstract extraction device 311 outputs all the obtained combinations as candidate page abstracts.
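A minimal sketch of the combination step, under stated assumptions: the function name `candidate_summaries`, the `max_total_len` limit, and the choice to rank (rather than filter) by the number of query keyword kinds are illustrative and do not come from the patent text.

```python
import re
from itertools import combinations

def candidate_summaries(segments, query_keywords, max_total_len=300):
    """Enumerate combinations of distinct segments as candidate page abstracts."""
    keywords = {k.lower() for k in query_keywords}

    def keyword_kinds(text):
        # Number of distinct query keywords that occur in the text.
        return len(keywords & set(re.findall(r"\w+", text.lower())))

    candidates = []
    for size in range(1, len(segments) + 1):
        # combinations() never repeats a position, so a segment is used at most once.
        for combo in combinations(segments, size):
            text = " ".join(combo)
            if len(text) <= max_total_len:  # total length constraint
                candidates.append(text)
    # Prefer combinations covering more kinds of keywords; the sort is stable,
    # so ties keep their enumeration order.
    candidates.sort(key=keyword_kinds, reverse=True)
    return candidates
```

All combinations that fit the length limit are returned, which matches the idea of handing every candidate to the later selection step. The number of combinations grows exponentially with the number of segments, so a real system would likely cap it with one of the partial-combination policies mentioned above.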
Alternatively, the page abstract extraction device 311 need not extract candidate page abstracts for every web page in the page sequence, but can extract them for a subset of the web pages selected according to a predetermined policy, for example the top-ranked web pages, a predetermined proportion of the web pages, web pages whose relevance exceeds a predetermined threshold, or even randomly chosen web pages.
The search engine page abstract selection device 313 selects, from the candidate page abstracts output by the page abstract extraction device 311 and according to the word set related to the user who sent the query, the page abstract to be provided to the user. The words in the word set reflect the user's personalized information needs, so whether the content of a candidate page abstract partly or fully satisfies those needs can be determined from whether it contains words in the word set. The degree to which a candidate page abstract reflects the personalized information needs can be determined in various ways. For example, the number of distinct words from the word set that appear in the candidate page abstract can be used as the measure. In this case, the more distinct words from the word set a candidate page abstract contains, the higher its degree of personalization. In another example, weights can be preset for the words in the word set, or assigned to them according to a predetermined criterion. As an example of assigning weights according to a predetermined criterion, when the user's personalized keywords are obtained from information reflecting the user's information needs to form the word set (described in detail below), the frequency with which each word in the word set appears in that information can be obtained, and each word can be assigned a weight matching its frequency. In this case, the larger the sum of the weights, the higher the degree of personalization. It is possible to select the candidate page abstract with the highest degree of personalization; to select at random one of several candidate page abstracts with higher degrees of personalization; to select the first candidate page abstract whose degree of personalization exceeds a predetermined threshold; or to select, among the candidate page abstracts with higher degrees of personalization, the one that a summarization technique determines to best reflect the web page content; and so on. Those of ordinary skill in the art will understand that the present invention can also use other ways of selecting a page abstract from the candidate page abstracts.
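The weighted selection described above might look like the following sketch. The names `personalization_score` and `select_summary` are hypothetical, and picking the single highest-scoring candidate is only one of the selection policies the text allows.

```python
import re

def personalization_score(summary, word_set_weights):
    """Sum the weights of the word-set words that occur in the summary."""
    words = set(re.findall(r"\w+", summary.lower()))
    return sum(weight for term, weight in word_set_weights.items()
               if term.lower() in words)

def select_summary(candidates, word_set_weights):
    """Pick the candidate with the highest score (first one wins on ties)."""
    return max(candidates,
               key=lambda c: personalization_score(c, word_set_weights))
```

Setting every weight to 1 reduces the score to the count of distinct word-set words in the candidate, which is the first measure mentioned in the text. Frequency-based weights only change the dictionary passed in, not the selection logic.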
Those skilled in the art can obtain, in various ways, the number of distinct words from the word set that appear in a candidate page abstract. For example, each word in each candidate page abstract can be checked against the word set to count the distinct matches, and the counts for all the candidate page abstracts can then be compared. Alternatively, the following more formal method can be used.
In this method, a dictionary is defined that contains a plurality of words arranged in a predetermined order. The dictionary can be the complete set of all possible words, or a subset of that complete set. In the latter case, the word set will not include words that are absent from the dictionary. The word set is represented as a word set vector. Each element of the word set vector corresponds to a different word in the dictionary, with for example 1 indicating that the word set contains the corresponding word and 0 indicating that it does not. A candidate page abstract vector is constructed in a similar way for each candidate page abstract. Each element of a candidate page abstract vector corresponds to a different word in the dictionary, with for example 1 indicating that the candidate page abstract contains the corresponding word and 0 indicating that it does not. The number of elements in the word set vector and in each candidate page abstract vector equals the number of words in the dictionary.
The degree of correlation between the word set vector and each candidate page abstract vector is then calculated. For example, the degree of correlation can be calculated using the cosine distance or the overlap distance between the vectors, defined respectively as follows:
$$ \text{Cosine}: \quad \mathrm{sim}(x, y) = \frac{\mathrm{dot}(x, y)}{|x| \cdot |y|} $$
$$ \text{Overlap}: \quad \mathrm{sim}(x, y) = \frac{\mathrm{dot}(x, y)}{\min(|x|, |y|)} $$
Here, x and y denote the word set vector and the candidate page abstract vector respectively, |x| and |y| denote the number of nonzero elements contained in x and y respectively, and dot(x, y) denotes the inner product of x and y. The higher the degree of correlation, the more distinct words from the word set appear in the candidate page abstract. The selection described above can therefore be made by comparing degrees of correlation.
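The vector method can be sketched as below. The helper names are hypothetical, and the code follows the definitions in the text literally, with |x| taken as the count of nonzero elements. Note that a textbook cosine similarity divides by the Euclidean norms instead, which for 0/1 vectors are the square roots of those counts, so the values here differ from the standard cosine.

```python
def to_vector(words, dictionary):
    """Binary vector over the dictionary: 1 if the word is present, else 0.

    Dictionary entries are assumed to be lowercase.
    """
    present = {w.lower() for w in words}
    return [1 if term in present else 0 for term in dictionary]

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def nonzero_count(x):
    """|x| as defined in the text: the number of nonzero elements."""
    return sum(1 for a in x if a != 0)

def cosine_sim(x, y):
    """dot(x, y) / (|x| * |y|), with 0.0 for an all-zero vector."""
    denom = nonzero_count(x) * nonzero_count(y)
    return dot(x, y) / denom if denom else 0.0

def overlap_sim(x, y):
    """dot(x, y) / min(|x|, |y|), with 0.0 for an all-zero vector."""
    denom = min(nonzero_count(x), nonzero_count(y))
    return dot(x, y) / denom if denom else 0.0
```

With the dictionary ["battery", "format", "lens", "price"], a word set {"format", "battery"} becomes [1, 1, 0, 0] and an abstract containing "format", "lens", and "price" becomes [0, 1, 1, 1]. Their inner product is 1, so the overlap measure is 1/2 and the cosine measure as defined here is 1/6.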
The word set can include keywords reflecting the user's personalized information needs, for example personal interests, search intentions, and goals. The word set can be set manually, for example by importing a file or through interactive selection, or it can be produced by an automated device (described in detail later). The search engine can identify the word set related to a user in various ways. For example, where the user needs to register with and log in to the search engine, the related word set can be entered, imported from the client, or generated when the user registers, and the corresponding word set can be determined from the user's identity when the user logs in; the word set can also be modified, imported, or generated through user operations after login. Alternatively, information identifying the user can be collected from the user's client (for example through a cookie, by calling an ActiveX control, or by downloading and running an applet), and, once the user's identity has been determined, a word set can be generated, or imported or generated from the client. Alternatively, a plug-in for importing the word set, transmitting it to the search engine, or generating it can be installed on the client.
The search engine result generation device 312 produces a query result comprising the page sequence and the selected candidate page abstracts. For example, the query result can be organized as a list of result units, each consisting of the information about a web page in the page sequence (for example its title and address) and the candidate page abstract selected for it.
Alternatively, or in the case where the page abstract extraction device 311 extracts candidate page abstracts for only some of the web pages in the page sequence, the page abstract extraction device 311 can additionally produce prior-art page abstracts, and the search engine result generation device 312 replaces the corresponding prior-art page abstract with the selected candidate page abstract.
The method for searching web pages carried out in the search engine shown in Fig. 3 is described below with reference to Fig. 4; processing unrelated to the present invention is omitted.
As shown in Fig. 4, the method starts at step 400. In step 402, in response to receiving a query from the user, the query device 310 retrieves a page sequence that satisfies the query. In step 404, the page abstract extraction device 311 extracts candidate page abstracts of at least one web page in the page sequence. In step 406, for each such web page, the search engine page abstract selection device 313 selects, from the extracted candidate page abstracts, the one containing the most words from the word set related to the user, as the page abstract provided to the user. In step 408, the search engine result generation device 312 produces a query result comprising the page sequence and the page abstract. The method then ends at step 410.
Fig. 5A and Fig. 5B show an example of a query processed by the method shown in Fig. 4. Fig. 5A shows the content of one web page in the page sequence retrieved in step 402 for the user's query "Nikon D300" ("Nikon" is a camera brand and "D300" is a camera model); the query contains the two query keywords "Nikon" and "D300". Suppose the word set contains the word set keywords "format" (lens format) and "battery". In the content shown in Fig. 5A, paragraph 501 contains the query keywords "Nikon" and "D300". Paragraph 502 contains the query keyword "D300" and the word set keyword "format". Paragraph 503 contains neither of the query keywords "Nikon" and "D300", but contains the word set keyword "format". Paragraph 504 contains the query keyword "D300" and the word set keyword "battery". Paragraph 505 contains neither of the query keywords "Nikon" and "D300", but contains the word set keyword "battery". In step 404, the text segments "The Nikon D300 is a 12.3-megapixel professional digital single-lens-reflex (dSLR) camera that Nikon Corporation announced on 23 August 2007", "The D300 is designated by Nikon as the ultimate in DX format performance", and "The MB-D10 allows the D300 to be powered by an additional EN-EL3e battery or AA batteries" are extracted. In step 406, the combination "The Nikon D300 is a 12.3-megapixel professional digital single-lens-reflex (dSLR) camera that Nikon Corporation announced on 23 August 2007. The D300 is designated by Nikon as the ultimate in DX format performance. The MB-D10 allows the D300 to be powered by an additional EN-EL3e battery or AA batteries" contains both word set keywords "format" and "battery", that is, it contains the most word set keywords, so it is selected. In step 408, the corresponding result unit shown in Fig. 5B is produced.
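The Nikon D300 example can be replayed in a few lines. This is a simplified sketch: it compares only the three single segments against the full three-segment combination, whereas the description enumerates more combinations, and the helper name `word_set_hits` is an assumption.

```python
import re

WORD_SET = {"format", "battery"}  # word set assumed in the Fig. 5A example

# The three text segments extracted in step 404.
segments = [
    "The Nikon D300 is a 12.3-megapixel professional digital "
    "single-lens-reflex (dSLR) camera that Nikon Corporation announced "
    "on 23 August 2007.",
    "The D300 is designated by Nikon as the ultimate in DX format performance.",
    "The MB-D10 allows the D300 to be powered by an additional EN-EL3e "
    "battery or AA batteries.",
]

def word_set_hits(text):
    """Return the word-set keywords that occur as whole words in the text."""
    return WORD_SET & set(re.findall(r"[a-z0-9]+", text.lower()))

# Candidates: each segment on its own, plus the combination of all three.
candidates = segments + [" ".join(segments)]

# Step 406: choose the candidate containing the most word-set keywords.
selected = max(candidates, key=lambda text: len(word_set_hits(text)))
```

The single segments match zero, one ("format"), and one ("battery") word set keywords respectively, while the full combination matches both, so `selected` is the three-sentence combination, in line with the example above.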
Returning to Fig. 3, the search engine can preferably comprise a word set generation device (not shown). The word set generation device obtains at least one personalized keyword of the user, according to information reflecting the user's information needs, to form the word set. Information reflecting the user's information needs is information that reflects content the user has browsed, is browsing, or wants to browse in the future, for example queries the user has made, web pages the user has browsed, and web pages or documents the user has bookmarked. The user's personalized information needs can be collected from this information. Such collection can be carried out using known techniques from the field of personalized retrieval, such as the technique disclosed in Shengliang Xu et al., "Exploring Folksonomy for Personalized Search", SIGIR '08, July 20-24, 2008, Singapore, or term frequency-inverse document frequency (TF-IDF). Such information can be collected from the search engine or from the client (for example through a cookie, by calling an ActiveX control, or by downloading and running an applet). As a concrete example, the word set generation device can obtain the word set from the information reflecting the user's personalized information needs by the following TF-IDF method.
With the TF-IDF method, the importance of a word to a particular document in the information reflecting the user's personalized information needs (for example a document set) can be assessed using the following formulas:
$$ \mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} $$
where the numerator $n_{i,j}$ is the number of times word $t_i$ appears in document $d_j$, and the denominator $\sum_k n_{k,j}$ is the total number of occurrences of all words in document $d_j$;
$$ \mathrm{idf}_i = \log \frac{|D|}{|\{d : t_i \in d\}|} $$
where $|D|$ is the total number of documents in the document set and $|\{d : t_i \in d\}|$ is the number of documents that contain word $t_i$.
Finally, the weight $w_{i,j}$ of word $t_i$ with respect to document $d_j$ is
$$ w_{i,j} = \mathrm{tf}_{i,j} \cdot \mathrm{idf}_i $$
In keyword extraction, the words with the larger weights w_{i,j} in a document are taken as the keywords of that document. All the keywords obtained form the word set. Where a dictionary is used as described above, the word set will not include words that are absent from the dictionary.
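A minimal sketch of TF-IDF keyword extraction along these lines is shown below. The function names, the simple tokenizer, the `top_k` cutoff, and the decision to drop zero-weight words are assumptions made for the example; the text only says that words with larger weights are taken as keywords.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_word_set(documents, top_k=2):
    """Collect the top_k highest TF-IDF words of each document into a word set."""
    tokenized = [tokenize(doc) for doc in documents]
    num_docs = len(tokenized)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    word_set = set()
    for tokens in tokenized:
        counts = Counter(tokens)
        total = sum(counts.values())
        # w_ij = tf_ij * idf_i, following the formulas in the text.
        weights = {term: (n / total) * math.log(num_docs / doc_freq[term])
                   for term, n in counts.items()}
        # Rank by descending weight, breaking ties alphabetically.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        # Words occurring in every document get weight 0 and are skipped.
        word_set.update([t for t in ranked if weights[t] > 0][:top_k])
    return word_set
```

For instance, given browsing-history documents in which "camera" occurs everywhere, "camera" receives an IDF of log(1) = 0 and never enters the word set, while words concentrated in individual documents do.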
Preferably, the search engine can comprise a marking device (not shown). The marking device can mark the occurrences of any word of the word set in the selected candidate page abstract, for example by means such as highlighting, underlining, or a different font. The client can present the result accordingly, based on such marking.
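One way such marking could be done is sketched below. The function name `mark_word_set` and the use of HTML bold tags are assumptions; the text leaves the marking mechanism open.

```python
import re

def mark_word_set(summary, word_set, open_tag="<b>", close_tag="</b>"):
    """Wrap whole-word, case-insensitive occurrences of word-set words in tags."""
    if not word_set:
        return summary
    # Longer words first so that overlapping alternatives prefer the longest match.
    alternatives = sorted(word_set, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(w) for w in alternatives) + r")\b"
    return re.sub(pattern,
                  lambda m: open_tag + m.group(0) + close_tag,
                  summary,
                  flags=re.IGNORECASE)
```

Only whole words are marked, so with the word set {"format", "battery"} the word "batteries" is left unmarked. A real system might add stemming if inflected forms should also be marked.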
In the above embodiments, queries can be carried out through the client shown in Fig. 2.
Although the result generation device is described in the preceding embodiments as being included in the search engine, it can also be included in the client. In that case, the search engine can return to the client the page sequence retrieved in response to the query and the page abstracts selected for the web pages in the page sequence. At the client, the client receiving device receives the page sequence and the selected page abstracts, and the result generation device produces the query result and provides it to the user interface for presentation.
Alternatively, the word set can be imported or generated directly at the client and sent to the search engine by the client sending device at a suitable time, for example when the user registers with or logs in to the search engine, in response to control by the search engine, when a query is sent to the search engine, or in response to control by the user.
Preferably, the word set generation device can be included in the client rather than in the search engine. In that case, the word set generation device can be started at a suitable time, for example in response to control by the user or when the user starts the client, to form the word set from locally available information reflecting the user's information needs.
Correspondingly, where the client holds the word set, the marking device can be included in the client rather than in the search engine. In that case, the marking device performs the marking before the query result is provided to the user interface.
Fig. 6 is the block diagram that illustrates based on the search engine of another embodiment of the present invention.Fig. 7 is the block diagram that illustrates based on the client of another embodiment of the present invention.
As shown in Fig. 6, the search engine includes information collector 601, database 602, indexing unit 603, and requestor 604. Requestor 604 includes inquiry unit 610, page abstract extraction element 611, and search engine dispensing device 614. Information collector 601, database 602, indexing unit 603, inquiry unit 610, and page abstract extraction element 611 can be identical to information collector 301, database 302, indexing unit 303, inquiry unit 310, and page abstract extraction element 311 shown in Fig. 3, respectively, and are therefore not described again here. Search engine dispensing device 614 sends the page sequence and the extracted candidate page abstracts to the client of the user who issued the query.
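In this embodiment the search engine sends candidate abstracts rather than a final choice, so the response has to carry several abstracts per page. A hedged sketch of such a payload follows. The class names `PageEntry` and `SearchResponse`, their fields, and the use of JSON as the wire format are all assumptions for illustration; the text above does not specify any message format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PageEntry:
    """One page of the retrieved page sequence (hypothetical structure)."""
    url: str
    title: str
    # Candidate abstracts extracted by the search engine; may be empty,
    # since candidates are only required for at least one page.
    candidate_abstracts: List[str] = field(default_factory=list)


@dataclass
class SearchResponse:
    """What the search engine sends to the client for one query."""
    query: str
    pages: List[PageEntry]


def encode_response(response: SearchResponse) -> str:
    """Serialize the response to JSON (an assumed wire format)."""
    return json.dumps(asdict(response), ensure_ascii=False)


def decode_response(payload: str) -> SearchResponse:
    """Rebuild the response on the client side."""
    raw = json.loads(payload)
    pages = [PageEntry(**page) for page in raw["pages"]]
    return SearchResponse(query=raw["query"], pages=pages)
```

Keeping the page order in a list preserves the ranking of the page sequence, while the per-page list of candidates leaves the final selection to the client.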
As shown in Fig. 7, client 700 includes user interface 701, client dispensing device 702, client receiving device 703, client page abstract selection device 704, and client result generation device 705. User interface 701 and client dispensing device 702 are identical to user interface 201 and client dispensing device 202 shown in Fig. 2, respectively, and are not described again here. In Fig. 7, client dispensing device 702 and user interface 701 are unrelated to the technical problem addressed by the present invention and are therefore drawn with dashed boxes. Client receiving device 703 receives from the search engine the retrieved page sequence and the candidate page abstracts of the pages in the page sequence. Similarly to search engine page abstract selection device 313, client page abstract selection device 704 selects, as the page abstract provided to the user, the candidate page abstract that contains the most words from the word set related to the user of the client. Similarly to search engine result generation device 312, client result generation device 705 produces a query result comprising the page sequence and the selected page abstracts and provides it to user interface 701.
The method for searching pages performed in the client shown in Fig. 7 is described below with reference to Fig. 8; processing unrelated to the present invention is omitted.
As shown in Fig. 8, the method starts at step 800. In step 802, client receiving device 703 receives, from the search engine shown in Fig. 6, the page sequence retrieved in response to the query and the candidate page abstracts of at least one page in the page sequence. Then, in step 804, client page abstract selection device 704 selects, as the page abstract provided to the user, the candidate page abstract that contains the most words from the word set related to the user of the client, namely the user who issued the query. Then, in step 806, client result generation device 705 produces a query result comprising the page sequence and the page abstract. The method then ends at step 808.
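Steps 804 and 806 can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, "contains the most words" is interpreted as the number of distinct word-set words present (counting total occurrences would be another reading), tokenization is a simple regular expression suited to whitespace-delimited text, and ties are resolved in favor of the earlier candidate.

```python
import re
from typing import List, Set, Tuple


def count_word_set_hits(abstract: str, word_set: Set[str]) -> int:
    """Count how many distinct word-set words occur in the abstract."""
    tokens = set(re.findall(r"\w+", abstract.lower()))
    return len(tokens & {word.lower() for word in word_set})


def select_abstract(candidates: List[str], word_set: Set[str]) -> str:
    """Step 804: pick the candidate containing the most word-set words."""
    # max() returns the first maximal element, so ties keep the earlier one.
    return max(candidates, key=lambda a: count_word_set_hits(a, word_set))


def build_query_result(
    pages: List[Tuple[str, List[str]]],
    word_set: Set[str],
) -> List[Tuple[str, str]]:
    """Step 806: pair each page URL, in sequence order, with its abstract.

    `pages` is a list of (url, candidate_abstracts) pairs as received in
    step 802; a page without candidates gets an empty abstract.
    """
    result = []
    for url, candidates in pages:
        abstract = select_abstract(candidates, word_set) if candidates else ""
        result.append((url, abstract))
    return result
```

For example, with the word set `{"python", "search"}`, a page whose candidates are "Cheap flights to Paris" and "Python tutorial for search engines" would be shown with the second candidate, which contains both words.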
Preferably, client 700 shown in Fig. 7 may include an identity device (not shown) identical to the identity device described above.
The word set can be imported or generated directly in the client in a manner similar to that described above. Preferably, client 700 may include a word set generation device identical to the word set generation device described above. In that case, the word set generation device can be started at a suitable occasion, for example in response to control by the user or when the user starts the client, to form the word set from local information that reflects the user's information needs.
It should be noted that although the search engine and the client are described in the preceding embodiments as having a client/server architecture, the present invention is not limited to this architecture. For example, the search engine can be implemented on a host, with the user accessing the query function provided by the host through a terminal. Alternatively, the search engine and client functions can be integrated.
In addition, it should be noted that the above series of processes and devices can also be implemented by hardware, software, or firmware. Where they are implemented by software or firmware, the program constituting the software is installed from a storage medium or a network onto a computer with a dedicated hardware structure, for example the general-purpose computer 900 shown in Fig. 9, and the computer can perform various functions when the various programs are installed.
The implementation environment of the system, devices, and method of the present invention is shown in Fig. 9.
In Fig. 9, central processing unit (CPU) 901 performs various processing according to a program stored in read-only memory (ROM) 902 or a program loaded from storage section 908 into random access memory (RAM) 903. RAM 903 also stores, as required, the data needed when CPU 901 performs the various processing.
CPU 901, ROM 902, and RAM 903 are connected to one another via bus 904. Input/output interface 905 is also connected to bus 904.
The following components are connected to input/output interface 905: input section 906, including a keyboard, a mouse, and the like; output section 907, including a display such as a cathode-ray tube (CRT) or liquid crystal display (LCD), a loudspeaker, and the like; storage section 908, including a hard disk and the like; and communication section 909, including a network interface card such as a LAN card, a modem, and the like. Communication section 909 performs communication processing via a network such as the Internet.
As required, drive 910 is also connected to input/output interface 905. Removable medium 911, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, is mounted on drive 910 as required, so that a computer program read from it is installed into storage section 908 as required.
Where the above steps and processing are implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as removable medium 911.
Those skilled in the art will understand that this storage medium is not limited to removable medium 911 shown in Fig. 9, which stores the program and is distributed separately from the equipment to provide the program to the user. Examples of removable medium 911 include magnetic disks, optical disks (including compact disc read-only memory (CD-ROM) and digital versatile disc (DVD)), magneto-optical disks (including MiniDisc (MD)), and semiconductor memory. Alternatively, the storage medium can be ROM 902, a hard disk included in storage section 908, or the like, which stores the program and is distributed to the user together with the equipment that contains it.
The present invention has been described above with reference to specific embodiments. However, those of ordinary skill in the art will understand that various modifications and changes can be made without departing from the scope of the present invention as defined by the claims.

Claims (8)

1. A search engine device, comprising:
an inquiry unit configured to retrieve a page sequence that satisfies a query;
a page abstract extraction element configured to extract candidate page abstracts of at least one page in the page sequence;
a page abstract selection device configured to select a page abstract from the candidate page abstracts according to a word set related to the user who issued the query, as the page abstract provided to the user;
a result generation device configured to produce a query result comprising the page sequence and the page abstract; and
a word set generation device configured to obtain at least one personalized keyword of the user according to information reflecting the user's information needs, to form the word set.
2. The search engine device as claimed in claim 1, further comprising:
an identity device configured to mark the occurrence in the page abstract of any word of the word set.
3. A client of a search engine, comprising:
a receiving device configured to receive from the search engine a retrieved page sequence and candidate page abstracts of at least one page in the page sequence;
a page abstract selection device configured to select a page abstract from the candidate page abstracts according to a word set related to the user of the client, as the page abstract provided to the user;
a result generation device configured to produce a query result comprising the page sequence and the page abstract; and
a word set generation device configured to obtain at least one personalized keyword of the user according to information reflecting the user's information needs, to form the word set.
4. The client as claimed in claim 3, further comprising:
an identity device configured to mark the occurrence in the page abstract of any word of the word set.
5. A method for searching pages, comprising:
retrieving a page sequence that satisfies a query;
extracting candidate page abstracts of at least one page in the page sequence;
selecting a page abstract from the candidate page abstracts according to a word set related to the user who issued the query, as the page abstract provided to the user; and
producing a query result comprising the page sequence and the page abstract,
the method further comprising:
obtaining at least one personalized keyword of the user according to information reflecting the user's information needs, to form the word set.
6. The method for searching pages as claimed in claim 5, further comprising:
marking the occurrence in the page abstract of any word of the word set.
7. A method for searching pages, comprising:
receiving from a search engine a page sequence retrieved in response to a query and candidate page abstracts of at least one page in the page sequence;
selecting a page abstract from the candidate page abstracts according to a word set related to the user who issued the query, as the page abstract provided to the user; and
producing a query result comprising the page sequence and the page abstract,
the method further comprising:
obtaining at least one personalized keyword of the user according to information reflecting the user's information needs, to form the word set.
8. The method for searching pages as claimed in claim 7, further comprising:
marking the occurrence in the page abstract of any word of the word set.
CN 200810213931 2008-08-28 2008-08-28 Search engine, client thereof and method for searching page Expired - Fee Related CN101661490B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200810213931 CN101661490B (en) 2008-08-28 2008-08-28 Search engine, client thereof and method for searching page

Publications (2)

Publication Number Publication Date
CN101661490A CN101661490A (en) 2010-03-03
CN101661490B true CN101661490B (en) 2013-01-02

Family

ID=41789516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200810213931 Expired - Fee Related CN101661490B (en) 2008-08-28 2008-08-28 Search engine, client thereof and method for searching page

Country Status (1)

Country Link
CN (1) CN101661490B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014132265A2 (en) * 2013-02-14 2014-09-04 Gyan Prakash Kesarwani An improved system and method of scanning a search engine depending on the importance of the keywords and producing an effective output
CN103473358B (en) * 2013-09-26 2018-10-09 北京奇虎科技有限公司 A kind of method and device of search engine collecting open type summary information of webpage
CN103514278B (en) * 2013-09-26 2016-11-23 北京奇虎科技有限公司 A kind of method and device verifying open type summary information of webpage
CN106462588B (en) * 2015-01-14 2020-04-10 微软技术许可有限责任公司 Content creation from extracted content
US10140017B2 (en) * 2016-04-20 2018-11-27 Google Llc Graphical keyboard application with integrated search
CN106096010B (en) * 2016-06-23 2020-07-28 北京奇元科技有限公司 Input control method and device with search engine function
CN108765262A (en) * 2018-05-17 2018-11-06 深圳航天智慧城市系统技术研究院有限公司 A method of showing true meteorological condition in arbitrary three-dimensional scenic
CN109271580B (en) * 2018-11-21 2022-04-01 百度在线网络技术(北京)有限公司 Search method, device, client and search engine

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101082917A (en) * 2006-06-02 2007-12-05 千橡世纪科技发展(北京)有限公司 Method and apparatus for rapid previewing summary of web page content
CN101097578A (en) * 2007-06-07 2008-01-02 北京金山软件有限公司 Network resource searching method and system
CN101127043A (en) * 2007-08-03 2008-02-20 哈尔滨工程大学 Lightweight individualized search engine and its searching method
CN101216837A (en) * 2008-01-18 2008-07-09 索意互动(北京)信息技术有限公司 Method and system for displaying search result based on matching user personalized configuration

Also Published As

Publication number Publication date
CN101661490A (en) 2010-03-03


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130102

Termination date: 20160828