CN101661490A - Search engine, client thereof and method for searching page - Google Patents


Info

Publication number
CN101661490A
Authority
CN
China
Prior art keywords
page
abstract
user
word set
webpage
Legal status
Granted
Application number
CN200810213931A
Other languages
Chinese (zh)
Other versions
CN101661490B (en)
Inventor
张小洵
郭志立
郭宏蕾
祝慧佳
苏中
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp
Application filed by International Business Machines Corp
Priority to CN200810213931
Publication of CN101661490A
Application granted
Publication of CN101661490B
Expired - Fee Related
Anticipated expiration


Abstract

The invention relates to a search engine, a client thereof and a method for searching web pages. The search engine comprises a query device, a page abstract extraction device and a page abstract selection device. The query device is configured to retrieve a page sequence that satisfies a query; the page abstract extraction device is configured to extract candidate page abstracts of at least one page in the page sequence; and the page abstract selection device is configured to select, according to a word set associated with the user who issued the query, one of the extracted candidate page abstracts as the page abstract provided to that user. Rather than simply selecting text segments that contain the query keywords to form the page abstract, the invention selects the final page abstract from the candidate page abstracts according to personalized keywords that reflect the user's information needs, and thereby meets the user's personalized information needs to a certain degree.

Description

Search engine, client thereof and method for searching web pages
Technical field
The present invention relates to search engine technology, and in particular to the retrieval of the page abstracts (snippets) associated with the web pages in a search engine's query results.
Background
With the continuous development of Internet services, search engines such as Google, Yahoo and MSN have become almost indispensable tools for finding Internet resources of interest (for example, web pages). A search engine usually works as follows: once a user submits a query through a client, the search engine returns the retrieved web pages, which are relevant to the query, to the user in a search results page. Besides the title and the Uniform Resource Locator (URL) of each web page, the search results page also contains a short text description related to that page.
This short text description is commonly called a page abstract. A search engine usually builds the page abstract by extracting from the web page, and combining, text segments that contain the query keywords. In the search results page, the search engine may display the query keywords in the page abstract differently from the rest of the text, for example by highlighting, underlining or using a different font, to draw the user's attention and help the user decide whether to click through to the page.
Beyond the query keywords themselves, users may differ in their information needs, for example in personal interests, search intentions and goals. Although a page abstract reflects the relevance of a web page to the query to some extent, current page abstracts are built from text segments containing the query keywords, and the selection of those segments does not consider any of their content other than the keywords.
Search technology therefore needs further improvement to meet, at least to some extent, the differing information needs of different people.
Summary of the invention
An object of the present invention is to provide a search engine, a client for a search engine, and a method for searching web pages, which provide personalized page abstracts to users.
In one embodiment of the invention, the search engine comprises: a query device configured to retrieve a page sequence satisfying a query; a page abstract extraction device configured to extract candidate page abstracts of at least one web page in the page sequence; a page abstract selection device configured to select, according to a word set associated with the user who issued the query, one of the candidate page abstracts as the page abstract provided to the user; and a result generation device configured to generate a query result comprising the page sequence and the page abstract.
In an optional embodiment, the search engine may comprise a word set generation device configured to obtain at least one personalized keyword of the user from information reflecting the user's information needs, to form the word set.
In one embodiment of the invention, a client of a search engine comprises: a receiving device configured to receive from the search engine a retrieved page sequence and candidate page abstracts of at least one web page in the page sequence; a page abstract selection device configured to select, according to a word set associated with the user of the client, one of the candidate page abstracts as the page abstract provided to the user; and a result generation device configured to generate a query result comprising the page sequence and the page abstract.
In an optional embodiment, the client may comprise a word set generation device configured to obtain at least one personalized keyword of the user from information reflecting the user's information needs, to form the word set.
In one embodiment of the invention, a method for searching web pages comprises: retrieving a page sequence satisfying a query; extracting candidate page abstracts of at least one web page in the page sequence; selecting, according to a word set associated with the user who issued the query, one of the candidate page abstracts as the page abstract provided to the user; and generating a query result comprising the page sequence and the page abstract.
In one embodiment of the invention, a method for searching web pages comprises: receiving from a search engine a page sequence retrieved in response to a query, together with candidate page abstracts of at least one web page in the page sequence; selecting, according to a word set associated with the user who issued the query, one of the candidate page abstracts as the page abstract provided to the user; and generating a query result comprising the page sequence and the page abstract.
In the embodiments of the present invention, the page abstract is not formed simply by selecting and combining text segments that contain the query keywords; instead, the final page abstract is selected from the candidate page abstracts according to personalized keywords reflecting the user's information needs, so the user's personalized information needs can be met to some extent.
Brief description of the drawings
The above and other objects, features and advantages of the present invention will be more readily understood from the following description of embodiments of the invention with reference to the accompanying drawings. In the drawings, identical or corresponding technical features or components are denoted by identical or corresponding reference numerals.
Fig. 1 is a block diagram showing the general structure of a search engine.
Fig. 2 is a block diagram showing the general structure of a client of a search engine.
Fig. 3 is a block diagram showing the structure of a search engine according to an embodiment of the present invention.
Fig. 4 is an exemplary flowchart of the method for searching web pages performed in the search engine shown in Fig. 3.
Figs. 5A and 5B show an example of a query processed by the method shown in Fig. 4.
Fig. 6 is a block diagram showing the structure of a search engine according to another embodiment of the present invention.
Fig. 7 is a block diagram showing the structure of a client according to another embodiment of the present invention.
Fig. 8 is an exemplary flowchart of the method for searching web pages performed in the client shown in Fig. 7.
Fig. 9 is a block diagram showing an exemplary structure of a computer in which the present invention is implemented.
Detailed description of embodiments
Embodiments of the invention are described below with reference to the drawings. Note that, for clarity, the drawings and the description omit the representation and description of components and processes that are unrelated to the present invention and known to those of ordinary skill in the art.
Before the embodiments are described, it helps to describe the general structure of a search engine and its client.
Fig. 1 schematically shows the general structure of a search engine. As shown in Fig. 1, a search engine generally comprises an information search device 101 (a crawler), a database 102, an indexing device 103 and a requester 104. Information search device 101 roams the Internet, discovering and collecting web page information and storing it in database 102. Indexing device 103 processes the web page information collected by information search device 101: it analyzes the page content, tags and indexes it, and stores the results in database 102. Requester 104 receives queries from users, retrieves the web pages that satisfy each query from database 102, and returns the query results to the user.
Requester 104 generally comprises a query device 110, a page abstract extraction device 111 and a search engine result generation device 112. Query device 110 searches the database according to the query submitted by the user and ranks the results by relevance, obtaining a sequence of web pages (that is, of web page addresses) that satisfy the query. For each web page in the page sequence, page abstract extraction device 111 extracts one page abstract from the corresponding web page content in database 102. Search engine result generation device 112 organizes the page sequence and related content such as the page abstracts into a query result, for example a Hypertext Markup Language (HTML) page, which is returned to the querying user.
For each keyword in the query (except keywords that the query excludes), page abstract extraction device 111 extracts from the web page content the text near the position where that keyword occurs, and combines the text extracted for each keyword into a page abstract. If a keyword occurs several times in the web page content, page abstract extraction device 111 may use heuristic rules, summarization techniques or simple random sampling to choose the text segments that form the page abstract. A heuristic rule is a rule for choosing text segments that is determined empirically or by inference. For example, on the empirical assumption that the first keyword-containing text segment in the web page content is often the most important one, a heuristic rule may choose the first text segment that contains the keyword. A summarization technique extracts the main content of a document; using it to choose text segments means choosing the segments that best reflect the main content of the web page, that is, the most important segments containing the keyword. Various techniques are known for determining the importance of a text segment relative to the web page content. For example, the weight of each word in a text segment relative to the page content can be obtained with the term frequency-inverse document frequency (TF-IDF) method (described in detail later), and the sum of the weights of all words in the segment is taken as the weight of the segment. The larger a segment's weight, the more important it is.
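As a rough illustration of the prior-art heuristic described above, the sketch below takes, for each query keyword, a fixed-width window of characters around its first occurrence and joins the windows into one page abstract. The function names, the character-based window and the `radius` parameter are hypothetical choices for illustration, not the behaviour of any particular search engine.

```python
import re
from typing import List, Optional


def first_occurrence_window(text: str, keyword: str, radius: int = 40) -> Optional[str]:
    """Return the text within `radius` characters of the first occurrence of `keyword`.

    Hypothetical helper; the heuristic assumed here is "the first
    keyword-bearing text segment is the most important one".
    """
    match = re.search(re.escape(keyword), text, flags=re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - radius)
    end = min(len(text), match.end() + radius)
    return text[start:end].strip()


def prior_art_abstract(text: str, query_keywords: List[str], radius: int = 40) -> str:
    """Combine one window per query keyword into a single page abstract."""
    windows: List[str] = []
    for keyword in query_keywords:
        window = first_occurrence_window(text, keyword, radius)
        if window and window not in windows:  # skip missing keywords and duplicates
            windows.append(window)
    return " ... ".join(windows)


print(prior_art_abstract("Nikon makes cameras. The D300 is one of them.", ["Nikon", "D300"], radius=6))
# -> Nikon makes ... . The D300 is on
```

The output shows the weakness of pure character windows: they cut sentences at arbitrary points, which is what sentence-level extraction rules try to avoid.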
Fig. 2 schematically shows the general structure of a client 200 of a search engine. As shown in Fig. 2, client 200 comprises a user interface 201, a client sending device 202 and a client receiving device 203. User interface 201 receives queries entered by the user, presents query results to the user, and handles the user's browsing operations (for example paging and scrolling). Client sending device 202 sends queries entered into user interface 201 to the search engine. Client receiving device 203 receives query results from the search engine, which are presented to the user through user interface 201.
Fig. 3 shows the structure of a search engine according to an embodiment of the invention. As shown in Fig. 3, the search engine comprises an information search device 301, a database 302, an indexing device 303 and a requester 304. Requester 304 comprises a query device 310, a page abstract extraction device 311, a search engine result generation device 312 and a search engine page abstract selection device 313. Information search device 301, database 302, indexing device 303 and query device 310 may be identical to information search device 101, database 102, indexing device 103 and query device 110 shown in Fig. 1, respectively, and are not described again here.
Unlike page abstract extraction device 111 in Fig. 1, for each web page in the page sequence, page abstract extraction device 311 does not extract a single page abstract from the corresponding content in database 302 but extracts all candidate page abstracts. The way page abstract extraction device 311 extracts the text near keywords from the web page content and combines it may be the same as in prior-art page abstract extraction devices. For example, all occurrences in the web page content of the keywords in the query (except keywords that the query excludes) can be located. For each occurrence, a text segment containing it can be extracted from the web page content, starting from the position of the occurrence, according to preset extraction rules. The preset extraction rules may include, for example:
● Length constraint: the length of a text segment does not exceed a predetermined upper limit;
● Integrity constraint: text segments are kept as complete sentences where possible;
● Total length constraint: text segments are extracted so that their combined length does not exceed a predetermined upper limit;
● Avoid repetition: where two or more different keywords occur in the same passage, a single text segment covering them is extracted rather than separate overlapping segments. For example, for the sentence "The D300 is designated by Nikon as the ultimate in DX format performance", extracting separately for the keywords "D300" and "format" might produce "The D300 is designated by Nikon as the ultimate" and "by Nikon as the ultimate in DX format performance". Under the avoid-repetition rule, the single segment "The D300 is designated by Nikon as the ultimate in DX format performance" should be extracted instead. If length is limited, irrelevant parts can be replaced with an ellipsis.
● Other rules, and any combination of these rules.
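One possible reading of the extraction rules above, sketched under simplifying assumptions: sentences stand in for text segments (integrity constraint), a sentence becomes a single segment however many keywords it holds (avoid repetition), and over-long sentences are cut and marked with an ellipsis (length constraint). The naive sentence splitter, the whole-word matching and the `max_len` value are hypothetical; the total length constraint is left to the later combination step.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def contains_word(text: str, word: str) -> bool:
    """Whole-word, case-insensitive match."""
    return re.search(r"\b" + re.escape(word) + r"\b", text, flags=re.IGNORECASE) is not None


def extract_segments(text: str, keywords: List[str], max_len: int = 120) -> List[str]:
    segments: List[str] = []
    for sentence in split_sentences(text):
        if not any(contains_word(sentence, k) for k in keywords):
            continue
        # Avoid repetition: one segment per sentence, even if it holds several keywords.
        if len(sentence) > max_len:
            # Length constraint: truncate and mark the cut with an ellipsis.
            sentence = sentence[: max_len - 3].rstrip() + "..."
        segments.append(sentence)
    return segments
```

For the example sentence, `extract_segments(text, ["D300", "format"])` yields the one full sentence rather than two overlapping fragments.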
Page abstract extraction device 311 obtains all combinations of these text segments according to predetermined combination rules. The predetermined combination rules may include, for example: the more kinds of keywords a combination contains, the more preferred it is; combinations containing repeated or overlapping text segments are avoided; and so on. Alternatively, page abstract extraction device 311 may obtain only some of the combinations according to a predetermined policy, for example the combinations containing the most kinds of keywords, a predetermined proportion of all combinations, or a random subset of them. Rather than selecting just one of the obtained combinations as the page abstract, page abstract extraction device 311 outputs all of the obtained combinations as candidate page abstracts.
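A sketch of the combination step, assuming the segments are already distinct (so no combination repeats a segment): every non-empty subset is formed in document order, subsets longer than a hypothetical `max_total_len` are dropped (total length constraint), and the rest are ranked so that those covering more kinds of query keywords come first. Enumerating every subset grows as 2^n, which is one reason the text allows keeping only part of the combinations; the `keep` parameter is a hypothetical stand-in for such a policy.

```python
import re
from itertools import combinations
from typing import List, Optional, Set


def keyword_kinds(text: str, keywords: List[str]) -> Set[str]:
    """Distinct query keywords that occur as whole words in `text`."""
    return {k for k in keywords if re.search(r"\b" + re.escape(k) + r"\b", text, re.IGNORECASE)}


def candidate_abstracts(segments: List[str], keywords: List[str],
                        max_total_len: int = 300, keep: Optional[int] = None) -> List[str]:
    candidates: List[str] = []
    for r in range(1, len(segments) + 1):
        for combo in combinations(segments, r):  # combinations preserve input order
            abstract = " ".join(combo)
            if len(abstract) <= max_total_len:  # total length constraint
                candidates.append(abstract)
    # Stable sort: more keyword kinds first; ties keep enumeration order.
    candidates.sort(key=lambda a: len(keyword_kinds(a, keywords)), reverse=True)
    return candidates if keep is None else candidates[:keep]
```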
Alternatively, page abstract extraction device 311 need not extract candidate page abstracts for every web page in the page sequence; it may extract them only for some web pages selected according to a predetermined policy, for example the top-ranked web pages, a predetermined proportion of the web pages, the web pages whose relevance exceeds a predetermined threshold, or even randomly chosen web pages.
Search engine page abstract selection device 313 selects, according to the word set associated with the user who issued the query, one of the candidate page abstracts output by page abstract extraction device 311 as the page abstract provided to the user. The words in the word set reflect the user's personalized information needs, so they can be used to determine whether the content of a candidate page abstract partly or fully satisfies those needs, that is, whether it contains words from the word set. The degree to which a candidate page abstract reflects the personalized information needs can be determined in various ways. For example, the number of distinct word set words appearing in the candidate page abstract can serve as the measure: the more distinct word set words a candidate contains, the higher its degree of personalization. In another example, weights can be preset for the words in the word set, or assigned to them according to a predetermined criterion. As an example of such a criterion, when the user's personalized keywords are obtained from information reflecting the user's information needs to form the word set (described in detail below), the frequency with which each word set word occurs in that information can be obtained, and each word assigned a weight matching its frequency. In this case, the larger the sum of the weights, the higher the degree of personalization. The selection may then choose the candidate page abstract with the highest degree of personalization; choose one at random among several candidates with high degrees of personalization; choose the first candidate whose degree of personalization exceeds a predetermined threshold; or choose a candidate that both has a high degree of personalization and, according to summarization techniques, better reflects the web page content; and so on. Those of ordinary skill in the art will understand that the present invention may also select the page abstract from the candidate page abstracts in other ways.
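The scoring and selection options above might be sketched as follows. Two hypothetical scores are shown: the number of distinct word set words in a candidate, and a weighted sum in which each word's weight is its relative frequency in some texts reflecting the user's needs (one possible reading of "a weight matching its frequency"). Two of the selection strategies are shown: the highest-scoring candidate and the first candidate above a threshold. Lowercase word tokenization is an assumption.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Set


def words_in(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def kinds_score(candidate: str, word_set: Set[str]) -> float:
    """Degree of personalization = number of distinct word set words present."""
    return float(len(words_in(candidate) & word_set))


def frequency_weights(word_set: Set[str], user_texts: Iterable[str]) -> Dict[str, float]:
    """Weight each word set word by its relative frequency in the user's texts."""
    counts = Counter(w for t in user_texts for w in re.findall(r"[a-z0-9-]+", t.lower()) if w in word_set)
    total = sum(counts.values())
    return {w: counts[w] / total for w in word_set} if total else {w: 0.0 for w in word_set}


def weighted_score(candidate: str, weights: Dict[str, float]) -> float:
    present = words_in(candidate)
    return sum(w for word, w in weights.items() if word in present)


def select_highest(candidates: List[str], score: Callable[[str], float]) -> str:
    return max(candidates, key=score)  # ties go to the earliest candidate


def select_first_above(candidates: List[str], score: Callable[[str], float],
                       threshold: float) -> Optional[str]:
    return next((c for c in candidates if score(c) > threshold), None)
```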
Those skilled in the art can obtain the number of distinct word set words occurring in a candidate page abstract in various ways. For example, each word in each candidate page abstract can be checked for membership in the word set to count the distinct words, and the counts for all candidate page abstracts can then be compared. Alternatively, the following more formal method can be used.
In this method, a dictionary is first built, containing a number of words arranged in a predetermined order. The dictionary may be the complete set of all possible words or a subset of it; in the latter case, the word set will not contain words absent from the dictionary. The word set is represented as a word set vector. Each element of the word set vector corresponds to a different word in the dictionary, with, for example, 1 indicating that the word set contains the corresponding word and 0 indicating that it does not. A candidate page abstract vector is built in the same way for each candidate page abstract: each element corresponds to a different dictionary word, with, for example, 1 indicating that the candidate page abstract contains the word and 0 indicating that it does not. The word set vector and every candidate page abstract vector each have as many elements as the dictionary has words.
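A minimal sketch of the dictionary-based vectors, assuming a small hypothetical dictionary and simple lowercase tokenization. Because each element is looked up by dictionary word, word set words missing from the dictionary simply have no position and are dropped, as described above.

```python
import re
from typing import List, Sequence, Set

DICTIONARY: List[str] = ["battery", "d300", "format", "lens", "nikon"]  # hypothetical, fixed order


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def binary_vector(words: Set[str], dictionary: Sequence[str] = DICTIONARY) -> List[int]:
    """1 if the dictionary word at this position is present, else 0."""
    return [1 if term in words else 0 for term in dictionary]


word_set_vector = binary_vector({"format", "battery", "flash"})  # "flash" is not in the dictionary
abstract_vector = binary_vector(tokens("The D300 is the ultimate in DX format performance."))
print(word_set_vector, abstract_vector)  # [1, 0, 1, 0, 0] [0, 1, 1, 0, 0]
```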
The correlation between the word set vector and each candidate page abstract vector is then computed, for example with the cosine measure or the overlap measure between the vectors, defined respectively as follows:
Cosine:
$$\mathrm{cosine}(x, y) = \frac{\mathrm{dot}(x, y)}{\sqrt{|x| \cdot |y|}}$$
Overlap:
$$\mathrm{overlap}(x, y) = \frac{\mathrm{dot}(x, y)}{\min(|x|, |y|)}$$
Here, $x$ and $y$ denote the word set vector and the candidate page abstract vector, respectively; $|x|$ and $|y|$ denote the numbers of nonzero elements in $x$ and $y$; and $\mathrm{dot}(x, y)$ denotes the inner product of $x$ and $y$. The higher the correlation, the more distinct word set words the candidate page abstract tends to contain, so the selection described above can be made by comparing correlations.
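The two measures can be sketched directly on 0/1 vectors. This assumes the standard binary forms given above (inner product divided by the square root of the product of the nonzero counts for cosine, and by the smaller nonzero count for overlap) and returns 0 when either vector is all zeros; the example vectors are hypothetical.

```python
import math
from typing import List, Sequence


def dot(x: Sequence[int], y: Sequence[int]) -> int:
    return sum(a * b for a, b in zip(x, y))


def nonzero(v: Sequence[int]) -> int:
    return sum(1 for a in v if a)


def cosine(x: Sequence[int], y: Sequence[int]) -> float:
    nx, ny = nonzero(x), nonzero(y)
    return dot(x, y) / math.sqrt(nx * ny) if nx and ny else 0.0


def overlap(x: Sequence[int], y: Sequence[int]) -> float:
    m = min(nonzero(x), nonzero(y))
    return dot(x, y) / m if m else 0.0


def best_candidate(word_set_vec: Sequence[int], candidate_vecs: List[Sequence[int]], measure=cosine) -> int:
    """Index of the candidate abstract vector most correlated with the word set vector."""
    return max(range(len(candidate_vecs)), key=lambda i: measure(word_set_vec, candidate_vecs[i]))


x = [1, 0, 1, 0, 0]                    # word set: battery, format
candidates = [[0, 1, 1, 0, 0],         # contains format
              [1, 1, 1, 0, 1]]         # contains battery and format
print(best_candidate(x, candidates))   # 1
```

Note the design difference: cosine divides by the abstract's own nonzero count, so of two abstracts with the same word set hits the one with fewer dictionary words scores higher, whereas with overlap, once an abstract has at least as many dictionary words as the word set, the score is simply the hit count divided by the word set size.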
The word set may include keywords reflecting the user's personalized information needs, for example regarding personal interests, search intentions and goals. The word set may be set manually, for example by importing a file or by interactive selection, or generated by an automated device (described in detail later). The search engine can identify the word set associated with a user in various ways. For example, where users must register and log in on the search engine, the associated word set can be imported, received from the client or generated when the user registers, and the corresponding word set can be determined from the user's identity when the user logs in; the user may even modify, import or generate the word set after logging in. Alternatively, the user's identity information can be collected from the user's client (for example through cookies, by invoking an ActiveX control, or by downloading and running an applet), and once the user's identity is established, the word set can be generated, imported, or received from the client. Alternatively, a plug-in for importing the word set, sending it to the search engine, or generating it can be installed on the client.
Search engine result generation device 312 generates a query result comprising the page sequence and the selected candidate page abstracts. For example, the query result can be organized as a list of result units, each consisting of information about a web page in the page sequence (for example its title and address) and the correspondingly selected candidate page abstract.
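A result list of this kind could be rendered as HTML roughly as follows. The `ResultUnit` type, the markup and the example URL are hypothetical; the text only requires that each unit pair page information with its selected abstract. Escaping with Python's standard `html.escape` keeps page text from being interpreted as markup.

```python
from html import escape
from typing import List, NamedTuple


class ResultUnit(NamedTuple):
    title: str
    url: str
    abstract: str  # the candidate page abstract selected for this page


def render_results(units: List[ResultUnit]) -> str:
    """One <li> per result unit, in page-sequence order."""
    items = [
        f'<li><a href="{escape(u.url)}">{escape(u.title)}</a><p>{escape(u.abstract)}</p></li>'
        for u in units
    ]
    return "<ol>\n" + "\n".join(items) + "\n</ol>"
```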
Alternatively, or where page abstract extraction device 311 extracts candidate page abstracts only for some of the web pages in the page sequence, page abstract extraction device 311 may additionally generate prior-art page abstracts, and search engine result generation device 312 replaces the corresponding prior-art page abstracts with the selected candidate page abstracts.
The method for searching web pages performed in the search engine shown in Fig. 3 is described below with reference to Fig. 4; processing unrelated to the present invention is omitted.
As shown in Fig. 4, the method starts at step 400. At step 402, in response to receiving a query from a user, query device 310 retrieves a page sequence that satisfies the query. At step 404, page abstract extraction device 311 extracts candidate page abstracts for at least one web page in the page sequence. At step 406, for each of the at least one web page, search engine page abstract selection device 313 selects, from the extracted candidate page abstracts, the one containing the most words from the word set associated with the user as the page abstract provided to the user. At step 408, search engine result generation device 312 generates a query result comprising the page sequence and the page abstracts. The method ends at step 410.
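Steps 402 to 408 can be strung together in a toy, in-memory sketch. Every detail is an assumption made for illustration: the corpus is a dictionary from URL to page text, retrieval keeps pages containing all query terms and ranks them by how often those terms occur, candidate abstracts are all combinations of sentences containing a query term, and step 406 picks the candidate with the most distinct word set words.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9-]+", text.lower())


def retrieve(corpus: Dict[str, str], query: str) -> List[str]:
    """Step 402: pages containing every query term, most term occurrences first."""
    terms = tokens(query)
    hits = []
    for url, text in corpus.items():
        toks = tokens(text)
        if all(t in toks for t in terms):
            hits.append((sum(toks.count(t) for t in terms), url))
    hits.sort(key=lambda h: (-h[0], h[1]))  # ties broken by URL
    return [url for _, url in hits]


def extract_candidates(text: str, query: str) -> List[str]:
    """Step 404: every combination of sentences that contain a query term."""
    terms = set(tokens(query))
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    segments = [s for s in sentences if terms & set(tokens(s))]
    return [" ".join(c) for r in range(1, len(segments) + 1) for c in combinations(segments, r)]


def select_abstract(candidates: List[str], word_set: Set[str]) -> str:
    """Step 406: the candidate containing the most distinct word set words."""
    return max(candidates, key=lambda c: len(word_set & set(tokens(c))), default="")


def search(corpus: Dict[str, str], query: str, word_set: Set[str]) -> List[Tuple[str, str]]:
    """Step 408: query result as (URL, selected page abstract) pairs."""
    return [(url, select_abstract(extract_candidates(corpus[url], query), word_set))
            for url in retrieve(corpus, query)]
```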
Fig. 5 A and 5B show an example of the inquiry of method processing shown in Figure 4.Fig. 5 A shows the inquiry of importing at the user in step 402 " Nikon D300 ", and (" Nikon " is a camera brand, " D300 " is a camera model) content of a webpage in the page sequence that searches, wherein inquiry comprises two searching keywords " Nikon " and " D300 ".Suppose that word set comprises word set keyword " format " (camera lens standard) and " battery " (battery).In the content shown in Fig. 5 A, paragraph 501 comprises searching keyword " Nikon " and " D300 ".Paragraph 502 comprises searching keyword " D300 " and word set keyword " format ".Paragraph 503 does not comprise searching keyword " Nikon " and " D300 ", but comprises word set keyword " format ".Paragraph 504 comprises searching keyword " D300 " and word set keyword " battery ".Paragraph 505 does not comprise searching keyword " Nikon " and " D300 ", but comprises word set keyword " battery ".In step 404, extract text chunk " The Nikon D300is a 12.3-megapixel professional digital single-lens-reflex (dSLR) camerathat Nikon Corporation announced on 23 August 2007 ", " The D300 isdesignated by Nikon as the ultimate in DX format performance " and " TheMB-D10 allows the D300 to be powered by an additional EN-EL3e batteryor AA batteries ".In step 406, the combination that is obtained " The Nikon D300 is a12.3-megapixel professional digital single-lens-reflex (dSLR) camera thatNikon Corporation announced on 23 August 2007.The D300 is designatedby Nikon as the ultimate in DX format performance.The MB-D10 allowsthe D300 to be powered by an additional EN-EL3e battery or AAbatteries " comprises word set keyword " format " and " battery ", the word set keyword that promptly comprises is maximum, so selected.In step 408, produce the accordingly result unit shown in Fig. 5 B.
Returning to Fig. 3, the search engine may preferably comprise a word set generation device (not shown). The word set generation device obtains at least one personalized keyword of the user from information reflecting the user's information needs, to form the word set. Information reflecting the user's information needs is information that reflects content the user has browsed, is browsing, or wants to browse in the future, for example queries the user has made, web pages the user has browsed, and web pages or documents the user has bookmarked. The user's personalized information needs can be gathered from this information using techniques known in the field of personalized retrieval, for example the techniques disclosed in Shengliang Xu et al., "Exploring Folksonomy for Personalized Search", SIGIR '08, July 20-24, 2008, Singapore, or term frequency-inverse document frequency (TF-IDF). Such information can be collected at the search engine or at the client (for example through cookies, by invoking an ActiveX control, or by downloading and running an applet). As a concrete example, the word set generation device can obtain the word set from the information reflecting the user's personalized information needs with the TF-IDF method described below.
The TF-IDF method assesses how important a word is to a particular document within the information reflecting the user's personalized information needs (for example, a document set), as follows:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where the numerator $n_{i,j}$ is the number of times word $t_i$ occurs in document $d_j$, and the denominator $\sum_k n_{k,j}$ is the total number of occurrences of all words in document $d_j$;
$$idf_i = \log \frac{|D|}{|\{d : t_i \in d\}|}$$
where $|D|$ is the total number of documents in the document set and $|\{d : t_i \in d\}|$ is the number of documents containing word $t_i$.
Finally, the weight $w_{i,j}$ of word $t_i$ with respect to document $d_j$ is
$$w_{i,j} = tf_{i,j} \cdot idf_i$$
For keyword extraction, the words with the largest weights $w_{i,j}$ in a document are taken as that document's keywords. All the keywords obtained form the word set. Where a dictionary is used as described above, the word set will not contain words absent from the dictionary.
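A compact sketch of TF-IDF keyword extraction following the three formulas above, using the natural logarithm (the text does not fix the base; within one document the base does not change the ranking). The `top_k` cut-off, the alphabetical tie-break, the tokenizer and dropping words of zero weight (those occurring in every document) are assumptions for illustration; the optional `dictionary` argument mirrors the dictionary restriction described in the text.

```python
import math
import re
from collections import Counter
from typing import Iterable, List, Optional, Set


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9-]+", text.lower())


def tfidf_word_set(documents: List[str], top_k: int = 2,
                   dictionary: Optional[Iterable[str]] = None) -> Set[str]:
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)                       # |D|
    df = Counter()                           # |{d : t_i in d}| for each word t_i
    for toks in docs:
        df.update(set(toks))
    word_set: Set[str] = set()
    for toks in docs:
        counts = Counter(toks)               # n_{i,j}
        total = sum(counts.values())         # sum_k n_{k,j}
        weights = {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        # Keep the top_k heaviest words; weight 0 means the word is in every document.
        word_set.update(t for t in ranked[:top_k] if weights[t] > 0)
    if dictionary is not None:
        word_set &= set(dictionary)          # no words outside the dictionary
    return word_set
```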
Preferably, the search engine may comprise a marking device (not shown). The marking device can mark the occurrences of any word set word in the selected candidate page abstract, for example by highlighting, underlining or a different font. The client can then present the abstract according to these marks.
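One way the marking device might work, sketched with HTML bold tags as the hypothetical marking: each whole-word, case-insensitive occurrence of a word set word is wrapped while its original casing is kept. If the abstract is later placed into HTML, it should be escaped before marking so that only the added tags act as markup.

```python
import re
from typing import Set


def mark_word_set(abstract: str, word_set: Set[str],
                  open_tag: str = "<b>", close_tag: str = "</b>") -> str:
    """Wrap whole-word, case-insensitive occurrences of word set words in tags."""
    if not word_set:
        return abstract
    # Longer words first so a multi-word entry would win over a shorter prefix of it.
    alternatives = "|".join(re.escape(w) for w in sorted(word_set, key=len, reverse=True))
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, abstract)
```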
In the above-described embodiments, can bring in by client shown in Figure 2 and finish inquiry.
Though among the embodiment in front generation device as a result is described as being included in the search engine, yet generation device also can be contained in the client as a result.Under these circumstances, search engine can return the response inquiry and page sequence that retrieves and the page abstract of selecting at the webpage in the page sequence to client.In client, receive the page abstract of page sequence and selection, and produce Query Result and offer user interface by generation device as a result and present by the client receiving trap.
Alternatively, directly import or produce word set in client, and by the client dispensing device on suitable opportunity, for example user registration, sign in to the control of search engine, response search engine, when transmission inquires search engine, under response user's control or the like the situation, the word set of this locality is sent to search engine.
Preferably, the word set generation device can be contained in the client, rather than in search engine.Under these circumstances, can be on suitable opportunity, for example under response user's control, situation that the user starts client or the like, start the word set generation device and come to form word set according to the information of the reflection user's of this locality information requirement.
Correspondingly, when the word set resides at the client, the identification device may be included in the client rather than in the search engine. In that case, the identification device performs the marking before the query result is provided to the user interface.
Fig. 6 is a block diagram of a search engine according to another embodiment of the present invention. Fig. 7 is a block diagram of a client according to another embodiment of the present invention.
As shown in Fig. 6, the search engine includes an information search device 601, a database 602, an indexing unit 603 and a requestor 604. The requestor 604 includes an inquiry device 610, a page abstract extraction device 611 and a search engine sending device 614. The information search device 601, database 602, indexing unit 603, inquiry device 610 and page abstract extraction device 611 may be identical to the information search device 301, database 302, indexing unit 303, inquiry device 310 and page abstract extraction device 311 shown in Fig. 3, respectively, and are therefore not described again here. The search engine sending device 614 sends the page sequence and the extracted candidate page abstracts to the client of the user who sent the query.
As shown in Fig. 7, the client 700 includes a user interface 701, a client sending device 702, a client receiving device 703, a client page abstract selection device 704 and a client result generation device 705. The user interface 701 and the client sending device 702 are identical to the user interface 201 and the client sending device 202 shown in Fig. 2, respectively, and are not described again here. In Fig. 7, the client sending device 702 and the user interface 701 are drawn with dashed boxes because they are unrelated to the technical problem addressed by the present invention. The client receiving device 703 receives, from the search engine, the retrieved page sequence and the candidate page abstracts of the webpages in the page sequence. Like the search engine page abstract selection device 313, the client page abstract selection device 704 selects, as the page abstract provided to the user, the candidate page abstract that contains the most words from the word set related to the user of the client. Like the search engine result generation device 312, the client result generation device 705 produces a query result comprising the page sequence and the selected page abstracts and provides it to the user interface 701.
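A minimal sketch of the selection rule used by the page abstract selection devices 313 and 704. It assumes that "contains the most words" means the total number of tokens belonging to the word set (counting distinct words would be an equally plausible reading), that abstracts can be tokenized on runs of word characters (space-delimited text), and that ties go to the earliest candidate. The function names are hypothetical.

```python
import re


def count_word_set_hits(abstract, word_set):
    """Number of tokens in the abstract that belong to the word set (case-insensitive)."""
    lowered = {w.lower() for w in word_set}
    return sum(1 for tok in re.findall(r"\w+", abstract.lower()) if tok in lowered)


def select_abstract(candidates, word_set):
    """Return the candidate with the most word-set hits, or None if there are no candidates."""
    if not candidates:
        return None
    # max() returns the first maximal element, so ties go to the earliest candidate.
    return max(candidates, key=lambda c: count_word_set_hits(c, word_set))
```

Keeping the earliest candidate on ties preserves whatever order the extraction device produced, which is typically query-relevance order.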
Describe the method for the search and webpage of carrying out in the client shown in Fig. 7 below in conjunction with Fig. 8, wherein omitted the processing that has nothing to do with the present invention.
As shown in Fig. 8, the method starts at step 800. In step 802, the client receiving device 703 receives, from the search engine shown in Fig. 6, the page sequence retrieved in response to a query and the candidate page abstracts of at least one webpage in the page sequence. Then, in step 804, the client page abstract selection device 704 selects, as the page abstract provided to the user, the candidate page abstract that contains the most words from the word set related to the user of the client, i.e. the user who sent the query. Then, in step 806, the client result generation device 705 produces a query result comprising the page sequence and the page abstract. The method then ends at step 808.
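Steps 802 to 806 can be sketched as follows. This is a hypothetical illustration: the `SearchResponse` shape stands in for whatever message the search engine sending device 614 actually transmits, step 802 is modelled simply by the caller constructing that object (the transport is out of scope), and the selection rule counts word-set tokens as assumed above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SearchResponse:
    """What the client receiving device obtains in step 802 (hypothetical shape)."""
    page_sequence: list                              # page URLs, in ranked order
    candidates: dict = field(default_factory=dict)   # URL -> list of candidate abstracts


def _hits(text, word_set):
    """Count tokens of text that belong to the word set (case-insensitive)."""
    words = {w.lower() for w in word_set}
    return sum(1 for tok in re.findall(r"\w+", text.lower()) if tok in words)


def run_client_method(response, word_set):
    """Steps 804-806: choose one abstract per page, then build the query result."""
    query_result = []
    for url in response.page_sequence:
        options = response.candidates.get(url, [])
        # Step 804: the candidate containing the most word-set words (first one wins ties).
        abstract = max(options, key=lambda c: _hits(c, word_set)) if options else None
        query_result.append((url, abstract))
    # Step 806: the result keeps the page-sequence order, pairing each page with its abstract.
    return query_result
```

Pages for which no candidate abstract was received are kept in the result with `None`, so the page sequence itself is never altered by the personalization step.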
Preferably, the client 700 shown in Fig. 7 may include an identification device (not shown) identical to the one described previously.
The word set can be entered directly or generated at the client in a manner similar to that described previously. Preferably, the client 700 may include a word set generation device identical to the one described previously. In that case, the word set generation device is started at a suitable time, for example in response to the user's control or when the user starts the client, and forms the word set from local information that reflects the user's information needs.
It should be noted that, although the search engine and the client are described in the preceding embodiments as having a client/server architecture, the present invention is not limited to this architecture. For example, the search engine may be implemented on a host computer, and users access the query function provided by the host through terminals. Alternatively, the search engine and client functions may be integrated.
It should also be noted that the series of processes and devices described above may be implemented in hardware, software or firmware. When they are implemented in software or firmware, the program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure, for example the general-purpose computer 900 shown in Fig. 9, which can perform various functions when various programs are installed on it.
Fig. 9 shows an environment in which the system, devices and method of the present invention may be implemented.
In Fig. 9, a central processing unit (CPU) 901 performs various processing according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage section 908 into a random access memory (RAM) 903. The RAM 903 also stores, as needed, data required when the CPU 901 performs the various processing.
The CPU 901, ROM 902 and RAM 903 are connected to one another via a bus 904. An input/output interface 905 is also connected to the bus 904.
The following components are connected to the input/output interface 905: an input section 906, including a keyboard, a mouse and the like; an output section 907, including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a loudspeaker and the like; a storage section 908, including a hard disk and the like; and a communication section 909, including a network interface card such as a LAN card, a modem and the like. The communication section 909 performs communication via a network such as the Internet.
A drive 910 is also connected to the input/output interface 905 as needed. Removable media 911, such as a magnetic disk, optical disc, magneto-optical disk or semiconductor memory, are mounted on the drive 910 as needed, so that computer programs read from them are installed into the storage section 908 as needed. When the steps and processing described above are implemented in software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable media 911.
Those skilled in the art will understand that this storage medium is not limited to the removable media 911 shown in Fig. 9, which stores the program and is distributed separately from the device in order to provide the program to users. Examples of the removable media 911 include magnetic disks, optical discs (including compact disc read-only memory (CD-ROM) and digital versatile disc (DVD)), magneto-optical disks (including MiniDisc (MD)) and semiconductor memory. Alternatively, the storage medium may be the ROM 902, a hard disk included in the storage section 908, or the like, which stores the program and is distributed to users together with the device containing it.
The present invention has been described in the foregoing specification with reference to specific embodiments. However, those of ordinary skill in the art will understand that various modifications and changes can be made without departing from the scope of the present invention as defined by the claims.

Claims (12)

1. A search engine, comprising:
an inquiry device configured to retrieve a page sequence that satisfies a query;
a page abstract extraction device configured to extract candidate page abstracts of at least one webpage in the page sequence;
a page abstract selection device configured to select a page abstract from the candidate page abstracts according to a word set related to the user who sent the query, as the page abstract provided to the user; and
a result generation device configured to produce a query result comprising the page sequence and the page abstract.
2. The search engine of claim 1, further comprising:
a word set generation device configured to obtain at least one personalized keyword of the user from information reflecting the user's information needs, to form the word set.
3. The search engine of claim 1, further comprising:
an identification device configured to mark occurrences of any word of the word set in the page abstract.
4. A client of a search engine, comprising:
a receiving device configured to receive, from the search engine, a retrieved page sequence and candidate page abstracts of at least one webpage in the page sequence;
a page abstract selection device configured to select a page abstract from the candidate page abstracts according to a word set related to the user of the client, as the page abstract provided to the user; and
a result generation device configured to produce a query result comprising the page sequence and the page abstract.
5. The client of claim 4, further comprising:
a word set generation device configured to obtain at least one personalized keyword of the user from information reflecting the user's information needs, to form the word set.
6. The client of claim 4, further comprising:
an identification device configured to mark occurrences of any word of the word set in the page abstract.
7. the method for a search and webpage comprises:
Retrieve the page sequence that satisfies inquiry;
Extract the candidate page summary of the webpage of at least one in the described page sequence;
According to selecting page abstract in the described candidate page summary, as the page abstract that offers described user with the user-dependent word set of sending described inquiry; With
Generation comprises the Query Result of described page sequence and described page abstract.
8. the method for search and webpage as claimed in claim 7 also comprises:
According to the described user's of reflection the information of information requirement, obtain described user's at least one personalized keyword, to form described word set.
9. the method for search and webpage as claimed in claim 7 also comprises:
The appearance of any speech in described page abstract in sign and the described word set.
10. the method for a search and webpage comprises:
The candidate page summary of the webpage of at least one in page sequence that retrieves from search engine reception response inquiry and the described page sequence;
According to selecting page abstract in the described candidate page summary, as the page abstract that offers described user with the user-dependent word set of sending described inquiry; With
Generation comprises the Query Result of described page sequence and described page abstract.
11. The method of searching webpages of claim 10, further comprising:
obtaining at least one personalized keyword of the user from information reflecting the user's information needs, to form the word set.
12. The method of searching webpages of claim 10, further comprising:
marking occurrences of any word of the word set in the page abstract.
CN 200810213931 2008-08-28 2008-08-28 Search engine, client thereof and method for searching page Expired - Fee Related CN101661490B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200810213931 CN101661490B (en) 2008-08-28 2008-08-28 Search engine, client thereof and method for searching page

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200810213931 CN101661490B (en) 2008-08-28 2008-08-28 Search engine, client thereof and method for searching page

Publications (2)

Publication Number Publication Date
CN101661490A true CN101661490A (en) 2010-03-03
CN101661490B CN101661490B (en) 2013-01-02

Family

ID=41789516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200810213931 Expired - Fee Related CN101661490B (en) 2008-08-28 2008-08-28 Search engine, client thereof and method for searching page

Country Status (1)

Country Link
CN (1) CN101661490B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101082917A (en) * 2006-06-02 2007-12-05 千橡世纪科技发展(北京)有限公司 Method and apparatus for rapid previewing summary of web page content
CN100476830C (en) * 2007-06-07 2009-04-08 北京金山软件有限公司 Network resource searching method and system
CN100541495C (en) * 2007-08-03 2009-09-16 哈尔滨工程大学 A kind of searching method of individual searching engine
CN101216837A (en) * 2008-01-18 2008-07-09 索意互动(北京)信息技术有限公司 Method and system for displaying search result based on matching user personalized configuration

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014132265A2 (en) * 2013-02-14 2014-09-04 Gyan Prakash Kesarwani An improved system and method of scanning a search engine depending on the importance of the keywords and producing an effective output
WO2014132265A3 (en) * 2013-02-14 2015-01-22 Gyan Prakash Kesarwani An improved system and method of scanning a search engine depending on the importance of the keywords and producing an effective output
CN103473358A (en) * 2013-09-26 2013-12-25 北京奇虎科技有限公司 Method and device for search engine to crawl webpage open summary information
CN103514278A (en) * 2013-09-26 2014-01-15 北京奇虎科技有限公司 Method and device for verifying open type summary information of webpage
CN106462588A (en) * 2015-01-14 2017-02-22 微软技术许可有限责任公司 Content creation from extracted content
WO2016112503A1 (en) * 2015-01-14 2016-07-21 Microsoft Corporation Content creation from extracted content
US10579630B2 (en) 2015-01-14 2020-03-03 Microsoft Technology Licensing, Llc Content creation from extracted content
CN107305493A (en) * 2016-04-20 2017-10-31 谷歌公司 Graphic keyboard application with integration search
CN106096010A (en) * 2016-06-23 2016-11-09 北京奇虎科技有限公司 Carry input control method and the device of search engine functionality
CN106096010B (en) * 2016-06-23 2020-07-28 北京奇元科技有限公司 Input control method and device with search engine function
CN108765262A (en) * 2018-05-17 2018-11-06 深圳航天智慧城市系统技术研究院有限公司 A method of showing true meteorological condition in arbitrary three-dimensional scenic
CN109271580A (en) * 2018-11-21 2019-01-25 百度在线网络技术(北京)有限公司 Searching method, device, client and search engine
CN109271580B (en) * 2018-11-21 2022-04-01 百度在线网络技术(北京)有限公司 Search method, device, client and search engine

Also Published As

Publication number Publication date
CN101661490B (en) 2013-01-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130102

Termination date: 20160828
