CN102722499A - Search engine and implementation method thereof - Google Patents

Search engine and implementation method thereof

Publication number
CN102722499A
Authority
CN
China
Prior art keywords
synonym
result
query
original
registration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011100796991A
Other languages
Chinese (zh)
Other versions
CN102722499B (en
Inventor
呼大为
李彦宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201110079699.1A
Publication of CN102722499A
Application granted
Publication of CN102722499B
Legal status: Active

Abstract

The invention discloses a search engine and an implementation method thereof. The method comprises: receiving an original query submitted by a user; analyzing the original query to obtain an original word present in the original query and a synonym of that word, and replacing the original word in the original query with the synonym to obtain a synonym query; searching with the original query and the synonym query to obtain an original-query result webpage set and a synonym-query result webpage set; calculating the degree of overlap between the webpages in the original query results and those in the synonym query results; and merging the two result sets according to a predetermined merging strategy corresponding to the overlap degree, thereby generating a search result list. The search engine uses the overlap between the original query results and the synonym query results to estimate the probability that the synonym query results have drifted from the intended meaning; when that probability is high, the synonym query results are suppressed so that results that do not meet the search need are kept out of the top of the result list, which preserves a good user experience.

Description

Search engine and implementation method thereof
Technical field
The present invention relates to search engine technology, and in particular to a search engine that expands search queries with synonyms and to an implementation method thereof.
Background technology
The rapid development of the Internet has given people a brand-new medium for storing, processing, transmitting and using information, and online information has quickly become one of the main channels through which people obtain knowledge. Information resources on this scale cover nearly all human knowledge, but they also confront users with the problem of how to exploit them fully. Search engines arose to meet this need: they help network users find information on the Internet. Specifically, a search engine uses dedicated computer programs to collect information from the Internet according to a certain strategy, organizes and processes it, and then provides a search service that presents users with the information relevant to their queries.
The online search service provided by a search engine is usually keyword-based: the user enters a query expression in the search engine's input box, and the search engine looks up and returns result webpages containing those keywords. Because users differ in background knowledge and habits, they may use different keywords to search for the same thing; in addition, natural language itself contains many synonyms and near-synonyms. Searching only on the keywords the user supplies is therefore insufficient. At present many search engines offer query expansion, such as synonym expansion. After receiving the original query expression entered by the user, the search engine segments it into words and checks whether the resulting set of terms contains a potential synonym pair. Specifically, the search engine matches the segmented terms against a predetermined synonym dictionary to determine whether any of them has a synonym; if so, it expands the search query with that synonym, merges the expanded query results with the original query results, and presents the merged results to the user, thereby providing more relevant search results.
However, the same word can carry different meanings in different semantic contexts, so a synonym of that word is synonymous or nearly synonymous only in a particular context; in a different context the synonym no longer applies. In that case the results obtained by synonym expansion may not be what the user wants and can degrade the user experience. For example, suppose the user's original query is "how to make fish-flavoured shredded pork". The search engine segments the query, matches it against the synonym dictionary, obtains the potential synonym pair {"how to make", "menu"}, replaces "how to make" with "menu" to run an expanded synonym query, and obtains the corresponding results. But if the original query is "how to make a bedside cabinet", the user clearly wants to learn about making furniture; if the search engine still replaces "how to make" with "menu", it returns results whose meaning has drifted from what the user wanted, and the user will doubt the accuracy of the search.
In view of this, existing search engines need to be improved to solve the above problem.
Summary of the invention
An object of the present invention is to provide a search engine that estimates the probability of semantic drift in the synonym-expanded query results and adjusts the ranking of the synonym query results within the overall search results accordingly, so that drifted results do not appear at the top of the search results and the user has a good experience.
Another object of the present invention is to provide an implementation method for the above search engine.
To achieve one of the above objects, the implementation method of a search engine according to the present invention comprises the following steps:
receiving an original query submitted by a user;
analyzing the original query to obtain an original word present in the original query and a synonym of that word, and replacing the original word in the original query with the synonym to obtain a synonym query;
searching according to the original query and the synonym query to obtain an original-query result webpage set and a synonym-query result webpage set;
calculating the overlap degree between the webpages in the original query results and those in the synonym query results;
merging the result webpage sets of the original query and the synonym query according to a predetermined merging strategy corresponding to the overlap degree, and generating a search result list.
As a further improvement of the present invention, calculating the overlap degree comprises calculating the number of webpages that appear in both the original query results and the synonym query results, |U1 ∩ U2|.
As a further improvement of the present invention, calculating the overlap degree further comprises determining the smaller of the number of webpages in the original query results and the number of webpages in the synonym query results, Min(|U1|, |U2|); the overlap degree is I(U1, U2) = |U1 ∩ U2| / Min(|U1|, |U2|).
As a further improvement of the present invention, calculating the overlap degree further comprises calculating the total number of webpages in the union of the original query results and the synonym query results, |U1 ∪ U2|; the overlap degree is I(U1, U2) = |U1 ∩ U2| / |U1 ∪ U2|.
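The two overlap formulas can be illustrated with a minimal Python sketch. The function names, the example URLs and the convention of returning 0.0 for empty result sets are assumptions made for illustration; the patent text does not specify them.

```python
from __future__ import annotations


def overlap_min(u1: set[str], u2: set[str]) -> float:
    """I(U1, U2) = |U1 ∩ U2| / Min(|U1|, |U2|).

    Returns 0.0 when either set is empty (an assumed convention).
    """
    smaller = min(len(u1), len(u2))
    if smaller == 0:
        return 0.0
    return len(u1 & u2) / smaller


def overlap_union(u1: set[str], u2: set[str]) -> float:
    """I(U1, U2) = |U1 ∩ U2| / |U1 ∪ U2|.

    Returns 0.0 when both sets are empty (an assumed convention).
    """
    union_size = len(u1 | u2)
    if union_size == 0:
        return 0.0
    return len(u1 & u2) / union_size


# Hypothetical result sets: U1 for the original query, U2 for the synonym query.
original_results = {"a.example/1", "a.example/2", "a.example/3", "a.example/4"}
synonym_results = {"a.example/3", "a.example/4", "b.example/9"}

print(overlap_min(original_results, synonym_results))    # 2 shared / 3 in the smaller set
print(overlap_union(original_results, synonym_results))  # 2 shared / 5 in the union
```

Normalizing by the smaller set tolerates result sets of very different sizes, whereas the union-based form penalizes any webpage that appears in only one of the two sets.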
As a further improvement of the present invention, the merging strategy comprises: when the overlap degree falls within a predetermined overlap interval and is less than a predetermined threshold, the predetermined merging strategy is to suppress the synonym query results during merging, the suppression comprising:
reducing the relevance weights of the webpages in the synonym query results; or
inserting the synonym query results after a specific page of the search result list; or
moving the synonym query results behind the original query results.
As a further improvement of the present invention, the merging strategy comprises: when the overlap degree falls within a predetermined overlap interval and is greater than a predetermined threshold, merging the results of the original query and the synonym query according to the relevance weight of each webpage in the original query results and the synonym query results.
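As a rough illustration of the merging strategies listed above, the following Python sketch down-weights synonym results when the overlap degree is below a threshold and otherwise merges purely by relevance weight; a second helper shows the variant that places synonym results behind the original results. The threshold of 0.3, the penalty factor of 0.5, the function names and the example data are all hypothetical, and the predetermined overlap interval is simplified to a single threshold.

```python
from __future__ import annotations


def merge_results(original: dict[str, float], synonym: dict[str, float],
                  overlap: float, threshold: float = 0.3,
                  penalty: float = 0.5) -> list[str]:
    """Merge two {url: relevance weight} maps into one ranked list of URLs.

    If the overlap is below the threshold, synonym weights are multiplied
    by the penalty factor (suppression by weight reduction).
    """
    factor = penalty if overlap < threshold else 1.0
    scores = dict(original)
    for url, weight in synonym.items():
        # A page found by both queries keeps its higher score.
        scores[url] = max(scores.get(url, 0.0), weight * factor)
    # Sort by descending score, then by URL for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))


def append_after_original(original: dict[str, float],
                          synonym: dict[str, float]) -> list[str]:
    """Suppression variant: synonym-only results follow all original results."""
    head = sorted(original, key=lambda u: (-original[u], u))
    tail = sorted((u for u in synonym if u not in original),
                  key=lambda u: (-synonym[u], u))
    return head + tail


original_hits = {"o1": 0.9, "o2": 0.6}
synonym_hits = {"s1": 0.8, "o2": 0.5}

print(merge_results(original_hits, synonym_hits, overlap=0.1))  # s1 is pushed down
print(merge_results(original_hits, synonym_hits, overlap=0.8))  # s1 ranks by its own weight
print(append_after_original(original_hits, synonym_hits))
```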
To achieve the other object above, a search engine according to the present invention comprises a search component, and the search component comprises:
a query analysis module, configured to receive an original query submitted by a user, analyze the original query to obtain an original word present in the original query and a synonym of that word, and replace the original word in the original query with the synonym to obtain a synonym query;
a search module, configured to search according to the original query and the synonym query and obtain an original-query result webpage set and a synonym-query result webpage set;
an overlap calculation and result merging module, configured to calculate the overlap degree between the webpages in the original query results and those in the synonym query results, merge the result webpage sets of the original query and the synonym query according to a predetermined merging strategy corresponding to the overlap degree, and generate a search result list.
As a further improvement of the present invention, calculating the overlap degree comprises calculating the number of webpages that appear in both the original query results and the synonym query results, |U1 ∩ U2|.
As a further improvement of the present invention, calculating the overlap degree further comprises determining the smaller of the number of webpages in the original query results and the number of webpages in the synonym query results, Min(|U1|, |U2|); the overlap degree is I(U1, U2) = |U1 ∩ U2| / Min(|U1|, |U2|).
As a further improvement of the present invention, calculating the overlap degree further comprises calculating the total number of webpages in the union of the original query results and the synonym query results, |U1 ∪ U2|; the overlap degree is I(U1, U2) = |U1 ∩ U2| / |U1 ∪ U2|.
As a further improvement of the present invention, the merging strategy comprises: when the overlap degree falls within a predetermined overlap interval and is less than a predetermined threshold, the predetermined merging strategy is to suppress the synonym query results during merging, the suppression comprising:
reducing the relevance weights of the webpages in the synonym query results; or
inserting the synonym query results after a specific page of the search result list; or
moving the synonym query results behind the original query results.
As a further improvement of the present invention, the merging strategy comprises: when the overlap degree falls within a predetermined overlap interval and is greater than a predetermined threshold, merging the results of the original query and the synonym query according to the relevance weight of each webpage in the original query results and the synonym query results.
Compared with the prior art, the beneficial effect of the present invention is that the search engine uses the overlap degree between the original query results and the synonym query results to determine the probability that the synonym query results have drifted in meaning, and suppresses the synonym query results when that probability is high. This keeps results that do not meet the user's search need out of the top of the search result list and thereby ensures a good user experience.
Description of drawings
Fig. 1 is a functional block diagram of the first embodiment of the search engine of the present invention;
Fig. 2 is a flowchart of the search engine shown in Fig. 1 mining synonym contexts;
Fig. 3 is a flowchart of the search engine shown in Fig. 1 performing a synonym-expanded query;
Fig. 4 is a functional block diagram of the second embodiment of the search engine of the present invention;
Fig. 5 is a flowchart of the search engine shown in Fig. 4 performing a synonym-expanded query;
Fig. 6 is a functional block diagram of the third embodiment of the search engine of the present invention;
Fig. 7 is a flowchart of the search engine shown in Fig. 6 performing a synonym-expanded query;
Fig. 8 is a functional block diagram of the fourth embodiment of the search engine of the present invention;
Fig. 9 is a flowchart of the search engine shown in Fig. 8 performing a synonym-expanded query;
Fig. 10 is a flowchart of one embodiment in which the search engine shown in Fig. 8 determines the synonym similarity level and labels the synonyms accordingly.
Embodiment
The present invention is described below with reference to the embodiments shown in the drawings. These embodiments do not limit the invention; structural, methodological or functional variations made by those of ordinary skill in the art on the basis of these embodiments all fall within the scope of protection of the present invention.
Fig. 1 is a functional block diagram of the first embodiment of the search engine 100 of the present invention. In this embodiment, the search engine 100 collects webpages from the Internet according to a certain strategy, organizes and processes them, and then provides a search service in response to requests from the browser 21 of the client 20. The search engine 100 may comprise one or more web server entities for storing and managing data and for responding to search requests. The client 20 may comprise one or more user terminal devices, such as personal computers, notebook computers, wireless telephones, personal digital assistants (PDAs), or other computing and communication devices.
These servers and terminal devices all share some basic architectural components, such as a bus, a processing device, a storage device, one or more input/output devices, and a communication interface. The bus may comprise one or more conductors for communication among the components of the server or terminal device. The processing device comprises any type of processor or microprocessor that executes instructions and handles processes or threads. The storage device may comprise dynamic memory such as random access memory (RAM) for storing dynamic information, static memory such as read-only memory (ROM) for storing static information, and mass storage comprising magnetic or optical recording media and the corresponding drives. Input devices, such as a keyboard, mouse, stylus, voice recognition device or biometric device, let the user enter information into the server or terminal device. Output devices comprise displays, printers, loudspeakers and the like for outputting information. The communication interface enables the server or terminal device to communicate with other systems or devices. Communication interfaces may be connected to a network through wired, wireless or optical connections, so that the search engine 100 and the client 20 can communicate with each other over the network. The network may comprise a local area network (LAN), a wide area network (WAN), a telephone network such as the public switched telephone network (PSTN), an intranet, the Internet, or a combination of these networks. Both the servers and the terminal devices include operating system software for managing system resources and controlling the execution of other programs, as well as application software or program instructions for implementing the functions of specific functional modules.
As shown in Figure 1, search engine 100 can be carried out the synonym expanding query, and it can be divided into off-line part and online part on the whole.In the off-line part, search engine 100 comprises can store web data and synonym data repository 12, index 13, webpage grabber 14, the user inquiring log database 16 of recording user Query Information and the log analyzer 17 that daily record is analyzed to user inquiring to information.
The web crawler 14 is a program that fetches webpages one by one, following the hyperlinks between them according to a certain strategy. In a specific embodiment, the web crawler 14 selects a URL to crawl from an initial URL (Uniform Resource Locator) library according to a scheduling strategy, resolves the web server address indicated in the URL, establishes a connection, sends a request and receives data, and stores the retrieved webpage data in the webpage library 122 of the data repository 12 to build a local document collection. It then extracts links from the fetched pages for the next round of crawling, and repeats this cycle until all URLs have been crawled. The scheduling strategy by which the web crawler 14 selects URLs may be a breadth-first strategy, a depth-first strategy, a backlink-count strategy, and so on; the crawl mode may be cumulative or incremental. The indexer 13 analyzes the local document collection and builds an index over it. For example, it extracts terms from the full text of each document by word segmentation, filters out high-frequency or low-frequency words to obtain the set of index terms, and finally converts the mapping from webpages to index terms into a mapping from index terms to webpages, forming an inverted file that comprises an index lexicon and inverted lists and is stored in the index library 121 of the data repository 12. Methods for segmenting web documents include dictionary-based segmentation, understanding-based segmentation and statistics-based segmentation. The most common dictionary-based methods include forward maximum matching, reverse maximum matching and minimum-cut segmentation.
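The conversion from a webpage-to-terms mapping into a term-to-webpages mapping can be sketched in Python as follows. This is only a minimal illustration: the function name, the document identifiers, the pre-segmented term lists and the filtered term set are assumptions, and a production inverted file would also store positions and weights.

```python
from collections import defaultdict


def build_inverted_index(docs, filtered_terms=frozenset()):
    """Turn {doc_id: [terms]} into {term: sorted list of doc_ids}.

    Terms in `filtered_terms` (for example very frequent words) are skipped.
    """
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            if term not in filtered_terms:
                index[term].add(doc_id)
    # Sorted lists play the role of the inverted lists.
    return {term: sorted(doc_ids) for term, doc_ids in index.items()}


# Hypothetical, already segmented documents.
docs = {
    "page1": ["fish", "pork", "menu"],
    "page2": ["bedside", "cabinet", "the"],
    "page3": ["pork", "the", "price"],
}
index = build_inverted_index(docs, filtered_terms={"the"})
print(index["pork"])
```

Looking up a query term in `index` then returns the documents that contain it, which is the lookup the search module performs against the index library.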
In the present invention, synonyms are terms that differ in form but express the same or similar meaning; that is, if several terms express the same or a similar meaning, they are synonyms of one another. In this embodiment, the thesaurus 123 comprises a synonym mapping table 1231 and a synonym context library 1232. The synonym mapping table 1231 specifies in advance the correspondence between words and their synonyms, for example a table mapping original words to their synonyms obtained beforehand by statistics. The table can also be updated continuously by analyzing users' historical query and click data. For example, if the titles of clicked result webpages contain a synonym of some original word but not the original word itself, and this occurs frequently, the original word and the synonym are confirmed as a synonym pair and added to the synonym mapping table 1231.
Fig. 2 shows the workflow of one embodiment in which the search engine 100 mines the synonym contexts of synonym pairs. In the present invention, a synonym context is the semantic context in which the original word of a synonym pair occurs; it indicates the semantic context in which the synonym pair is applicable, that is, the context in which the synonym is suitable for replacing the original word in a synonym-expanded query. In this embodiment, synonym contexts are obtained by analyzing the user query logs. After each search, the user query log database 16 records the user's query and click data, such as the query expression, the search time, the returned result list and the result webpages that were clicked. Referring to Fig. 2 together with Fig. 1, the log analyzer 17 analyzes the historical user queries and click data contained in the user query log database 16 (step 411), including the historical queries and the result webpages that were returned for a particular query and then clicked. Next, the log analyzer 17 determines whether these data contain a synonym context for some synonym pair, and if so, records it in the synonym context library 1232.
Specifically, the log analyzer 17 first determines, based on the synonym mapping table 1231, whether a historical query contains an original word, and if so obtains the synonym pair consisting of that original word and its synonym. For example, for the historical query "how to make fish-flavoured shredded pork", the log analyzer 17 determines from the synonym mapping table 1231 that the query contains the original word "how to make" (the query is segmented into the two terms "fish-flavoured shredded pork" and "how to make", which are then matched against the original words in the synonym mapping table) and obtains the corresponding synonym pair {"how to make", "menu"}. The log analyzer 17 then determines whether, for this query, the titles of the webpages the user clicked contain the synonym but not the original word; if so, it records a synonym context for the pair. For example, if for the query "how to make fish-flavoured shredded pork" the user clicked a webpage titled "fish-flavoured shredded pork menu", the log analyzer 17 records a synonym context. The synonym context comprises at least the historical query itself, such as "how to make fish-flavoured shredded pork"; it may also comprise the word adjacent to the original word in the historical query, such as "fish-flavoured shredded pork"; or both may be recorded as synonym contexts of the pair {"how to make", "menu"}. The adjacent word may precede or follow the original word. It may also be an empty term, which is the case when the original query contains only the original word and has no adjacent word.
In the above embodiment, synonym contexts are obtained from historical user behaviour, but in other embodiments they can also be determined from the anchor text of webpages. Anchor text is the text contained in a hyperlink to a webpage. For example, if the hyperlinks that point to the webpage www.sina.com.cn carry the texts "Sina website homepage", "Sina homepage" and "sina homepage", these text segments can be recorded as synonym contexts of the synonym pair {"Sina website", "Sina"}. In addition, synonym contexts can be determined from parallel segments in a webpage title. For example, the title of the page at price.mycar168.com/search.asp?factoryid=135 is "quotation of Huachen BMW, automobile big world, a Huachen BMW price Shenzhen net". Splitting this title at its separators yields the parallel term segments "quotation of Huachen BMW", "Huachen BMW price" and "automobile big world, Shenzhen net"; the first two segments contain "price" and "quotation" from the synonym pair {"price", "quotation"}, so these two segments can also serve as synonym contexts of that pair.
Referring to Fig. 2, during synonym context mining the user's click behaviour is not always fully rational: while browsing the search results the user may inadvertently click some irrelevant results, in which case the recorded synonym context is inaccurate. To eliminate the negative effect of such cases, the log analyzer 17 also counts how often each synonym context has been recorded, and only when that frequency is greater than or equal to a predetermined frequency threshold is the synonym context retained and confirmed as a synonym context of the corresponding synonym pair; in other words, low-frequency synonym contexts are filtered out (step 413).
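The mining and frequency filtering described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: queries and clicked titles are given as pre-segmented term lists, only the whole historical query is recorded as the context, "how to make" stands for the original word of the fish-flavoured shredded pork example, and the function name, log format and threshold of 2 are hypothetical.

```python
from collections import Counter


def mine_synonym_contexts(click_log, synonym_table, min_count=2):
    """Return {(original word, synonym): set of contexts}.

    click_log: iterable of (query_terms, clicked_title_terms) pairs.
    synonym_table: {original word: synonym}.
    A context (the segmented query, as a tuple) is kept only if it was
    recorded at least `min_count` times.
    """
    counts = Counter()
    for query_terms, title_terms in click_log:
        for orig, syn in synonym_table.items():
            # Record a context when the query has the original word and the
            # clicked title has the synonym but not the original word.
            if orig in query_terms and syn in title_terms and orig not in title_terms:
                counts[(orig, syn, tuple(query_terms))] += 1

    contexts = {}
    for (orig, syn, context), count in counts.items():
        if count >= min_count:  # drop low-frequency contexts
            contexts.setdefault((orig, syn), set()).add(context)
    return contexts


pork_query = ["fish-flavoured shredded pork", "how to make"]
click_log = [
    (pork_query, ["fish-flavoured shredded pork", "menu"]),
    (pork_query, ["fish-flavoured shredded pork", "menu"]),
    # A single, possibly accidental click that the threshold filters out.
    (["bedside cabinet", "how to make"], ["bedside cabinet", "menu"]),
]
contexts = mine_synonym_contexts(click_log, {"how to make": "menu"})
print(contexts)
```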
As shown in Fig. 1, the online part of the search engine 100 mainly comprises a search component 11 and a user interface 15. The user interface 15 is presented through the browser software 21 of the client 20; it lets the user enter a query and displays the search result list in a predetermined presentation format. After a search finishes, it also records the user's query information and stores it in the user query log database 16. The search component 11 responds to search requests from the client 20 and returns the search results to the client 20. In this embodiment, the search component 11 comprises a search module 111, a query analysis module 112 and a result synthesis module 113. For an ordinary original query (without query expansion), the query analysis module 112 segments the received original query to obtain the set of query words and generates a query word list. After receiving the query word list, the search module 111 matches it against the index lexicon in the index library 121, finds the corresponding index terms and the inverted list of each index term, and thereby obtains the set of web documents relevant to the query words. The result synthesis module 113 then orders the retrieved web documents according to predetermined relevance weights between each document and the query words, and returns the result list to the client through the user interface 15.
The detailed steps by which the search engine 100 performs a synonym-expanded query online according to synonym contexts are described below with reference to the workflow shown in Fig. 3. The query analysis module 112 receives, through the user interface 15, the original query of the current user's search (step 421), and then analyzes the query (step 422), which includes segmenting the original query. Note that the segmentation method in this embodiment is dictionary-based forward maximum matching, and the dictionary is built from the term segments contained in the synonym contexts. As described above, a historical query can be recorded as a synonym context, and a historical query segment is longer than the terms obtained by segmenting it. Forward maximum matching therefore guarantees that, if the current original query contains a historical query segment, that segment is cut out first, which improves the accuracy of the subsequent calculation. For example, suppose that in the synonym context mining phase the historical query is "today Nokia how much"; when the synonym context of the synonym pair {"how much", "price"} is recorded, both the historical query "today Nokia how much" and the adjacent word "Nokia" are recorded as synonym contexts. Suppose the current original query is "who know today Nokia how much". Under forward maximum matching, the longest segment in the synonym context dictionary, "today Nokia how much", has length 8; the query analysis module 112 scans the current original query from left to right, checking whether a phrase of length 8 appears in the synonym context dictionary. When it finds the match "today Nokia how much", it cuts that segment out first, so "Nokia" is not cut out as an independent keyword. In step 422, the query analysis module 112 also matches the set of terms obtained from segmenting the original query against the thesaurus 123 to obtain a potential synonym pair and the synonym context of that pair; the potential synonym pair comprises an original word present in the original query and the synonym corresponding to that original word.
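Forward maximum matching against a context dictionary can be sketched in Python as follows. As a simplifying assumption, the sketch works on whitespace-separated word tokens rather than on the characters of the original-language query, so segment lengths here are counted in tokens; the function name and the dictionary contents are illustrative only.

```python
def forward_max_match(tokens, dictionary):
    """Segment `tokens` left to right, always taking the longest dictionary entry.

    dictionary: set of tuples of tokens. A single token that matches no
    entry is emitted as its own segment.
    """
    max_len = max((len(entry) for entry in dictionary), default=1)
    segments = []
    i = 0
    while i < len(tokens):
        # Try the longest possible window first, then shrink it.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + size])
            if size == 1 or candidate in dictionary:
                segments.append(candidate)
                i += size
                break
    return segments


# Hypothetical context dictionary built from mined synonym contexts.
context_dictionary = {
    ("today", "Nokia", "how", "much"),
    ("Nokia",),
    ("how", "much"),
}
query = "who know today Nokia how much".split()
print(forward_max_match(query, context_dictionary))
```

Because the four-token entry is tried before any shorter window, "Nokia" is never emitted as a separate segment for this query, which mirrors the behaviour described in the example above.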
Next, the query analysis module 112 determines whether the synonym context matches the original query (step 423). In this embodiment, the query analysis module 112 calculates the matching degree between the synonym context and the original query; when the matching degree falls within a predetermined matching interval, the synonym context is deemed to match the original query, meaning that the semantic context of the current original query is suitable for replacing the original word with the synonym in an expanded query. The matching degree is determined from the length of the original query after the original word is removed and the length of the synonym contexts. The following is the formula for the matching degree M in this embodiment when the original query is longer than the original word (that is, q ≠ orig):
$$M(\mathrm{orig}, \mathrm{syn}) = \frac{\sum_{i=1}^{n} \mathrm{TermCount}(p_i)}{\mathrm{TermCount}(q) - \mathrm{TermCount}(\mathrm{orig})}, \quad q \neq \mathrm{orig}$$
Here TermCount(q) denotes the length of the original query, TermCount(orig) the length of the original word in the original query, and TermCount(p_i) the length of the i-th synonym context. Because the original query may in this case contain words that do not belong to any synonym context, M takes a value in [0, 1]. A matching threshold θ is set in advance. When M lies in [θ, 1], the synonym context matches the original query; the synonym then replaces the original word in the original query to obtain the synonym query, the search module 111 searches with the original query and the synonym query to obtain the webpage set of the original query results and the webpage set of the synonym query results (step 424), and the result synthesis module 113 merges the results of the original query and the synonym query according to a predetermined merging strategy (step 425). The result merging strategy is described in detail below. When M lies in [0, θ), the synonym context does not match the original query, meaning that in this semantic context the synonym is not suitable for replacing the original word; the search module 111 then searches with the original query only and obtains the webpage set of the original query results (step 426), and the result synthesis module 113 produces the search result list according to the predetermined relevance weights between each webpage and the original query (step 425). When the original query consists only of the original word (that is, q = orig), the matching degree is M = 1; the synonym directly replaces the original query, and steps 424 and 425 are then executed.
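The matching-degree decision can be illustrated with a short Python sketch. It assumes that the context lengths passed in are those of the matched context segments with the original word excluded, so that M stays within [0, 1]; the text does not spell this out. The function names, the token-based lengths and the threshold of 0.4 are likewise illustrative assumptions.

```python
def matching_degree(query_len, orig_len, context_lens):
    """M = sum(TermCount(p_i)) / (TermCount(q) - TermCount(orig)).

    Returns 1.0 when the query consists only of the original word (q = orig).
    """
    if query_len == orig_len:
        return 1.0
    return sum(context_lens) / (query_len - orig_len)


def should_expand(m, theta):
    """Expand with the synonym only when M lies in [theta, 1]."""
    return m >= theta


# Hypothetical query of 6 tokens whose original word has 2 tokens and whose
# matched context segment covers 2 of the remaining 4 tokens.
m = matching_degree(query_len=6, orig_len=2, context_lens=[2])
print(m, should_expand(m, theta=0.4))
```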
By analyzing the semantic environment of the user's current query, the search engine determines whether a synonym conversion is appropriate before it carries out a synonym-expanded query. This safeguards the accuracy of the expanded query, keeps it as close as possible to the user's need, and thus gives the user a good experience.
Fig. 4 and Fig. 5 show a second embodiment of the search engine of the present invention. Compared with the first embodiment, the search engine 200 of this embodiment mainly judges the probability of semantic drift in the synonym query results and adjusts their position in the search result list finally presented to the user. As shown in Fig. 4, the search engine 200 comprises a search component 11, a data repository 12, an index 13, a crawler 14 and a user interface 15. The data repository 12, index 13, crawler 14 and user interface 15 are essentially the same as in the embodiment above and are not described again here. In this embodiment, the search component 11 comprises a search module 111, a query analysis module 112, and an overlap calculation and result merging module 114.
The synonym-expanded query performed by the search engine of this embodiment is described in detail below with reference to Fig. 5. First, the query analysis module 112 receives the user's original query (step 431). Next, it analyzes the query (step 432): it segments the original query into a set of query words, identifies the original word in the original query based on the thesaurus 123, obtains the synonym pair consisting of the original word and its synonym, and directly replaces the original word with the synonym to obtain the synonym query. The search module 111 searches with the original query and the synonym query to obtain the original query result web page set and the synonym query result web page set (step 433). The overlap calculation and result merging module 114 then calculates the overlap ratio of the web pages in the original query results and the synonym query results (step 434). The overlap ratio mainly reflects how many web pages the original query results and the synonym query results have in common. If many web pages are shared, the synonym query results are close to the original query results and the probability of semantic drift in the synonym query results is small. Otherwise, the probability of semantic drift is larger, and the synonym query results need to be demoted so that results that do not meet the user's search need do not appear at the top of the result list.
The overlap ratio can be calculated in several ways. One way is to count only the web pages that appear in both the original query results and the synonym query results, |U1 ∩ U2|, that is, the number of identical URLs. Another is to count the overlapping web pages among the top 100 results of each result set and compare that count with a predetermined threshold. Preferably, the calculation also determines the smaller of the number of web pages in the original query results and the number in the synonym query results, min(|U1|, |U2|); the overlap ratio is then I(U1, U2) = |U1 ∩ U2| / min(|U1|, |U2|). In other embodiments, the calculation instead uses the number of web pages in the union of the original query results and the synonym query results, |U1 ∪ U2|; the overlap ratio is then I(U1, U2) = |U1 ∩ U2| / |U1 ∪ U2|. Once the overlap ratio has been calculated, it is checked against a predetermined overlap ratio interval to decide whether the synonym query results should be demoted; the position of the synonym query results in the search result list is then determined and the merged results are output (step 435).
Taking I(U1, U2) = |U1 ∩ U2| / min(|U1|, |U2|) as an example, the overlap ratio I is a floating-point number in [0, 1]. An overlap ratio threshold σ is set in advance. When I lies in [σ, 1], the overlap between the original query results and the synonym query results is high; the synonym query results need not be demoted, and the results of the original query and the synonym query are simply merged according to the predetermined relevance weight of each web page. When I lies in [0, σ), the overlap is low and the probability of semantic drift in the synonym query results is larger, so the synonym query results need to be demoted. Demotion can be done by lowering the relevance weights of the web pages in the synonym query results, so that they rank lower in the merged search result list; by inserting the synonym query results after a specific page of the search result list, for example moving them to the second page; or by placing the synonym query results behind the original query results, so that they appear at the very end of the search result list.
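The two overlap ratio formulas and one of the demotion options can be sketched as follows. This is an illustrative sketch under stated assumptions: results are modeled as lists of (URL, relevance weight) pairs, a URL found in both lists keeps the weight from the original query results, and the function names and the `mode` parameter are hypothetical. Of the three demotion options, only "place synonym results behind the original results" is shown.

```python
def overlap_ratio(urls1, urls2, mode="min"):
    """I(U1, U2) = |U1 ∩ U2| / min(|U1|, |U2|), or / |U1 ∪ U2| when mode="union"."""
    u1, u2 = set(urls1), set(urls2)
    if not u1 or not u2:
        return 0.0
    common = len(u1 & u2)
    if mode == "min":
        return common / min(len(u1), len(u2))
    return common / len(u1 | u2)


def merge_results(original, synonym, sigma):
    """Merge two lists of (url, weight) pairs into one ranked list of URLs."""
    ratio = overlap_ratio([u for u, _ in original], [u for u, _ in synonym])
    seen = {u for u, _ in original}
    # Synonym-only pages; duplicates keep the original result's weight (assumption).
    extra = [(u, w) for u, w in synonym if u not in seen]
    by_weight = lambda item: -item[1]
    if ratio >= sigma:
        # High overlap: no demotion, rank everything by relevance weight.
        merged = sorted(original + extra, key=by_weight)
    else:
        # Low overlap: demote by appending synonym results after the original ones.
        merged = sorted(original, key=by_weight) + sorted(extra, key=by_weight)
    return [u for u, _ in merged]
```

With original results `[("a", 0.9), ("b", 0.5)]` and synonym results `[("b", 0.6), ("c", 0.8)]`, the min-based overlap ratio is 1 / 2 = 0.5. A threshold of 0.4 merges by weight, while a threshold of 0.6 pushes the synonym-only page "c" to the end.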
By evaluating the overlap ratio between the original query results and the synonym query results, the search engine determines the probability that semantic drift has occurred in the synonym query results. When that probability is high, it demotes the synonym query results so that results that do not meet the user's search need do not appear at the top of the search result list, which gives the user a good experience. In this embodiment, the synonym context judgment of the first embodiment need not be applied before the original word is replaced with the synonym for the expanded query. However, a person of ordinary skill in the art will readily appreciate that this embodiment can be combined with the first embodiment: the synonym context judgment is performed before the synonym replacement, and once the synonym query results are available, the search results are merged according to the overlap ratio between the original query results and the synonym query results. This clearly yields more accurate search results and further improves the user experience.
Fig. 6 and Fig. 7 show a third embodiment of the search engine of the present invention. Starting from the synonym query results, this embodiment further judges the probability of semantic drift by analyzing the semantic topic distribution of the synonym query result web pages, and then adjusts the position of the synonym query results in the search result list finally presented to the user. As shown in Fig. 6, and similarly to the first embodiment, the search engine 300 comprises a search component 11, a data repository 12, an index 13, a crawler 14, a user interface 15, a user query log database 16 and a log analyzer 17. The index 13, crawler 14, user interface 15, user query log database 16 and log analyzer 17 are the same as in the first embodiment and are not described again here. In this embodiment, the search component 11 comprises a search module 111, a query analysis module 112, a result synthesis module 113 and a drift determination module 115. The data repository 12 comprises an index database 121, a web page library 122, a thesaurus 123 and a web page semantic topic library 124. The index database 121, web page library 122 and thesaurus 123 are the same as in the first embodiment and are not described again here. The search engine 300 also comprises a topic analysis module 18, which in this embodiment contains a Probabilistic Latent Semantic Analysis (hereinafter PLSA) model.
The PLSA model is a natural language processing tool used mainly to analyze the latent semantics of documents. A document can be represented as a set of words, but because synonyms exist, words are not the most basic elements of a document; a latent semantic level, the topic, can be considered to lie between words and documents. For example, suppose the user enters the query "Swiss Army Knife green color". Because {"green color", "green"} is a synonym pair, "green" can replace "green color" in a synonym-expanded query, but the recalled results may then include a web page titled "System Swiss Army Knife: Perfect Uninstall V2007 Green Edition". The topic of "Swiss Army Knife green color" is "goods", whereas the topic of "System Swiss Army Knife: Perfect Uninstall V2007 Green Edition" is "software", and a search engine cannot by itself understand these implied topics. The PLSA model is a topic model that analyzes latent semantic topics from the distribution of co-occurring words in documents. It introduces a latent semantic layer between documents and words, and this layer consists of n latent semantic topics. Assuming that documents and words are independent of each other given the topic, the probability that a document and a word occur together is determined by the probabilistic relation between each of them and the topics. The PLSA model can therefore be used to calculate the relation between a document or a word and the latent semantic topics. On this basis, this embodiment uses the PLSA model to obtain the semantic topic distributions of the synonym context and of the synonym query result web pages, and calculates how well the two match in order to determine the probability of semantic drift in the synonym query results. This is described in detail below.
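For reference, the independence assumption mentioned above corresponds to the standard PLSA factorization of the joint probability of a document d and a word w over the n latent topics. The symbols d, w and z_k are notation assumed here for illustration; they do not appear in the original text.

$$P(d, w) = \sum_{k=1}^{n} P(z_k)\, P(d \mid z_k)\, P(w \mid z_k)$$

Under this formulation, a topic distribution vector such as the one used for a web page can be read as the probabilities P(z_k | d) for k = 1, ..., n.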
As shown in Fig. 6, the topic analysis module 18 obtains a web page from the web page library 122, removes noise such as frame advertisements from the page, and then extracts a keyword set that represents the page. The topic analysis module 18 then uses the PLSA model to calculate the web page's latent semantic topic vector S2 = {s21, s22, ..., s2n}, which represents the semantic topic distribution of the page; s2n is the probability score of the page on the n-th semantic topic. In this embodiment, the semantic topic distribution of web pages is obtained offline: the topic analysis module 18 analyzes all crawled web pages, obtains their semantic topic distributions and stores them in the web page semantic topic library 124. This process can of course also run online during a search: after the synonym query results are obtained, the topic analysis module 18 analyzes only the web pages in the query results and passes their semantic topic distributions to the drift determination module 115 for judgment. In this embodiment, the semantic topic distribution of the synonym context is obtained online. After the query analysis module 112 segments the original query into a keyword set, the topic analysis module 18 takes this keyword set and obtains from the synonym context library 1232 the set of entries contained in the corresponding synonym context. The keyword set and the entries of the synonym context are then combined and passed to the PLSA model, which calculates the synonym context's latent semantic topic vector S1 = {s11, s12, ..., s1n} representing the semantic topic distribution of the synonym context; s1n is the probability of the synonym context on the n-th semantic topic. After obtaining the vector S1, the topic analysis module 18 passes it to the drift determination module 115, which judges the similarity of S1 and S2. The judgment steps are described in detail later.
The detailed steps by which the search engine 300 of this embodiment performs a synonym-expanded query are described next with reference to Fig. 7. First, the query analysis module 112 receives the original query of the user's search (step 441) and analyzes it (step 442). The query analysis module 112 segments the original query; as in the first embodiment, the segmentation uses forward maximum matching against a dictionary built from the synonym contexts. Segmentation yields the original keyword set. On the one hand, the query analysis module 112 passes the original keyword set to the search module 111, which runs the original query (step 449) and obtains the original query results (step 450). On the other hand, the query analysis module 112 identifies the original word contained in the original query based on the thesaurus 123 and obtains the corresponding candidate synonym pair and the synonym context of that pair. With these data, the query analysis module 112 can directly replace the original word with the synonym to obtain the synonym query and pass it to the search module 111, which runs the synonym-expanded query (step 443). In a preferred embodiment, before the synonym replacement is carried out, the module first checks whether the synonym context of the original word is satisfied and performs the replacement only if it is, which further improves the accuracy of the synonym query results. Deciding on the replacement according to the matching degree of the synonym context is described in detail in the first embodiment and is not repeated here. In addition, the query analysis module 112 passes the original keyword set to the topic analysis module 18, which uses the PLSA model to calculate the semantic topic distribution of the synonym context (step 447) and passes the result to the drift determination module 115.
After the search module 111 runs the synonym query and obtains the synonym query results (step 444), the drift determination module 115 retrieves from the web page semantic topic library the semantic topic distribution of each result web page, that is, the web page's latent semantic topic vector S2 = {s21, s22, ..., s2n} (step 445). The drift determination module 115 has also received from the topic analysis module the semantic topic distribution of the synonym context, that is, the synonym context's latent semantic topic vector S1 = {s11, s12, ..., s1n}. The drift determination module 115 then judges how well the two semantic topic distributions match by calculating the similarity of the vectors S1 and S2 (step 446). It then filters the synonym query results (step 448), that is, it chooses how to demote the synonym query results according to the matching degree, and the results of the original query and the synonym query are merged accordingly to generate the search result list (step 451). The similarity of two vectors can be calculated in several ways, such as inner product similarity or cosine similarity. The following formula is an example that uses cosine similarity to calculate the similarity between the vectors S1 and S2.
$$sim(S_1, S_2) = \frac{\sum_{i=1}^{n} s_{1i}\, s_{2i}}{\sqrt{\sum_{i=1}^{n} s_{1i}^{2}}\ \sqrt{\sum_{i=1}^{n} s_{2i}^{2}}}$$
If the calculated similarity is high, the web page and the synonym context have similar probability distributions over the n semantic topics, so the two semantic topic distributions match well and the probability of semantic drift for this web page is small. Conversely, if the calculated similarity is low, the probability of semantic drift for this web page is larger, and the result needs to be demoted. Specifically, the similarity sim(S1, S2) is a floating-point number in [0, 1]. A threshold α can be set in advance. When sim(S1, S2) lies in [α, 1], the two semantic topic distributions match well; the synonym query results need not be demoted, and the results of the original query and the synonym query are simply merged according to the predetermined relevance weights of the web pages. When sim(S1, S2) lies in [0, α), the two semantic topic distributions match poorly and the probability of semantic drift in the synonym query results is larger, so the synonym query results need to be demoted. Demotion can be done by lowering the relevance weights of the synonym query result web pages, so that they rank lower in the merged search result list; by inserting the synonym query results after a specific page of the search result list, for example moving them to the second page; or by placing the synonym query results behind the original query results, so that they appear at the very end of the search result list.
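The cosine similarity check between the two topic vectors can be sketched as follows. This is a minimal sketch: the function names `cosine_similarity` and `is_drifted`, the plain-list representation of the topic vectors, and the example threshold are assumptions made for illustration.

```python
import math


def cosine_similarity(s1, s2):
    """Cosine similarity of two topic vectors of equal length n."""
    if len(s1) != len(s2):
        raise ValueError("topic vectors must have the same number of topics")
    dot = sum(a * b for a, b in zip(s1, s2))
    norm = math.sqrt(sum(a * a for a in s1)) * math.sqrt(sum(b * b for b in s2))
    # An all-zero vector carries no topic information; treat it as no match.
    return dot / norm if norm else 0.0


def is_drifted(context_topics, page_topics, alpha):
    """A page is treated as semantically drifted when sim(S1, S2) lies in [0, alpha)."""
    return cosine_similarity(context_topics, page_topics) < alpha
```

As a usage illustration with two hypothetical topics ("goods", "software"), a synonym context vector `[0.9, 0.1]` and a page vector `[0.1, 0.9]` give a similarity of about 0.22, so with an assumed α of 0.5 the page would be demoted.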
By comparing the semantic topic distribution of the synonym context with that of the synonym query result web pages, the search engine can judge whether the synonym query results satisfy the user's underlying need and control the ranking of the synonym query results within the whole search result list accordingly. This keeps semantically drifted results away from the top of the search results and gives the user a good experience. Besides the PLSA model introduced in the above embodiment, other topic models can also be used to analyze the latent semantic topics of the synonym context and the synonym query result web pages, such as the Latent Semantic Analysis (LSA) model or the Latent Dirichlet Allocation (LDA) model.
Fig. 8 to Fig. 10 show a fourth embodiment of the search engine of the present invention. This embodiment mainly describes how synonyms are presented in the search results. As shown in the block diagram of Fig. 8, the search engine 400 comprises a search component 11, a data repository 12, an index 13, a crawler 14 and a user interface 15. The data repository 12, index 13, crawler 14 and user interface 15 are essentially the same as in the embodiments above and are not described again here. In this embodiment, the search component 11 comprises a search module 111, a query analysis module 112, a result synthesis module 113, an analysis module 116 that analyzes the similarity grade between a synonym and the original word, and a labeling module 117 that determines how the synonym is presented.
The synonym-expanded query performed by the search engine of this embodiment is described in detail below with reference to Fig. 9. First, the query analysis module 112 receives the original query of the user's search (step 461) and analyzes it (step 462). The query analysis module 112 segments the original query to obtain the original keyword set. Based on the thesaurus 123, it identifies the original word contained in the original query and obtains the synonym pair consisting of this original word and its synonym. On the one hand, the query analysis module 112 replaces the original word with the synonym to obtain the synonym query, and the search module 111 then runs the original query and the synonym-expanded query (step 463). After obtaining the original query results and the synonym query results, the search module 111 passes them to the result synthesis module 113, which merges them and generates the search result list (step 464). The method for merging original and synonym query results is described in detail in the embodiments above and is not repeated here. On the other hand, the query analysis module 112 passes the synonym pair to the similarity grade analysis module 116, which judges the similarity grade between the synonym and the original word (step 465) and passes the result to the labeling module 117. The labeling module 117 then determines how the synonym is presented according to the similarity grade, and the labeled search result list is finally presented to the user through the user interface 15 (step 466).
The judgment of the similarity grade between a synonym and the original word, and the corresponding presentation, are further described below with reference to Fig. 10. The similarity grade analysis module 116 obtains the synonym pair from the query analysis module 112 (step 471) and first judges whether the synonym and the original word in the pair belong to the high similarity grade, that is, the first grade with the highest similarity (step 472). In this embodiment, a synonym and an original word belong to the high similarity grade in cases such as the abbreviation of a proper noun (for example "Peking University" and its short form "Beijing University", or "Sina website" and "sina"), numeral conversion (for example two written forms of "the 5th episode"), or place-name conversion (for example two forms of "Beijing"). If the pair belongs to the high similarity grade, the synonym is marked in a particular color (step 473); this is usually a conspicuous color, red in this embodiment. If it does not, the module next judges whether the synonym and the original word belong to the medium similarity grade, that is, the second grade with lower similarity (step 474). In this embodiment, this judgment involves the semantic similarity or the word-form similarity between the original word and the synonym.
The following is a specific example of a formula for calculating the semantic similarity:
$$SSim(orig, syn) = \frac{\mathrm{ClickQueryCount}(orig, syn)}{\mathrm{QueryCount}(orig)}$$
Here ClickQueryCount(orig, syn) is the number of historical queries that contain the original word orig and for which the title of the clicked web page does not contain orig but does contain the synonym syn; QueryCount(orig) is the number of historical queries that contain the original word orig. For example, if the user entered the historical query "Beijing University where" and then clicked a search result titled "Peking University where", that query is counted in both ClickQueryCount(orig, syn) and QueryCount(orig). If, for the historical query "Beijing University where", the user clicked only a search result titled "Beijing University where", that query is counted only in QueryCount(orig). The semantic similarity is clearly a floating-point number in [0, 1]. A threshold β can be set in advance. When the semantic similarity lies in [β, 1], the original word and the synonym belong to the medium similarity grade; when it lies in [0, β), the word-form similarity is judged next. If the synonym pair is determined to belong to the medium similarity grade, the synonym is marked in a particular font style (step 475), such as bold or italic; bold is used in this embodiment.
The following is a specific example of a formula for calculating the word-form similarity:
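The click-based semantic similarity can be sketched as follows. This is only an illustration under simplifying assumptions: the query log is modeled as a list of (query, clicked title) pairs with one clicked title per query, membership is tested by plain substring matching rather than word segmentation, and the function name `semantic_similarity` is hypothetical.

```python
def semantic_similarity(click_log, orig, syn):
    """SSim(orig, syn) = ClickQueryCount(orig, syn) / QueryCount(orig).

    click_log is an iterable of (query, clicked_title) pairs (assumed format).
    """
    query_count = 0   # historical queries containing orig
    click_count = 0   # of those, clicks on titles with syn but without orig
    for query, title in click_log:
        if orig not in query:
            continue
        query_count += 1
        if syn in title and orig not in title:
            click_count += 1
    return click_count / query_count if query_count else 0.0
```

Using the example from the text, a log holding the query "Beijing University where" once with the clicked title "Peking University where" and once with the clicked title "Beijing University where" yields 1 / 2 = 0.5 for the pair ("Beijing University", "Peking University").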
$$WSim(orig, syn) = \frac{\mathrm{CoocAlphaCount}(orig, syn)}{\mathrm{AllAlphaCount}(orig, syn)}$$
Here CoocAlphaCount(orig, syn) is the number of characters that the original word orig and the synonym syn have in common, and AllAlphaCount(orig, syn) is the total number of distinct characters contained in orig and syn. For example, for a Chinese synonym pair whose two words share two characters and together contain three distinct characters, CoocAlphaCount = 2 and AllAlphaCount = 3. For English, letters are counted instead: for the synonym pair {"man", "men"}, CoocAlphaCount("man", "men") = 2 and AllAlphaCount("man", "men") = 4. The word-form similarity is clearly also a floating-point number in [0, 1]. A threshold γ can be set in advance. When the word-form similarity lies in [γ, 1], the original word and the synonym belong to the medium similarity grade, and the labeling module 117 marks the synonym in bold. When the word-form similarity lies in [0, γ), the synonym and the original word belong to the low similarity grade (the third grade, lower than the second), and the synonym is not marked at all (step 476). A particular font style is less conspicuous than a particular color but still draws the user's attention, so it suits synonyms of the medium similarity grade, whose meaning or word form has changed but remains fairly close to the original word. Synonyms of the low similarity grade differ considerably from the original word in meaning or word form, and marking them would strike the user as abrupt, so the preferred approach is to leave them unmarked.
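The word-form similarity and the three-grade decision of Fig. 10 can be sketched as follows. This is a hedged sketch: counting distinct characters with sets is one reading of the formula that reproduces the "man"/"men" figures above, and the names `word_form_similarity`, `similarity_grade` and `MARKUP`, as well as the boolean flag standing in for the abbreviation, numeral and place-name checks, are assumptions.

```python
def word_form_similarity(orig, syn):
    """WSim = shared distinct characters / all distinct characters."""
    a, b = set(orig), set(syn)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_grade(is_high_pattern, ssim, wsim, beta, gamma):
    """Return "high", "medium" or "low" following steps 472 to 476.

    is_high_pattern stands in for the abbreviation / numeral / place-name
    checks, which are not modeled here.
    """
    if is_high_pattern:
        return "high"
    if ssim >= beta or wsim >= gamma:
        return "medium"
    return "low"


# Presentation chosen per grade in this embodiment: red, bold, or no mark.
MARKUP = {"high": "red", "medium": "bold", "low": None}
```

For the pair ("man", "men"), the shared letters are m and n and the distinct letters are m, a, n and e, so the function returns 2 / 4 = 0.5.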
By judging the similarity grade between a synonym and the original word, the search engine marks the synonyms in the search results in a suitable way. This helps the user locate the needed information quickly without the marking appearing abrupt, and so improves the user's experience.
Those skilled in the art will readily appreciate that the way the similarity grade is judged, the way synonyms are presented, and the mapping between similarity grades and presentation styles are not limited to what is described in the above embodiment. For example, the similarity grade can also be judged by edit distance, and synonyms can be marked by highlighting. In addition, more similarity grades can be defined, for example by splitting semantic similarity and word-form similarity into two different grades. The number of grades can of course also be reduced, so that every synonym is classified as either high similarity or low similarity. For instance, a pair can be treated as high similarity when the synonym and the original word are related by proper noun abbreviation, numeral conversion or place-name conversion, or when their semantic similarity, word-form similarity or edit distance meets a specified threshold; all other pairs are then low similarity.
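As an illustration of the edit distance alternative mentioned above, the sketch below computes the Levenshtein distance with a standard dynamic programming table. The text does not specify which edit distance is meant or how it is compared with a threshold, so both the choice of Levenshtein distance and the normalized similarity in `edit_similarity` are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    """Map the distance to [0, 1] so it can be compared with a threshold."""
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 1.0
```

Normalizing by the longer word makes a larger value mean "more similar", which lets the result be compared with a threshold in the same direction as the semantic and word-form similarities.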
It should be understood that although this specification is organized by embodiment, an embodiment does not necessarily contain only one independent technical solution. The specification is written this way only for clarity; those skilled in the art should treat it as a whole, and the technical solutions of the embodiments can be suitably combined to form other embodiments that those skilled in the art will understand.
The detailed descriptions listed above are only specific explanations of feasible embodiments of the present invention. They are not intended to limit the scope of protection of the present invention, and any equivalent embodiment or modification that does not depart from the spirit of the present invention shall fall within its scope of protection.

Claims (12)

1. A method for implementing a search engine, characterized in that the method comprises the following steps:
Receiving an original query of a user search;
Analyzing the original query to obtain an original word present in the original query and a synonym of that original word, and replacing the original word in the original query with the synonym to obtain a synonym query;
Searching according to the original query and the synonym query to obtain an original query result web page set and a synonym query result web page set;
Calculating the overlap ratio of the web pages in the original query result and the synonym query result;
Merging the result web page sets of the original query and the synonym query according to a predetermined merging strategy corresponding to the overlap ratio, and generating a search result list.
2. The method for implementing a search engine according to claim 1, characterized in that calculating the overlap ratio comprises calculating the number of web pages that appear in both the original query result and the synonym query result, |U1 ∩ U2|.
3. The method for implementing a search engine according to claim 2, characterized in that calculating the overlap ratio further comprises determining the smaller of the number of web pages in the original query result and the number of web pages in the synonym query result, min(|U1|, |U2|); the overlap ratio is I(U1, U2) = |U1 ∩ U2| / min(|U1|, |U2|).
4. The method for implementing a search engine according to claim 2, characterized in that calculating the overlap ratio further comprises calculating the number of web pages in the union of the original query result and the synonym query result, |U1 ∪ U2|; the overlap ratio is I(U1, U2) = |U1 ∩ U2| / |U1 ∪ U2|.
5. The method for implementing a search engine according to claim 1, characterized in that the merging strategy comprises: when the value of the overlap ratio, within a predetermined overlap ratio interval, is less than a predetermined threshold, the predetermined merging strategy is to apply demotion processing when merging the synonym query result, the demotion processing comprising:
Reducing the relevance weights of the web pages in the synonym query result; or
Inserting the synonym query result after a specific page of the search result list; or
Placing the synonym query result behind the original query result.
6. The method for implementing a search engine according to claim 1, characterized in that the merging strategy comprises: when the value of the overlap ratio, within a predetermined overlap ratio interval, is greater than a predetermined threshold, merging the results of the original query and the synonym query according to the relevance weight of each web page in the original query result and the synonym query result.
7. A search engine, characterized in that the search engine comprises a search component, the search component comprising:
A query analysis module, configured to receive an original query of a user search, to analyze the original query to obtain an original word present in the original query and a synonym of that original word, and to replace the original word in the original query with the synonym to obtain a synonym query;
A search module, configured to search according to the original query and the synonym query to obtain an original query result web page set and a synonym query result web page set;
An overlap calculation and result merging module, configured to calculate the overlap ratio of the web pages in the original query result and the synonym query result, to merge the result web page sets of the original query and the synonym query according to a predetermined merging strategy corresponding to the overlap ratio, and to generate a search result list.
8. The search engine according to claim 7, characterized in that the calculation of the overlap ratio comprises calculating the number of web pages that appear in both the original query result and the synonym query result, |U1 ∩ U2|.
9. The search engine according to claim 8, characterized in that the calculation of the overlap ratio further comprises determining the smaller of the number of web pages in the original query result and the number of web pages in the synonym query result, Min(|U1|, |U2|); the overlap ratio is I(U1, U2) = |U1 ∩ U2| / Min(|U1|, |U2|).
10. The search engine according to claim 8, characterized in that the calculation of the overlap ratio further comprises calculating the total number of web pages in the union of the original query result and the synonym query result, |U1 ∪ U2|; the overlap ratio is I(U1, U2) = |U1 ∩ U2| / |U1 ∪ U2|.
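The two overlap ratio formulas in claims 9 and 10 can be written directly with Python sets. The function names are hypothetical, and returning 0.0 when a denominator would be zero is an assumption of this sketch; the claims do not address empty result sets.

```python
from typing import Set


def overlap_min(u1: Set[str], u2: Set[str]) -> float:
    """I(U1, U2) = |U1 ∩ U2| / Min(|U1|, |U2|), as in claim 9."""
    smaller = min(len(u1), len(u2))
    # Assumption: treat an empty result set as zero overlap.
    return len(u1 & u2) / smaller if smaller else 0.0


def overlap_union(u1: Set[str], u2: Set[str]) -> float:
    """I(U1, U2) = |U1 ∩ U2| / |U1 ∪ U2|, as in claim 10."""
    union = len(u1 | u2)
    return len(u1 & u2) / union if union else 0.0
```

For U1 = {a, b, c, d} and U2 = {c, d, e}, the intersection has 2 pages, so the first formula gives 2/3 and the second gives 2/5. The min-normalized form is never smaller than the union-normalized form, so a threshold tuned for one does not transfer directly to the other.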
11. The search engine according to claim 7, characterized in that the consolidation strategy comprises: when the value of the overlap ratio, within a predetermined overlap ratio interval, is less than a predetermined threshold, the predetermined consolidation strategy is to apply suppression processing to the synonym query result during merging, the suppression processing comprising:
reducing the relevance weights of the web pages in the synonym query result; or
inserting the synonym query result after a specific page of the search result list; or
moving the synonym query result behind the original query result.
12. The search engine according to claim 7, characterized in that the consolidation strategy comprises: when the value of the overlap ratio, within a predetermined overlap ratio interval, is greater than a predetermined threshold, merging the results of the original query and the synonym query according to the relevance weight of each web page in the original query result and the synonym query result.
CN201110079699.1A 2011-03-31 2011-03-31 Search engine and implementation method thereof Active CN102722499B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110079699.1A CN102722499B (en) 2011-03-31 2011-03-31 Search engine and implementation method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110079699.1A CN102722499B (en) 2011-03-31 2011-03-31 Search engine and implementation method thereof

Publications (2)

Publication Number Publication Date
CN102722499A true CN102722499A (en) 2012-10-10
CN102722499B CN102722499B (en) 2015-07-01

Family

ID=46948266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110079699.1A Active CN102722499B (en) 2011-03-31 2011-03-31 Search engine and implementation method thereof

Country Status (1)

Country Link
CN (1) CN102722499B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1873642A (en) * 2006-04-29 2006-12-06 上海世纪互联信息系统有限公司 Searching engine with automating sorting function
CN101878476A (en) * 2007-06-22 2010-11-03 谷歌公司 Machine translation for query expansion
CN101241512A (en) * 2008-03-10 2008-08-13 北京搜狗科技发展有限公司 Search method for redefining enquiry word and device therefor
CN101645082A (en) * 2009-04-17 2010-02-10 华中科技大学 Similar web page duplicate-removing system based on parallel programming mode
CN101576916A (en) * 2009-06-18 2009-11-11 清华大学 Method and device for obtaining synonyms

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156391A (en) * 2014-07-09 2014-11-19 北京奇虎科技有限公司 Device and method for displaying menus in mobile search results
CN105989125A (en) * 2015-02-16 2016-10-05 苏宁云商集团股份有限公司 Searching method and system for carrying out label identification on resultless word
CN105989125B (en) * 2015-02-16 2019-08-16 苏宁易购集团股份有限公司 The searching method and system of tag recognition are carried out to no result word
CN105659235A (en) * 2016-01-08 2016-06-08 马岩 A term searching method for network information and a system thereof
WO2017117806A1 (en) * 2016-01-08 2017-07-13 马岩 Term search method and system for web information
WO2017166132A1 (en) * 2016-03-30 2017-10-05 马岩 Network information pushing method and system
CN106250516A (en) * 2016-08-03 2016-12-21 王晓光 Synonym application process in big data search and system
WO2018023481A1 (en) * 2016-08-03 2018-02-08 王晓光 Method and system for applying synonym in big data search
CN106294784B (en) * 2016-08-12 2019-12-17 合一智能科技(深圳)有限公司 resource searching method and device
CN106294784A (en) * 2016-08-12 2017-01-04 合智能科技(深圳)有限公司 Resource search method and device
CN107729347A (en) * 2017-08-23 2018-02-23 北京百度网讯科技有限公司 Acquisition methods, device, equipment and the computer-readable recording medium of synonymous label
US10769372B2 (en) 2017-08-23 2020-09-08 Beijing Baidu Netcom Science And Technology Co., Ltd. Synonymy tag obtaining method and apparatus, device and computer readable storage medium
CN107729347B (en) * 2017-08-23 2021-06-11 北京百度网讯科技有限公司 Method, device and equipment for acquiring synonym label and computer readable storage medium
CN110196941A (en) * 2018-07-24 2019-09-03 腾讯科技(深圳)有限公司 A kind of information recommended method, device, server and storage medium
CN111666417A (en) * 2020-04-13 2020-09-15 百度在线网络技术(北京)有限公司 Method and device for generating synonyms, electronic equipment and readable storage medium
CN111666417B (en) * 2020-04-13 2023-06-23 百度在线网络技术(北京)有限公司 Method, device, electronic equipment and readable storage medium for generating synonyms
CN116344012A (en) * 2023-05-29 2023-06-27 北京梆梆安全科技有限公司 Medical management system based on diagnosis and treatment log
CN116344012B (en) * 2023-05-29 2023-08-18 北京梆梆安全科技有限公司 Medical management system based on diagnosis and treatment log

Also Published As

Publication number Publication date
CN102722499B (en) 2015-07-01

Similar Documents

Publication Publication Date Title
CN102722498B (en) Search engine and implementation method thereof
CN102722501B (en) Search engine and realization method thereof
CN102722499B (en) Search engine and implementation method thereof
CN102737021B (en) Search engine and realization method thereof
CN100530180C (en) Method and system for suggesting search engine keywords
Chirita et al. Personalized query expansion for the web
CN101911042B (en) The relevance ranking of the browser history of user
CN102609433B (en) Method and system for recommending query based on user log
CN101452453B (en) A kind of method of input method Web side navigation and a kind of input method system
CN102073725B (en) Method for searching structured data and search engine system for implementing same
CN100483408C (en) Method and apparatus for establishing link structure between multiple documents
US20090265338A1 (en) Contextual ranking of keywords using click data
CN100433007C (en) Method for providing research result
US20130013616A1 (en) Systems and Methods for Natural Language Searching of Structured Data
WO2016109102A1 (en) Use of statistical flow data for machine translations between different languages
CN102725759A (en) Semantic table of contents for search results
CN103186574A (en) Method and device for generating searching result
CN103870461A (en) Topic recommendation method, device and server
CN102200975A (en) Vertical search engine system and method using semantic analysis
CN102314456A (en) Web page move search method and system
CN103942268A (en) Method and device for combining search and application and application interface
Wolfram Bibliometrics, information retrieval and natural language processing: Natural synergies to support digital library research
CN111475725A (en) Method, apparatus, device, and computer-readable storage medium for searching for content
CN102063454A (en) Method and equipment combining search and application
Gasparetti et al. Exploiting web browsing activities for user needs identification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant