CN102722499B - Search engine and implementation method thereof - Google Patents

Search engine and implementation method thereof

Info

Publication number
CN102722499B
Authority
CN
China
Prior art keywords
synonym
result
query
original
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110079699.1A
Other languages
Chinese (zh)
Other versions
CN102722499A (en
Inventor
呼大为
李彦宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201110079699.1A
Publication of CN102722499A
Application granted
Publication of CN102722499B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a search engine and an implementation method thereof. The method comprises the following steps: receiving an original query submitted by a user; analyzing the original query to obtain an original word present in the original query and a synonym of that original word, and replacing the original word in the original query with the synonym to obtain a synonym query; searching according to the original query and the synonym query to obtain an original query result web page set and a synonym query result web page set; calculating the overlap ratio of the web pages in the original query results and the synonym query results; and merging the result web page sets of the original query and the synonym query according to a predetermined merging strategy corresponding to the overlap ratio, so as to generate a search result list. The search engine uses the overlap ratio between the original query results and the synonym query results to estimate the probability that the synonym query results have drifted in meaning; when that probability is high, the synonym query results are demoted so that results that do not meet the user's search need do not appear at the top of the search result list, thereby ensuring a good user experience.

Description

Search engine and implementation method thereof
Technical field
The present invention relates to search engine technology, and in particular to a search engine capable of synonym-expanded queries and an implementation method thereof.
Background art
The rapid development of the Internet has given people a brand-new carrier for storing, processing, transmitting and using information, and online information has quickly become one of the main channels through which people obtain knowledge and information. Information resources on this scale encompass nearly all human knowledge, but they also pose problems of development and utilization for the users of those resources. Search engines arose to meet this need: they help network users find information on the Internet. Specifically, a search engine uses specific computer programs to gather information from the Internet according to a certain strategy, organizes and processes that information, provides a search service to users, and displays the information relevant to a user's search.
The online search service provided by a search engine is usually keyword-based: the user enters a query expression in the search engine's input box, and the search engine runs the query and returns the result web pages containing those keywords. Because users differ in background knowledge and usage habits, they may use different keywords to search for the same thing, and natural language itself contains many synonyms and near-synonyms, so searching only on the keywords the user supplies is insufficient. At present, many search engines offer query expansion, such as synonym-expanded queries. After receiving the original query expression entered by the user, the search engine segments it into terms and identifies whether the resulting term set contains a potential synonym pair. Specifically, the search engine matches the segmented terms against a predetermined synonym dictionary to determine whether any of them has a synonym; if so, it expands the search on the basis of the synonym, merges the expanded query results with the original query results, and returns the merged results to the user, thereby providing more relevant search results.
However, the same word can carry different meanings in different semantic environments, so its synonym is synonymous or nearly synonymous only in a particular semantic environment; in a different semantic environment the synonym no longer applies. In that case, the results obtained by the synonym-expanded query may not be what the user wants and can instead degrade the user's experience. For example, suppose the original query entered by the user is "how to make fish-flavoured shredded pork". The search engine segments the original query, matches it against the thesaurus to obtain the potential synonym pair {"how to make", "menu"} for "how to make", replaces "how to make" with "menu" to perform the synonym-expanded query, and obtains the corresponding query results. But if the original query is "how to make a bedside cabinet", the user evidently wants to learn how to build furniture; if the search engine still replaces "how to make" with "menu" for the synonym-expanded query, it returns results whose meaning has drifted and that the user does not want, and the user may come to doubt the accuracy of the search.
In view of this, it is necessary to improve existing search engines to solve the above problem.
Summary of the invention
An object of the present invention is to provide a search engine that adjusts the ranking of synonym query results within the overall search results according to the probability that the synonym-expanded query results have drifted in meaning, thereby keeping drifted results away from the top of the search results and ensuring a good user experience.
Another object of the present invention is to provide an implementation method for the above search engine.
To achieve one of the above objects, an implementation method of a search engine according to the present invention is characterized in that the method comprises the following steps:
receiving an original query submitted by a user;
analyzing the original query to obtain an original word present in the original query and a synonym of the original word, and replacing the original word in the original query with the synonym to obtain a synonym query;
searching according to the original query and the synonym query to obtain an original query result web page set and a synonym query result web page set;
calculating the overlap ratio of the web pages in the original query results and the synonym query results;
merging the result web page sets of the original query and the synonym query according to a predetermined merging strategy corresponding to the overlap ratio, and generating a search result list.
As a further improvement of the present invention, calculating the overlap ratio comprises calculating the number of web pages that appear in both the original query results and the synonym query results, |U1 ∩ U2|.
As a further improvement of the present invention, calculating the overlap ratio further comprises determining the smaller of the number of web pages in the original query results and the number of web pages in the synonym query results, min(|U1|, |U2|); the overlap ratio is I(U1, U2) = |U1 ∩ U2| / min(|U1|, |U2|).
As a further improvement of the present invention, calculating the overlap ratio further comprises calculating the total number of web pages in the union of the original query results and the synonym query results, |U1 ∪ U2|; the overlap ratio is I(U1, U2) = |U1 ∩ U2| / |U1 ∪ U2|.
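The two overlap ratio formulas above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `overlap_min` and `overlap_union` are hypothetical, result sets are assumed to be collections of URL strings, and returning 0.0 for an empty result set is an assumption the text does not address.

```python
def overlap_min(u1, u2):
    """I(U1, U2) = |U1 ∩ U2| / min(|U1|, |U2|)."""
    u1, u2 = set(u1), set(u2)
    smaller = min(len(u1), len(u2))
    if smaller == 0:
        return 0.0  # assumption: an empty result set has no overlap
    return len(u1 & u2) / smaller


def overlap_union(u1, u2):
    """I(U1, U2) = |U1 ∩ U2| / |U1 ∪ U2|."""
    u1, u2 = set(u1), set(u2)
    union = u1 | u2
    if not union:
        return 0.0  # assumption: two empty result sets have no overlap
    return len(u1 & u2) / len(union)


original_results = ["a", "b", "c", "d"]   # U1, hypothetical URLs
synonym_results = ["c", "d", "e"]         # U2, hypothetical URLs
ratio_min = overlap_min(original_results, synonym_results)
ratio_union = overlap_union(original_results, synonym_results)
```

The min-based variant is less sensitive to one result set being much larger than the other, while the union-based variant penalizes any page that appears in only one set.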
As a further improvement of the present invention, the merging strategy comprises: when the value of the overlap ratio is less than a predetermined threshold within a predetermined overlap ratio interval, the predetermined merging strategy is to demote the synonym query results when merging them, the demotion comprising:
reducing the relevance weights of the web pages in the synonym query results; or
inserting the synonym query results after a specific page of the search result list; or
placing the synonym query results after the original query results.
As a further improvement of the present invention, the merging strategy comprises: when the value of the overlap ratio is greater than the predetermined threshold within the predetermined overlap ratio interval, merging the results of the original query and the synonym query according to the relevance weight of each web page in the original query results and the synonym query results.
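A minimal sketch of such a threshold-driven merge follows, using the first demotion option (reducing relevance weights). The function name `merge_results`, the dictionary representation of results (URL mapped to relevance weight), and the default `threshold` and `penalty` values are all assumptions made for illustration; the source specifies no concrete values.

```python
def merge_results(original, synonym, overlap, threshold=0.3, penalty=0.5):
    """Merge two {url: relevance_weight} dicts into one ranked URL list.

    If the overlap ratio is below the threshold, the synonym results are
    demoted by scaling their weights with `penalty` (hypothetical value).
    """
    scale = penalty if overlap < threshold else 1.0
    merged = dict(original)
    for url, weight in synonym.items():
        # A page found by both queries keeps its higher weight.
        merged[url] = max(merged.get(url, 0.0), weight * scale)
    return sorted(merged, key=lambda url: merged[url], reverse=True)


original = {"a": 0.9, "b": 0.6}
synonym = {"c": 0.8, "b": 0.7}
ranked_high_overlap = merge_results(original, synonym, overlap=0.5)
ranked_low_overlap = merge_results(original, synonym, overlap=0.1)
```

With a high overlap ratio the synonym-only page "c" ranks on its own weight; with a low overlap ratio it is pushed behind the pages returned by the original query.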
To achieve the other object above, a search engine according to the present invention comprises a search component, the search component comprising:
a query analysis module for receiving an original query submitted by a user, analyzing the original query to obtain an original word present in the original query and a synonym of the original word, and replacing the original word in the original query with the synonym to obtain a synonym query;
a search module for searching according to the original query and the synonym query to obtain an original query result web page set and a synonym query result web page set;
an overlap ratio calculation and result merging module for calculating the overlap ratio of the web pages in the original query results and the synonym query results, merging the result web page sets of the original query and the synonym query according to a predetermined merging strategy corresponding to the overlap ratio, and generating a search result list.
As a further improvement of the present invention, calculating the overlap ratio comprises calculating the number of web pages that appear in both the original query results and the synonym query results, |U1 ∩ U2|.
As a further improvement of the present invention, calculating the overlap ratio further comprises determining the smaller of the number of web pages in the original query results and the number of web pages in the synonym query results, min(|U1|, |U2|); the overlap ratio is I(U1, U2) = |U1 ∩ U2| / min(|U1|, |U2|).
As a further improvement of the present invention, calculating the overlap ratio further comprises calculating the total number of web pages in the union of the original query results and the synonym query results, |U1 ∪ U2|; the overlap ratio is I(U1, U2) = |U1 ∩ U2| / |U1 ∪ U2|.
As a further improvement of the present invention, the merging strategy comprises: when the value of the overlap ratio is less than a predetermined threshold within a predetermined overlap ratio interval, the predetermined merging strategy is to demote the synonym query results when merging them, the demotion comprising:
reducing the relevance weights of the web pages in the synonym query results; or
inserting the synonym query results after a specific page of the search result list; or
placing the synonym query results after the original query results.
As a further improvement of the present invention, the merging strategy comprises: when the value of the overlap ratio is greater than the predetermined threshold within the predetermined overlap ratio interval, merging the results of the original query and the synonym query according to the relevance weight of each web page in the original query results and the synonym query results.
Compared with the prior art, the beneficial effect of the present invention is that the search engine uses the overlap ratio between the original query results and the synonym query results to determine the probability that the synonym query results have drifted in meaning and, when that probability is high, demotes the synonym query results so that results that do not meet the user's search need do not appear at the top of the search result list, thereby ensuring a good user experience.
Brief description of the drawings
Fig. 1 is a functional block diagram of a first embodiment of the search engine of the present invention;
Fig. 2 is a flowchart of the search engine shown in Fig. 1 mining synonym contexts;
Fig. 3 is a flowchart of the search engine shown in Fig. 1 performing a synonym-expanded query;
Fig. 4 is a functional block diagram of a second embodiment of the search engine of the present invention;
Fig. 5 is a flowchart of the search engine shown in Fig. 4 performing a synonym-expanded query;
Fig. 6 is a functional block diagram of a third embodiment of the search engine of the present invention;
Fig. 7 is a flowchart of the search engine shown in Fig. 6 performing a synonym-expanded query;
Fig. 8 is a functional block diagram of a fourth embodiment of the search engine of the present invention;
Fig. 9 is a flowchart of the search engine shown in Fig. 8 performing a synonym-expanded query;
Fig. 10 is a flowchart of one embodiment in which the search engine shown in Fig. 8 determines the similarity grade of synonyms and labels the synonyms accordingly.
Detailed description of the embodiments
The present invention is described below with reference to the embodiments shown in the drawings. These embodiments do not limit the present invention; structural, methodological or functional changes made by those of ordinary skill in the art on the basis of these embodiments all fall within the scope of protection of the present invention.
Fig. 1 shows a functional block diagram of the first embodiment of the search engine 100 of the present invention. In this embodiment, the search engine 100 collects web pages from the Internet according to a certain strategy and, after organizing and processing them, responds to requests from the browser 21 of the client 20 and provides a search query service. The search engine 100 may comprise one or more network server entities that store and manage data and respond to search requests. The client 20 may comprise one or more user terminal devices, such as a personal computer, notebook computer, wireless telephone, personal digital assistant (PDA), or other computing and communication devices.
Architecturally, these servers and terminal devices all comprise certain basic components, such as a bus, a processing unit, storage, one or more input/output devices, and a communication interface. The bus may comprise one or more conductors and carries communication between the components of the server or terminal device. The processing unit comprises any type of processor or microprocessor that executes instructions and handles processes or threads. The storage may comprise dynamic memory such as random access memory (RAM) for dynamic information, static memory such as read-only memory (ROM) for static information, and mass storage comprising a magnetic or optical recording medium and its corresponding drive. Input devices, such as a keyboard, mouse, stylus, voice recognition device or biometric device, let the user enter information into the server or terminal device. Output devices comprise displays, printers, loudspeakers and the like for outputting information. The communication interface lets the server or terminal device communicate with other systems or devices. The communication interfaces are connected to a network by wired, wireless or optical connections, so that the search engine 100 and the client 20 can communicate with each other over the network. The network may comprise a local area network (LAN), a wide area network (WAN), a telephone network such as the public switched telephone network (PSTN), an enterprise intranet, the Internet, or a combination of these networks. The servers and terminal devices all include operating system software that manages system resources and controls the running of other programs, as well as application software or program instructions that implement the functions of particular functional modules.
As shown in Fig. 1, the search engine 100 can perform synonym-expanded queries and can be divided overall into an offline part and an online part. In the offline part, the search engine 100 comprises a data repository 12 that stores web page data and synonym pair information, an indexer 13, a web crawler 14, a user query log database 16 that records user query information, and a log analyzer 17 that analyzes the user query log.
The web crawler 14 is a program that fetches web pages one by one by following the hyperlinks between them according to a certain strategy. In a specific embodiment, the web crawler 14 selects a URL to crawl from an initial URL (Uniform Resource Locator) library according to a certain scheduling strategy, resolves the network server address indicated in the URL, establishes a connection, sends a request and receives data, stores the retrieved web page data in the web page library 122 of the data repository 12 to build a local document collection, and then extracts links from it for the next crawl, repeating until all URLs have been crawled. The scheduling strategy by which the web crawler 14 selects URLs may include a breadth-first strategy, a depth-first strategy, a backlink count strategy, and so on; crawling may be cumulative or incremental. The indexer 13 analyzes the local document collection and builds the index. For example, it extracts terms from the full text of each document by word segmentation, filters out high-frequency or low-frequency words to obtain the set of index terms, and finally converts the mapping from web pages to index terms into a mapping from index terms to web pages, forming an inverted file that comprises an index dictionary and inverted lists and is stored in the index database 121 of the data repository 12. Methods for segmenting web documents include dictionary-based segmentation, understanding-based segmentation and statistics-based segmentation. The more common dictionary-based methods in turn include forward maximum matching, reverse maximum matching and minimum segmentation.
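The conversion from a page-to-terms mapping into a term-to-pages mapping can be sketched as follows. This is an illustration under stated assumptions, not the indexer's actual implementation: the function name `build_inverted_index` is hypothetical, documents are assumed to be already segmented into term lists, and filtering is reduced to a fixed stop-word set rather than a frequency analysis.

```python
def build_inverted_index(documents, stop_words=frozenset()):
    """Turn {url: [terms]} into {term: sorted list of urls}."""
    index = {}
    for url, terms in documents.items():
        # Each term is indexed once per page; filtered words are skipped.
        for term in set(terms) - set(stop_words):
            index.setdefault(term, []).append(url)
    for postings in index.values():
        postings.sort()
    return index


documents = {
    "u1": ["fish", "pork", "the"],   # hypothetical segmented pages
    "u2": ["pork", "menu"],
}
index = build_inverted_index(documents, stop_words={"the"})
```

Looking up a query term in `index` then yields the inverted list of pages that contain it.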
In the present invention, synonyms are terms that differ in form but express the same or a similar meaning; that is, if several terms express the same or a similar meaning, those terms are synonyms of one another. In this embodiment, the thesaurus 123 comprises a synonym correspondence table 1231 and a synonym context library 1232. The synonym correspondence table 1231 specifies in advance the correspondence between words and their synonyms, for example as a mapping table, obtained beforehand by statistics, between original words and their synonyms. The table can also be updated continuously by analyzing users' historical query click data. For example, if the titles of clicked result pages contain a synonym of some original word but not the original word itself, and this happens with high frequency, then the original word and the synonym are identified as a synonym pair and added to the synonym correspondence table 1231.
Fig. 2 shows the workflow of one embodiment in which the search engine 100 mines the synonym contexts of synonym pairs. In the present invention, a synonym context is the semantic environment in which the original word of a synonym pair occurs; it indicates the semantic environment in which the synonym pair applies, that is, the environment in which the synonym is suitable for replacing the original word in a synonym-expanded query. In this embodiment, synonym contexts are obtained by analyzing the user query log. After each search ends, the user query log database 16 records the user's query and click data, such as the query expression, the search time, the result list returned and the result pages clicked. Referring to Fig. 2 in conjunction with Fig. 1, the log analyzer 17 analyzes the historical user queries and click data contained in the user query log database 16 (step 411), including the historical queries and the result pages that were returned in response to a particular query and clicked. Next, the log analyzer 17 identifies whether these data contain a synonym context for some synonym pair and, if so, records it and stores it in the synonym context library 1232.
Specifically, the log analyzer 17 first determines, based on the synonym correspondence table 1231, whether a given historical query contains an original word and, if so, obtains the synonym pair comprising that original word and its corresponding synonym. For example, if the historical query is "how to make fish-flavoured shredded pork", the log analyzer 17 determines from the synonym correspondence table 1231 that the query contains the original word "how to make" (the query is segmented into the two terms "fish-flavoured shredded pork" and "how to make", and these two terms are matched against the original words in the synonym correspondence table, which finds the original word "how to make") and obtains the corresponding synonym pair {"how to make", "menu"}. The log analyzer 17 then determines whether, for this query, the title of a web page the user clicked contains the synonym but not the original word; if so, it records a synonym context for the synonym pair. For example, if for the query "how to make fish-flavoured shredded pork" the user clicked a web page titled "fish-flavoured shredded pork menu", the log analyzer 17 records a synonym context. A synonym context comprises at least the historical query itself, such as "how to make fish-flavoured shredded pork"; it may also comprise a word adjacent to the original word in the historical query, such as "fish-flavoured shredded pork"; or both may be recorded as synonym contexts of the synonym pair {"how to make", "menu"}. The adjacent word may precede or follow the original word; it may also be an empty term, meaning that the original query contains only the original word and has no adjacent word.
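The mining rule described above (the clicked title contains the synonym but not the original word) can be sketched as follows. The function name `mine_synonym_contexts`, the log format (a list of pairs of segmented query terms and clicked title), and the use of simple substring tests are all assumptions for illustration; the log analyzer's real data structures are not given in the source.

```python
from collections import Counter


def mine_synonym_contexts(click_log, synonym_table):
    """Count synonym contexts as ((original, synonym), context) keys.

    click_log: iterable of (query_terms, clicked_title) pairs.
    synonym_table: {original_word: synonym} (hypothetical layout).
    """
    contexts = Counter()
    for query_terms, clicked_title in click_log:
        for i, term in enumerate(query_terms):
            synonym = synonym_table.get(term)
            if synonym is None:
                continue
            # Record only when the title has the synonym but not the original word.
            if synonym in clicked_title and term not in clicked_title:
                pair = (term, synonym)
                contexts[(pair, " ".join(query_terms))] += 1   # whole query
                if i > 0:
                    contexts[(pair, query_terms[i - 1])] += 1  # preceding word
                if i + 1 < len(query_terms):
                    contexts[(pair, query_terms[i + 1])] += 1  # following word
    return contexts


log = [
    (["how to make", "fish-flavoured shredded pork"], "fish-flavoured shredded pork menu"),
    (["how to make", "bedside cabinet"], "build a bedside cabinet"),
]
found = mine_synonym_contexts(log, {"how to make": "menu"})
```

In this toy log only the first entry produces contexts, because the second clicked title does not contain the synonym.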
In the above embodiment, synonym contexts are obtained from users' historical behaviour, but in other embodiments they can also be determined from the anchor text of web pages. Anchor text is the text contained in a hyperlink to a web page. For example, if the places that link to the web page www.sina.com.cn use the hyperlink texts "Sina website homepage", "Sina homepage" and "sina homepage", these text segments can be recorded as synonym contexts of the synonym pair {"Sina website", "Sina"}. Synonym contexts can also be determined from parallel segments in a web page title. For example, the title of the page at price.mycar168.com/search.asp?factoryid=135 is "quotation of Huachen BMW, automobile big world, Huachen BMW price Shenzhen net". Using the separators, this title can be split into several parallel term fragments, "quotation of Huachen BMW", "Huachen BMW price" and "automobile big world, Shenzhen net"; the first two fragments contain "price" and "quotation" from the synonym pair {"price", "quotation"}, so these two fragments can also serve as synonym contexts of that pair.
Referring to Fig. 2, users' click behaviour during synonym context mining is not always entirely reasonable: a user browsing search results may inadvertently click on irrelevant results, and a synonym context recorded in that situation will be inaccurate. To eliminate the negative effect of such cases, the log analyzer 17 also counts how often each synonym context is recorded, and only a synonym context whose frequency is greater than or equal to a predetermined frequency threshold is retained and confirmed as a synonym context of the corresponding synonym pair; that is, low-frequency synonym contexts are filtered out (step 413).
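The low-frequency filter of step 413 amounts to a count followed by a threshold test. The sketch below assumes that recorded contexts arrive as a flat list with repeats; the function name `filter_low_frequency` and the threshold value used in the example are hypothetical.

```python
from collections import Counter


def filter_low_frequency(recorded_contexts, min_frequency):
    """Keep only contexts recorded at least `min_frequency` times."""
    counts = Counter(recorded_contexts)
    return {context for context, count in counts.items() if count >= min_frequency}


recorded = [
    "fish-flavoured shredded pork",
    "fish-flavoured shredded pork",
    "fish-flavoured shredded pork",
    "bedside cabinet",   # a single stray click
]
kept = filter_low_frequency(recorded, min_frequency=2)
```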
As shown in Fig. 1, the online part of the search engine 100 mainly comprises a search component 11 and a user interface 15. The user interface 15 is presented by the browser 21 of the client 20; it lets the user enter a query and displays the search result list in a predetermined presentation format. After a search ends, it also records the user's query information and stores it in the user query log database 16. The search component 11 responds to search requests from the client 20 and returns the search results to the client 20. In this embodiment, the search component 11 comprises a search module 111, a query analysis module 112 and a result synthesis module 113. For an ordinary original query (without query expansion), the query analysis module 112 segments the received original query to obtain a set of query terms and generates a query term list. After receiving the query term list, the search module 111 matches it against the index dictionary in the index database 121, finds the corresponding index terms and the inverted list of each index term, and thereby obtains the set of web documents relevant to the query terms. The result synthesis module 113 orders the retrieved web documents by the predetermined relevance weight between each document and the query terms, and then returns the result list to the client through the user interface 15.
The detailed steps by which the search engine 100 performs a synonym-expanded query online according to synonym contexts are described below with reference to the workflow shown in Fig. 3. The query analysis module 112 receives the original query of the current user's search through the user interface 15 (step 421) and then analyzes the query (step 422), which includes segmenting the original query. Note that the segmentation method in this embodiment is dictionary-based forward maximum matching, and the dictionary is built from the term fragments contained in the synonym contexts. As described above, a historical query can be recorded as a synonym context, and the historical query as a fragment is longer than the terms obtained by splitting it. Forward maximum matching therefore ensures that, whenever the current original query contains a historical query as a fragment, that fragment is cut out first, which improves the accuracy of the subsequent calculation. For example, suppose that in the synonym context mining phase the historical query is "today Nokia how much"; when the synonym contexts of the synonym pair {"how much", "price"} are recorded, both the historical query "today Nokia how much" and the adjacent word "Nokia" can be recorded as synonym contexts. Now suppose the current original query is "who know today Nokia how much". Under forward maximum matching, the longest fragment in the synonym context dictionary, "today Nokia how much", has length 8, so the query analysis module 112 scans the current original query from left to right and checks whether a phrase of length 8 appears in the synonym context dictionary. When "today Nokia how much" is found to match, it is cut out first, so "Nokia" is not cut out as a separate keyword. In step 422, the query analysis module 112 also matches the term set obtained by segmenting the original query against the thesaurus 123 to obtain a potential synonym pair and the synonym contexts of that pair; the potential synonym pair contains an original word present in the original query and the synonym corresponding to that original word.
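Forward maximum matching can be sketched as below. To keep the English example readable, this sketch matches over whitespace-separated word tokens rather than over characters, which is an assumption; the function name `forward_max_match` and the representation of dictionary entries as token tuples are likewise hypothetical.

```python
def forward_max_match(tokens, dictionary):
    """Greedy left-to-right segmentation preferring the longest dictionary entry.

    tokens: list of word tokens; dictionary: set of token tuples.
    Tokens that start no dictionary entry are emitted one at a time.
    """
    max_len = max((len(entry) for entry in dictionary), default=1)
    segments = []
    i = 0
    while i < len(tokens):
        # Try the longest window first and shrink until something matches.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + size])
            if size == 1 or candidate in dictionary:
                segments.append(" ".join(candidate))
                i += size
                break
    return segments


context_dictionary = {
    ("today", "Nokia", "how", "much"),
    ("Nokia",),
    ("how", "much"),
}
segments = forward_max_match("who know today Nokia how much".split(), context_dictionary)
```

Because the four-token entry is tried before the shorter ones, "Nokia" and "how much" are not split out separately, matching the behaviour described in the example above.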
Next, the query analysis module 112 determines whether the synonym context matches the original query (step 423). In this embodiment, the query analysis module 112 calculates the matching degree between the synonym context and the original query; when the value of the matching degree lies within a predetermined matching degree interval, the synonym context is deemed to match the original query, meaning that the semantic environment of the current original query is suitable for replacing the original word with the synonym to perform the expanded query. The matching degree can be determined from the length of the original query after the original word is removed and the length of the synonym contexts. In this embodiment, when the length of the original query is greater than the length of the original word (i.e. q ≠ orig), the matching degree M is calculated as:
$$M(\mathrm{orig}, \mathrm{syn}) = \frac{\sum_{i=1}^{n} \mathrm{TermCount}(p_i)}{\mathrm{TermCount}(q) - \mathrm{TermCount}(\mathrm{orig})}, \quad q \neq \mathrm{orig}$$
Here TermCount(q) is the length of the original query, TermCount(orig) is the length of the original word in the original query, and TermCount(p_i) is the length of the i-th synonym context. Because the original query may in this case contain words that belong to no synonym context, M takes a value in [0, 1]. A matching degree threshold θ is preset. When the value of M lies in [θ, 1], the synonym context matches the original query; the original word in the original query is then replaced with the synonym to obtain the synonym query, the search module 111 searches according to the original query and the synonym query to obtain the original query result web page set and the synonym query result web page set (step 424), and the result synthesis module 113 merges the results of the original query and the synonym query according to a predetermined merging strategy (step 425). The result merging strategy is described in detail below. When the value of M lies in [0, θ), the synonym context does not match the original query, meaning that the synonym is not suitable for replacing the original word in this semantic environment; the search module 111 then searches only according to the original query to obtain the original query result web page set (step 426), and the result synthesis module 113 produces the search result list according to the predetermined relevance weight between each web page and the original query (step 425). When the original query consists only of the original word (i.e. q = orig), the matching degree M = 1; the original query is then replaced directly with the synonym, and steps 424 and 425 are performed.
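The matching degree calculation and the threshold test can be sketched as follows. Several points here are assumptions rather than statements from the source: `term_count` measures length in whitespace-separated tokens, the synonym contexts passed in are those found in the query and are assumed not to overlap, the 0.0 fallback for a non-positive denominator is a defensive choice, and the threshold value 0.5 in the example is arbitrary.

```python
def term_count(text):
    """Length of a text, assumed here to be its number of word tokens."""
    return len(text.split())


def matching_degree(query, original_word, contexts):
    """M = sum(TermCount(p_i)) / (TermCount(q) - TermCount(orig)); M = 1 if q == orig."""
    if query == original_word:
        return 1.0
    remaining = term_count(query) - term_count(original_word)
    if remaining <= 0:
        return 0.0  # defensive assumption, not covered by the source
    return sum(term_count(p) for p in contexts) / remaining


def context_matches(m, theta):
    """The synonym context matches when M lies in [theta, 1]."""
    return theta <= m <= 1.0


m = matching_degree("today Nokia how much", "how much", ["Nokia"])
use_synonym = context_matches(m, theta=0.5)
```

In this example one of the two non-original tokens is covered by a synonym context, so M is 0.5 and the query just reaches the hypothetical threshold.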
By analyzing the semantic environment of the current user's query, the search engine determines whether synonym conversion is appropriate before it performs a synonym-expanded query. This safeguards the accuracy of synonym expansion, keeps the expanded query as close as possible to the user's need, and thus ensures a good user experience.
Fig. 4 and Fig. 5 disclose the second embodiment of the search engine of the present invention. Compared with the first embodiment, the search engine 200 of this embodiment mainly judges the probability of semantic drift in the synonym query result (that is, the probability that the synonym query result departs from the meaning the user intended) and adjusts the position of the synonym query result in the search result list finally presented to the user accordingly. As shown in Fig. 4, the search engine 200 comprises a search component 11, a data repository 12, an index 13, a grabber 14 and a user interface 15. The data repository 12, index 13, grabber 14 and user interface 15 are substantially the same as in the embodiment above and are not described again here. In this embodiment, the search component 11 comprises a search module 111, a query analysis module 112, and an overlap calculation and result merging module 114.
The synonym-expanded query performed by the search engine of this embodiment is described below with reference to Fig. 5. First, the query analysis module 112 receives the user's original query (step 431). Next, it analyzes the query (step 432): it performs word segmentation on the original query to obtain a set of query words, identifies the original word in the original query based on the thesaurus 123, obtains the synonym pair comprising the original word and its synonym, and directly replaces the original word with the synonym to obtain the synonym query. The search module 111 searches with the original query and the synonym query to obtain the web page set of the original query result and the web page set of the synonym query result (step 433). Next, the overlap calculation and result merging module 114 calculates the overlap ratio of the web pages in the original query result and the synonym query result (step 434). The overlap ratio mainly reflects the number of web pages common to the original query result and the synonym query result. If enough pages are shared, the synonym query result is close to the original query result and the probability of semantic drift in the synonym query result is small. Otherwise, the probability of semantic drift is large, and the synonym query result needs to be suppressed so that results that do not meet the user's search need do not appear at the top of the result list.
The overlap ratio can be calculated in several ways. For example, only the number of web pages common to the original query result and the synonym query result, |U1 ∩ U2|, may be calculated, that is, the number of identical URLs; or the number of common web pages among the top 100 results of each of the two result sets may be calculated and then compared with a predetermined threshold. Preferably, the calculation of the overlap ratio further comprises determining the smaller of the number of web pages in the original query result and the number of web pages in the synonym query result, Min(|U1|, |U2|); the overlap ratio is then I(U1, U2) = |U1 ∩ U2| / Min(|U1|, |U2|). Alternatively, in other embodiments, the calculation further comprises calculating the total number of web pages in the original query result and the synonym query result, |U1 ∪ U2|; the overlap ratio is then I(U1, U2) = |U1 ∩ U2| / |U1 ∪ U2|.

After the overlap ratio has been calculated, whether its value falls within a predetermined overlap ratio interval determines whether the synonym query result needs to be suppressed; the position of the synonym query result in the search result list is then determined and the merged result is output (step 435). Taking I(U1, U2) = |U1 ∩ U2| / Min(|U1|, |U2|) as an example, the overlap ratio I is a floating-point number in [0, 1]. An overlap ratio threshold σ is preset. When I lies in [σ, 1], the overlap between the original query result and the synonym query result is high; in this case the synonym query result does not need to be suppressed, and the results of the original query and the synonym query are simply merged according to the predetermined relevance weights of the web pages. When I lies in [0, σ), the overlap between the original query result and the synonym query result is low and the probability of semantic drift in the synonym query result is large, so the synonym query result needs to be suppressed. Suppression may take the form of down-weighting the relevance weights of the web pages in the synonym query result, so that the synonym query result occupies a lower position in the merged search result list; or of inserting the synonym query result after a specific page of the search result list, for example moving it to the second page; or of placing the synonym query result after the original query result, so that it appears at the very end of the search result list.
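The two overlap formulas I(U1, U2) and the down-weighting form of suppression can be sketched as follows. This is a minimal sketch under stated assumptions: results are modelled as lists of (URL, relevance weight) pairs, and the names `merge_results`, `sigma` and `penalty`, as well as the fixed multiplicative penalty, are hypothetical choices for illustration rather than details given in the patent.

```python
def overlap_min(u1: set[str], u2: set[str]) -> float:
    """I(U1, U2) = |U1 ∩ U2| / Min(|U1|, |U2|)."""
    smaller = min(len(u1), len(u2))
    return len(u1 & u2) / smaller if smaller else 0.0


def overlap_union(u1: set[str], u2: set[str]) -> float:
    """I(U1, U2) = |U1 ∩ U2| / |U1 ∪ U2|."""
    union = len(u1 | u2)
    return len(u1 & u2) / union if union else 0.0


def merge_results(orig, syn, sigma: float, penalty: float = 0.5) -> list[str]:
    """Merge (url, weight) lists; down-weight synonym results when overlap < sigma."""
    u1 = {url for url, _ in orig}
    u2 = {url for url, _ in syn}
    # No suppression when the overlap ratio lies in [sigma, 1].
    factor = 1.0 if overlap_min(u1, u2) >= sigma else penalty
    scores = dict(orig)
    for url, weight in syn:
        # A page found by both queries keeps the better of its two scores.
        scores[url] = max(scores.get(url, 0.0), weight * factor)
    return sorted(scores, key=scores.get, reverse=True)
```

With `orig = [("a", 0.9), ("b", 0.8)]` and `syn = [("b", 0.7), ("c", 0.85)]` the overlap ratio is 0.5. A threshold of 0.4 leaves the synonym-only page "c" in second place, while a threshold of 0.6 triggers suppression and moves "c" to the end of the list.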
By judging the overlap ratio between the original query result and the synonym query result, the search engine determines the probability of semantic drift in the synonym query result and, when that probability is large, suppresses the synonym query result so that results that do not meet the user's search need do not appear at the top of the search result list, thereby ensuring a good user experience. In this embodiment, it is not necessary to decide by means of the synonym context, as in the first embodiment, whether synonym replacement should be carried out before the original word is replaced by the synonym and the synonym-expanded query is performed. However, a person of ordinary skill in the art will readily appreciate that this embodiment can be combined with the first embodiment: the synonym context judgment is performed first, before synonym replacement, and once the synonym query result has been obtained the search results are merged according to the overlap ratio between the original and synonym query results. This clearly yields more accurate search results and further improves the user experience.
Fig. 6 and Fig. 7 disclose the third embodiment of the search engine of the present invention. Starting from the synonym query result, this embodiment further judges the probability of semantic drift in the synonym query result by analyzing the semantic topic distribution of the synonym query result web pages, and then adjusts the position of the synonym query result in the search result list finally presented to the user. As shown in Fig. 6, and similarly to the first embodiment, the search engine 300 comprises a search component 11, a data repository 12, an index 13, a grabber 14, a user interface 15, a user query log database 16 and a log analyzer 17. The index 13, grabber 14, user interface 15, user query log database 16 and log analyzer 17 are the same as in the first embodiment and are not described again here. In this embodiment, the search component 11 comprises a search module 111, a query analysis module 112, a result synthesis module 113 and a drift determination module 115. The data repository 12 comprises an index database 121, a web page library 122, a thesaurus 123 and a web page semantic topic library 124. The index database 121, web page library 122 and thesaurus 123 are the same as in the first embodiment and are not described again here. The search engine 300 further comprises a topic analysis module 18, which in this embodiment comprises a Probabilistic Latent Semantic Analysis (hereinafter PLSA) model.
The PLSA model is a natural language processing tool mainly used to analyze the latent semantics of documents. A document can be represented as a set of words, but because synonyms exist, words are not the most basic constituent of a document; there can therefore be said to be a latent semantic level between words and documents, namely the topic. For example, suppose the user enters the query "the green color of Swiss Army Knife". Because {"green color", "green"} is a synonym pair, a synonym-expanded query can be performed by replacing "green color" with "green", but the recalled results may then include a web page titled "system Swiss Army Knife - Perfect Uninstall V2007 green edition". The query "the green color of Swiss Army Knife" corresponds to the topic "articles", whereas "system Swiss Army Knife - Perfect Uninstall V2007 green edition" corresponds to the topic "software"; a search engine cannot by itself understand these implicit topics. The PLSA model is a topic model that analyzes latent semantic topics by computing the distribution of co-occurring words in documents. It introduces a latent semantic layer between documents and words, and this layer consists of n latent semantic topics. Assuming that documents and words are independent of each other, the probability that a document and a word occur together is determined by their probabilistic relations to the topics. The relation between a document or a word and the latent semantic topics can therefore be calculated with the PLSA model. On this basis, this embodiment uses the PLSA model to obtain the semantic topic distributions of the synonym context and of the synonym query result web pages, and calculates the matching degree between the two to determine the probability of semantic drift in the synonym query result. This is described in detail below.
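As background, the standard textbook formulation of PLSA can be written as follows. It is not a formula given in this patent, and the symbols d (a document), w (a word) and z_k (the k-th of the n latent topics) are introduced here for illustration only. In this formulation the independence of document and word holds conditionally on the topic:

$$P(d, w) = \sum_{k=1}^{n} P(z_k)\, P(d \mid z_k)\, P(w \mid z_k)$$

The relation between a document and the latent topics mentioned above then plausibly corresponds to the posterior distribution over topics for that document:

$$P(z_k \mid d) = \frac{P(z_k)\, P(d \mid z_k)}{\sum_{j=1}^{n} P(z_j)\, P(d \mid z_j)}$$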
As shown in Fig. 6, the topic analysis module 18 obtains a web page from the web page library 122, removes noise such as frame advertisements from the page, and then extracts a keyword set that can represent the page. The topic analysis module 18 then uses the PLSA model to calculate the web page-latent semantic topic vector that represents the semantic topic distribution of the page, S2={s21, s22, ..., s2n}, where s2n denotes the probability score of the page on the n-th semantic topic. In this embodiment, the web page semantic topic distributions are obtained offline: the topic analysis module 18 analyzes all crawled web pages, obtains their semantic topic distributions, and stores them in the web page semantic topic library 124. This process can of course also be carried out during an online search: after the synonym query result has been obtained, the topic analysis module 18 analyzes only the web pages in the query result and passes their semantic topic distributions to the drift determination module 115 for judgment. In this embodiment, the semantic topic distribution of the synonym context is obtained online. After the query analysis module 112 has segmented the original query into a keyword set, the topic analysis module 18 obtains this keyword set and obtains from the synonym context library 1232 the set of terms contained in the corresponding synonym context. The keyword set and the term set of the synonym context are then combined and passed to the PLSA model, which calculates the synonym context-latent semantic topic vector that represents the semantic topic distribution of the synonym context, S1={s11, s12, ..., s1n}, where s1n denotes the probability of the synonym context on the n-th semantic topic. After obtaining the vector S1, the topic analysis module 18 passes it to the drift determination module 115, which judges the similarity between S1 and S2. The judgment steps are described in detail later.
The detailed steps by which the search engine 300 of this embodiment performs a synonym-expanded query are described next with reference to Fig. 7. First, the query analysis module 112 receives the original query of the user search (step 441) and then analyzes it (step 442). The query analysis module 112 performs word segmentation on the original query; as in the first embodiment, the segmentation is forward maximum matching based on a dictionary built from the synonym contexts. Segmentation yields the original keyword set. On the one hand, the query analysis module 112 passes the original keyword set to the search module 111, which performs the original query (step 449) and obtains the original query result (step 450). On the other hand, the query analysis module 112 identifies the original word contained in the original query based on the thesaurus 123 and obtains the corresponding potential synonym pair and the synonym context of that pair. After obtaining these data, the query analysis module 112 can directly replace the original word with the synonym to obtain the synonym query and pass it to the search module 111, which performs the synonym-expanded query (step 443). In a preferred embodiment, before the synonym replacement is performed, it is first judged whether the synonym context of the original word is satisfied, and the replacement is performed only if it is; this further improves the accuracy of the synonym query result. The decision to perform synonym replacement according to the matching degree of the synonym context has been described in detail in the first embodiment and is not repeated here. In addition, the query analysis module 112 also passes the original keyword set to the topic analysis module 18, which uses the PLSA model to calculate the semantic topic distribution of the synonym context (step 447) and passes the result to the drift determination module 115.
After the search module 111 has performed the synonym query and obtained the synonym query result (step 444), the drift determination module 115 obtains from the web page semantic topic library, according to the synonym query result, the semantic topic distribution of each result web page, i.e. the web page-latent semantic topic vector S2={s21, s22, ..., s2n} (step 445). The drift determination module 115 also obtains from the topic analysis module the semantic topic distribution of the synonym context, i.e. the synonym context-latent semantic topic vector S1={s11, s12, ..., s1n}. Next, the drift determination module 115 judges the matching degree of the two semantic topic distributions, that is, it calculates the similarity of the two vectors S1 and S2 (step 446). It then filters the synonym query result according to the matching degree (step 448), that is, it determines how the synonym query result is to be suppressed, merges the results of the original query and the synonym query accordingly, and generates the search result list (step 451). The similarity of two vectors can be calculated in several ways, such as inner-product similarity or cosine similarity. An example formula that uses cosine similarity to calculate the similarity between the vectors S1 and S2 is given below.
$$\mathrm{sim}(S_1,S_2)=\frac{\sum_{i=1}^{n} s_{1i}\, s_{2i}}{\sqrt{\sum_{i=1}^{n} s_{1i}^{2}}\;\sqrt{\sum_{i=1}^{n} s_{2i}^{2}}}$$
If the calculated similarity is very high, the web page and the synonym context both have high probabilities on the same semantic topics, so the two semantic topic distributions can be judged to match well, i.e. the probability of semantic drift for this web page is small. Conversely, if the calculated similarity is very low, the probability of semantic drift for this web page is large and the result needs to be suppressed. Specifically, the similarity sim(S1, S2) is a floating-point number in [0, 1]. A threshold α can be preset. When sim(S1, S2) lies in [α, 1], the two semantic topic distributions match well; in this case the synonym query result does not need to be suppressed, and the results of the original query and the synonym query are simply merged according to the predetermined relevance weights of the web pages. When sim(S1, S2) lies in [0, α), the two semantic topic distributions match poorly and the probability of semantic drift in the synonym query result is large, so the synonym query result needs to be suppressed. Suppression may take the form of down-weighting the relevance weights of the synonym query result web pages, so that the synonym query result occupies a lower position in the merged search result list; or of inserting the synonym query result after a specific page of the search result list, for example moving it to the second page; or of placing the synonym query result after the original query result, so that it appears at the very end of the search result list.
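The cosine similarity sim(S1, S2) and the threshold test against α can be sketched as follows. This is an illustrative sketch only: the topic vectors are assumed to be plain lists of non-negative probabilities already produced by some topic model, and the function name `pages_to_suppress` and the dictionary layout of the per-page vectors are hypothetical.

```python
import math


def cosine_similarity(s1: list[float], s2: list[float]) -> float:
    """Cosine similarity between two topic vectors of equal length."""
    if len(s1) != len(s2):
        raise ValueError("topic vectors must have the same number of topics")
    dot = sum(a * b for a, b in zip(s1, s2))
    norm = math.sqrt(sum(a * a for a in s1)) * math.sqrt(sum(b * b for b in s2))
    return dot / norm if norm else 0.0


def pages_to_suppress(context_topics: list[float],
                      page_topics: dict[str, list[float]],
                      alpha: float) -> list[str]:
    """Return the URLs whose topic distribution falls in [0, alpha) against the context."""
    return [url for url, s2 in page_topics.items()
            if cosine_similarity(context_topics, s2) < alpha]
```

As a usage example with two topics ("articles", "software"), a context vector of `[0.9, 0.1]` is close to a page vector of `[0.8, 0.2]` but far from `[0.1, 0.9]`, so with α = 0.5 only the second page would be flagged for suppression.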
By comparing the semantic topic distribution of the synonym context with that of the synonym query result web pages, the search engine can judge whether the synonym query result meets the user's underlying need and can control the ranking of the synonym query result within the whole search result list accordingly. This prevents results with semantic drift from appearing at the top of the search results and thus ensures a good user experience. Besides the PLSA model introduced in the embodiment above, other topic models can also be used to analyze the latent semantic topics of the synonym context and of the synonym query result web pages, such as the Latent Semantic Analysis (LSA) model or the Latent Dirichlet Allocation (LDA) model.
Fig. 8 to Fig. 10 disclose the fourth embodiment of the search engine of the present invention. This embodiment mainly describes how synonyms are displayed in the search results. A block diagram of the working principle of the search engine 400 is shown in Fig. 8; the search engine comprises a search component 11, a data repository 12, an index 13, a grabber 14 and a user interface 15. The data repository 12, index 13, grabber 14 and user interface 15 are substantially the same as in the embodiments above and are not described again here. In this embodiment, the search component 11 comprises a search module 111, a query analysis module 112, a result synthesis module 113, a similarity grade analysis module 116 for analyzing the similarity grade between the synonym and the original word, and a labeling module 117 for determining how the synonym is displayed.
The synonym-expanded query performed by the search engine of this embodiment is described below with reference to Fig. 9. First, the query analysis module 112 receives the original query of the user search (step 461) and then analyzes it (step 462). The query analysis module 112 performs word segmentation on the original query to obtain the original keyword set. It identifies the original word contained in the original query based on the thesaurus 123 and obtains the synonym pair comprising the original word and its synonym. On the one hand, the query analysis module 112 replaces the original word with the synonym to obtain the synonym query, and the search module 111 then performs the original query and the synonym-expanded query (step 463). After obtaining the original query result and the synonym query result, the search module 111 passes them to the result synthesis module 113, which merges them and generates the search result list (step 464). The method of merging the original and synonym query results has been described in detail in the embodiments above and is not repeated here. On the other hand, the query analysis module 112 passes the synonym pair to the similarity grade analysis module 116, which judges the similarity grade between the synonym and the original word (step 465) and passes the judgment to the labeling module 117. Next, the labeling module 117 determines how the synonym is displayed according to the judged similarity grade, and the labeled search result list is finally presented to the user through the user interface 15 (step 466).
The judgment of the similarity grade between the synonym and the original word, and the corresponding display method, are described further below with reference to Fig. 10. The similarity grade analysis module 116 obtains the synonym pair from the query analysis module 112 (step 471) and first judges whether the synonym and the original word of the pair belong to the high similarity grade (i.e. the first grade, with the highest similarity) (step 472). In this embodiment, the cases in which the synonym and the original word belong to the high similarity grade include proper-noun abbreviation (such as "Peking University" and "Beijing University", or "Sina website" and "sina"), numeral conversion (such as two written forms of "the 5th episode" that differ only in how the numeral is written) and regional-term conversion (such as two forms of the place name "Beijing"). If the pair belongs to the high similarity grade, the synonym is labeled with a particular color (step 473), usually a relatively eye-catching color such as the red used in this embodiment. If not, it is next judged whether the synonym and the original word of the pair belong to the medium similarity grade (i.e. the second grade, with lower similarity) (step 474). In this embodiment, judging whether the original word and the synonym belong to the medium similarity grade involves judging their semantic similarity or their morphological similarity.
A concrete example of a semantic similarity formula is given below:
$$\mathrm{SSim}(\mathrm{orig},\mathrm{syn})=\frac{\mathrm{ClickQueryCount}(\mathrm{orig},\mathrm{syn})}{\mathrm{QueryCount}(\mathrm{orig})}$$
Here ClickQueryCount(orig, syn) denotes the number of historical queries in which the query contains the original word orig while the title of the web page clicked for access does not contain orig but does contain the synonym syn; QueryCount(orig) denotes the number of historical queries in which the query contains the original word orig. For example, if the historical query entered by a user is "Beijing University where" and the user then clicks a search result titled "Peking University where", this query is counted in both ClickQueryCount(orig, syn) and QueryCount(orig). If instead, for the historical query "Beijing University where", the user clicks only a search result titled "Beijing University where", this query is counted in QueryCount(orig) only. The semantic similarity is therefore a floating-point number in [0, 1]. A threshold β can be preset. When the semantic similarity lies in [β, 1], the original word and the synonym belong to the medium similarity grade; when it lies in [0, β), the morphological similarity is judged next. If the synonym pair has been determined to belong to the medium similarity grade, the synonym is labeled in a specific font style (step 475), such as bold or italic; bold is used in this embodiment.
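The counting behind SSim(orig, syn) can be sketched as follows. This is a minimal sketch under assumptions: the click log is modelled as a list of (query, clicked title) records, "contains" is implemented as plain substring matching, and the `ClickRecord` type and function name are hypothetical rather than part of the patent.

```python
from dataclasses import dataclass


@dataclass
class ClickRecord:
    query: str          # historical query entered by the user
    clicked_title: str  # title of the result page the user clicked


def semantic_similarity(log: list[ClickRecord], orig: str, syn: str) -> float:
    """SSim(orig, syn) = ClickQueryCount(orig, syn) / QueryCount(orig)."""
    query_count = 0        # queries containing the original word
    click_query_count = 0  # of those, clicks on titles with syn but without orig
    for record in log:
        if orig not in record.query:
            continue
        query_count += 1
        if syn in record.clicked_title and orig not in record.clicked_title:
            click_query_count += 1
    return click_query_count / query_count if query_count else 0.0
```

Using the two historical queries from the example above (one click on "Peking University where", one click on "Beijing University where"), the sketch counts two queries containing the original word and one qualifying click, giving a semantic similarity of 0.5.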
A concrete example of a morphological similarity formula is given below:
$$\mathrm{WSim}(\mathrm{orig},\mathrm{syn})=\frac{\mathrm{CoocAlphaCount}(\mathrm{orig},\mathrm{syn})}{\mathrm{AllAlphaCount}(\mathrm{orig},\mathrm{syn})}$$
Here CoocAlphaCount(orig, syn) denotes the number of characters that the original word orig and the synonym syn have in common, and AllAlphaCount(orig, syn) denotes the total number of distinct characters contained in orig and syn. For example, for a Chinese synonym pair meaning "how" in which a three-character word and a two-character word share two characters, CoocAlphaCount = 2, because two characters appear in both the original word and the synonym, and AllAlphaCount = 3, because the pair contains three distinct characters in total. For English, letters are counted instead: for the synonym pair {"man", "men"}, CoocAlphaCount("man", "men") = 2 and AllAlphaCount("man", "men") = 4. The morphological similarity is therefore also a floating-point number in [0, 1]. A threshold γ can be preset. When the morphological similarity lies in [γ, 1], the original word and the synonym belong to the medium similarity grade, and the labeling module 117 labels the synonym in bold. When the morphological similarity lies in [0, γ), the synonym and the original word of the pair belong to the low similarity grade (i.e. the third grade, lower than the second), and the synonym is not labeled at all (step 476). Compared with labeling in a particular color, a specific font style is less eye-catching but still draws the user's attention, so it suits synonyms of the medium similarity grade, which remain fairly close to the original word even though their meaning or form has changed. Synonyms of the low similarity grade differ considerably from the original word in meaning or form, and labeling them would appear jarring to the user, so they are preferably left unlabeled.
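The morphological similarity WSim and the three-grade labeling decision can be sketched as follows. This is an illustrative sketch only: characters are compared as sets, the high-grade test (proper-noun abbreviation, numeral conversion, regional-term conversion) is assumed to be supplied by the caller as a boolean, and the function names and the string labels "red", "bold" and "none" are hypothetical.

```python
def morphological_similarity(orig: str, syn: str) -> float:
    """WSim = shared distinct characters / all distinct characters of the pair."""
    a, b = set(orig), set(syn)
    total = len(a | b)
    return len(a & b) / total if total else 0.0


def choose_label(is_high_grade: bool, ssim: float, wsim: float,
                 beta: float, gamma: float) -> str:
    """Pick the display style for a synonym according to its similarity grade."""
    if is_high_grade:
        return "red"    # high grade: eye-catching color
    if ssim >= beta or wsim >= gamma:
        return "bold"   # medium grade: specific font style
    return "none"       # low grade: no labeling
```

For the pair {"man", "men"} the shared letters are "m" and "n" and the distinct letters are "m", "a", "n" and "e", so the sketch returns 2 / 4 = 0.5, matching the counts given above.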
By distinguishing the similarity grade between the synonym and the original word, the search engine labels the synonyms in the search results in a suitably matched way. This helps the user locate the required information quickly while avoiding a jarring effect, and thus improves the user experience.
Those skilled in the art will readily appreciate that the way the synonym similarity grade is judged, the way synonyms are displayed, and the correspondence between similarity grades and display methods are not limited to those described in the embodiment above. For example, the similarity grade can also be judged by edit distance, or synonyms can be labeled by highlighting. In addition, more similarity grades can be defined, for example by treating semantic similarity and morphological similarity as two different grades. The number of grades can of course also be reduced, so that every synonym is classified only as high similarity grade or low similarity grade: when the synonym and the original word are related by proper-noun abbreviation, numeral conversion or regional-term conversion, or when the semantic similarity, morphological similarity or edit distance between the original word and the synonym is greater than or equal to a specified threshold, the pair can be regarded as high similarity grade, and all other pairs as low similarity grade.
It should be understood that, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written in this way only for the sake of clarity; those skilled in the art should treat the specification as a whole, and the technical solutions in the individual embodiments may also be suitably combined to form other embodiments that those skilled in the art can understand.
The detailed descriptions listed above merely illustrate feasible embodiments of the present invention and are not intended to limit its scope of protection. All equivalent embodiments or modifications that do not depart from the spirit of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. A method for implementing a search engine, characterized in that the method comprises the following steps:
Receiving an original query of a user search;
Analyzing the original query to obtain an original word present in the original query, a synonym of the original word, and a synonym context of the original word and the synonym, and, according to the original query and the synonym context, replacing the original word in the original query with the synonym to obtain a synonym query;
Searching according to the original query and the synonym query to obtain an original query result web page set and a synonym query result web page set;
Calculating the overlap ratio of the web pages in the original query result and the synonym query result;
Merging the result web page sets of the original query and the synonym query according to a predetermined merging strategy corresponding to the overlap ratio, and generating a search result list; wherein
The synonym context is mined by the following steps:
Obtaining historical user query click data, the data comprising a historical query and a query result web page that was returned in response to that query and clicked for access;
Identifying a synonym pair, the synonym pair comprising an original word present in the historical query and a corresponding synonym present in the query result web page;
Determining at least the historical query as the synonym context of the synonym pair.
2. The method for implementing a search engine according to claim 1, characterized in that the calculation of the overlap ratio comprises calculating the number of web pages common to the original query result U1 and the synonym query result U2, |U1 ∩ U2|.
3. The method for implementing a search engine according to claim 2, characterized in that the calculation of the overlap ratio further comprises determining the smaller of the number of web pages in the original query result and the number of web pages in the synonym query result, Min(|U1|, |U2|); the overlap ratio being I(U1, U2) = |U1 ∩ U2| / Min(|U1|, |U2|).
4. The method for implementing a search engine according to claim 2, characterized in that the calculation of the overlap ratio further comprises calculating the total number of web pages in the original query result and the synonym query result, |U1 ∪ U2|; the overlap ratio being I(U1, U2) = |U1 ∩ U2| / |U1 ∪ U2|.
5. The method for implementing a search engine according to claim 1, characterized in that the merging strategy comprises: when the value of the overlap ratio is less than a predetermined threshold within a predetermined overlap ratio interval, the predetermined merging strategy is to apply suppression processing to the synonym query result during merging, the suppression processing comprising:
Reducing the relevance weights of the web pages in the synonym query result; or
Inserting the synonym query result after a specific page of the search result list; or
Placing the synonym query result after the original query result.
6. The method for implementing a search engine according to claim 1, characterized in that the merging strategy comprises: when the value of the overlap ratio is greater than a predetermined threshold within a predetermined overlap ratio interval, merging the results of the original query and the synonym query according to the relevance weights of the web pages in the original query result and the synonym query result.
7. A search engine, characterized in that the search engine comprises a search component, and the search component comprises:
A query analysis module, configured to receive an original query of a user search; analyze the original query to obtain an original word present in the original query, a synonym of the original word, and a synonym context of the original word and the synonym; and, according to the original query and the synonym context, replace the original word in the original query with the synonym to obtain a synonym query;
A search module, configured to search according to the original query and the synonym query to obtain an original query result web page set and a synonym query result web page set;
An overlap calculation and result merging module, configured to calculate the overlap ratio of the web pages in the original query results and the synonym query results; merge the result web page sets of the original query and the synonym query according to a predetermined merging strategy corresponding to the overlap ratio; and generate a search result list; wherein
The search engine further comprises a log analyzer, configured to:
Obtain historical user query click data, the data comprising a historical query and the query result web pages that were returned in response to that query and clicked;
Identify a synonym pair, the synonym pair comprising an original word present in the historical query and a corresponding synonym present in the query result web page; and
Define at least the historical query as the synonym context of the synonym pair.
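The log analyzer steps of claim 7 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the claim does not say how a synonym pair is recognized, so the sketch assumes a hypothetical candidate table (`CANDIDATES`), whitespace tokenization, and a click log given as (query, clicked page text) pairs. The function name `mine_synonym_contexts` is likewise invented for this example.

```python
from collections import defaultdict

# Hypothetical candidate table (an assumption, not from the patent):
# original word -> words that might be its synonyms.
CANDIDATES = {"notebook": {"laptop"}, "cheap": {"inexpensive", "budget"}}


def mine_synonym_contexts(click_log, candidates=CANDIDATES):
    """Map (original word, synonym) -> set of historical queries (contexts).

    click_log: iterable of (query, clicked_page_text) pairs.
    """
    contexts = defaultdict(set)
    for query, page_text in click_log:
        query_terms = query.lower().split()
        page_terms = set(page_text.lower().split())
        for word in query_terms:
            for synonym in candidates.get(word, ()):
                # The synonym must occur in the clicked page and must not
                # already be part of the query itself.
                if synonym in page_terms and synonym not in query_terms:
                    # Record the historical query as the pair's context.
                    contexts[(word, synonym)].add(query)
    return dict(contexts)


log = [
    ("cheap notebook deals", "Best budget laptop deals this week"),
    ("paper notebook", "Ruled paper notebook for school"),
]
result = mine_synonym_contexts(log)
```

In this toy log, only the first query yields synonym pairs, because the page clicked for "paper notebook" does not contain the candidate synonym "laptop". The query "paper notebook" is therefore not recorded as a context in which "notebook" may be replaced.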
8. The search engine according to claim 7, characterized in that the calculation of the overlap ratio comprises calculating the number of web pages that appear in both the original query results U1 and the synonym query results U2, |U1 ∩ U2|.
9. The search engine according to claim 8, characterized in that the calculation of the overlap ratio further comprises determining the smaller of the number of web pages in the original query results and the number of web pages in the synonym query results, Min(|U1|, |U2|); the overlap ratio is I(U1, U2) = |U1 ∩ U2| / Min(|U1|, |U2|).
10. The search engine according to claim 8, characterized in that the calculation of the overlap ratio further comprises calculating the total number of web pages in the original query results and the synonym query results, |U1 ∪ U2|; the overlap ratio is I(U1, U2) = |U1 ∩ U2| / |U1 ∪ U2|.
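The two definitions of I(U1, U2) in claims 9 and 10 can be illustrated with a short Python sketch. The function names and the choice to return 0.0 for empty result sets are assumptions made for this example; the claims do not address the empty case.

```python
def overlap_min(u1, u2):
    """Claim 9 form: |U1 ∩ U2| / Min(|U1|, |U2|)."""
    u1, u2 = set(u1), set(u2)
    smaller = min(len(u1), len(u2))
    if smaller == 0:
        return 0.0  # assumption: no overlap when either result set is empty
    return len(u1 & u2) / smaller


def overlap_union(u1, u2):
    """Claim 10 form: |U1 ∩ U2| / |U1 ∪ U2|."""
    u1, u2 = set(u1), set(u2)
    union = u1 | u2
    if not union:
        return 0.0  # assumption: no overlap when both result sets are empty
    return len(u1 & u2) / len(union)


# Result sets identified by URL (placeholder values).
U1 = {"a", "b", "c", "d"}
U2 = {"c", "d", "e"}
ratio_min = overlap_min(U1, U2)      # 2 shared pages / 3 pages in the smaller set
ratio_union = overlap_union(U1, U2)  # 2 shared pages / 5 distinct pages
```

For the same pair of result sets, the claim 9 form is never smaller than the claim 10 form, since Min(|U1|, |U2|) cannot exceed |U1 ∪ U2|. A threshold chosen for one form therefore does not transfer directly to the other.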
11. The search engine according to claim 7, characterized in that the merging strategy comprises: when the value of the overlap ratio is less than a predetermined threshold within a predetermined overlap-ratio interval, the predetermined merging strategy is to apply suppression processing to the synonym query results during merging, the suppression processing comprising:
Reducing the relevance weights of the web pages in the synonym query results; or
Inserting the synonym query results after a specific page of the search result list; or
Moving the synonym query results after the original query results.
12. The search engine according to claim 7, characterized in that the merging strategy comprises: when the value of the overlap ratio is greater than a predetermined threshold within a predetermined overlap-ratio interval, merging the results of the original query and the synonym query according to the relevance weight of each web page in the original query results and the synonym query results.
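The threshold-dependent merging of claims 11 and 12 might look like the following Python sketch. It implements only the first suppression option (reducing relevance weights). The threshold of 0.3, the penalty factor of 0.5, the handling of an overlap exactly equal to the threshold, and the rule of keeping the higher weight for a page returned by both queries are all illustrative assumptions, not values or rules taken from the patent.

```python
def merge_results(original, synonym, overlap, threshold=0.3, penalty=0.5):
    """Merge two result sets into one ranked list of URLs.

    original, synonym: dict mapping URL -> relevance weight.
    overlap: overlap ratio I(U1, U2) computed beforehand.
    threshold, penalty: hypothetical tuning values (assumptions).
    """
    # Below the threshold, suppress synonym results by scaling their weights;
    # otherwise merge purely by relevance weight.
    factor = penalty if overlap < threshold else 1.0
    merged = dict(original)
    for url, weight in synonym.items():
        adjusted = weight * factor
        # Assumption: a page returned by both queries keeps its higher weight.
        merged[url] = max(merged.get(url, 0.0), adjusted)
    # Rank by descending relevance weight.
    return sorted(merged, key=lambda url: merged[url], reverse=True)


original = {"a": 0.9, "b": 0.6}
synonym = {"b": 0.7, "c": 0.8}
low_overlap_list = merge_results(original, synonym, overlap=0.1)
high_overlap_list = merge_results(original, synonym, overlap=0.8)
```

With a low overlap ratio, page "c" (found only by the synonym query) is pushed below the original results. With a high overlap ratio, it competes on its unmodified weight and ranks second.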
CN201110079699.1A 2011-03-31 2011-03-31 Search engine and implementation method thereof Active CN102722499B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110079699.1A CN102722499B (en) 2011-03-31 2011-03-31 Search engine and implementation method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110079699.1A CN102722499B (en) 2011-03-31 2011-03-31 Search engine and implementation method thereof

Publications (2)

Publication Number Publication Date
CN102722499A (en) 2012-10-10
CN102722499B (en) 2015-07-01

Family

ID=46948266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110079699.1A Active CN102722499B (en) 2011-03-31 2011-03-31 Search engine and implementation method thereof

Country Status (1)

Country Link
CN (1) CN102722499B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156391A (en) * 2014-07-09 2014-11-19 北京奇虎科技有限公司 Device and method for displaying menus in mobile search results
CN105989125B (en) * 2015-02-16 2019-08-16 苏宁易购集团股份有限公司 Search method and system for performing tag recognition on no-result words
CN105659235A (en) * 2016-01-08 2016-06-08 马岩 A term searching method for network information and a system thereof
CN105874457A (en) * 2016-03-30 2016-08-17 马岩 Network information push method and system
WO2018023481A1 (en) * 2016-08-03 2018-02-08 王晓光 Method and system for applying synonym in big data search
CN106250516A (en) * 2016-08-03 2016-12-21 王晓光 Method and system for applying synonyms in big data search
CN106294784B (en) * 2016-08-12 2019-12-17 合一智能科技(深圳)有限公司 resource searching method and device
CN107729347B (en) 2017-08-23 2021-06-11 北京百度网讯科技有限公司 Method, device and equipment for acquiring synonym label and computer readable storage medium
CN110196941A (en) * 2018-07-24 2019-09-03 腾讯科技(深圳)有限公司 Information recommendation method, device, server and storage medium
CN111666417B (en) * 2020-04-13 2023-06-23 百度在线网络技术(北京)有限公司 Method, device, electronic equipment and readable storage medium for generating synonyms
CN116344012B (en) * 2023-05-29 2023-08-18 北京梆梆安全科技有限公司 Medical management system based on diagnosis and treatment log

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1873642A (en) * 2006-04-29 2006-12-06 上海世纪互联信息系统有限公司 Searching engine with automating sorting function
CN101241512A (en) * 2008-03-10 2008-08-13 北京搜狗科技发展有限公司 Search method for redefining enquiry word and device therefor
CN101576916A (en) * 2009-06-18 2009-11-11 清华大学 Method and device for obtaining synonyms
CN101645082A (en) * 2009-04-17 2010-02-10 华中科技大学 Similar web page duplicate-removing system based on parallel programming mode
CN101878476A (en) * 2007-06-22 2010-11-03 谷歌公司 Machine translation for query expansion

Also Published As

Publication number Publication date
CN102722499A (en) 2012-10-10

Similar Documents

Publication Publication Date Title
CN102722498B (en) Search engine and implementation method thereof
CN102722501B (en) Search engine and realization method thereof
CN102722499B (en) Search engine and implementation method thereof
CN102737021B (en) Search engine and realization method thereof
CN102073725B (en) Method for searching structured data and search engine system for implementing same
US8117198B2 (en) Methods for generating search engine index enhanced with task-related metadata
CN101452453B (en) Method for web address navigation in an input method, and input method system
US8126888B2 (en) Methods for enhancing digital search results based on task-oriented user activity
US8706748B2 (en) Methods for enhancing digital search query techniques based on task-oriented user activity
CN102073726B (en) Structured data import method and device for search engine system
CN100524307C (en) Method and device for establishing coupled relation between documents
US20090265338A1 (en) Contextual ranking of keywords using click data
CN101073080A (en) Suggesting search engine keywords
JP2005085285A5 (en)
TWI547815B (en) Information retrieval method and device
JP2009512070A (en) System, method, and computer program product for concept-based search and analysis
CN101685448A (en) Method and device for establishing association between query operation of user and search result
US9971828B2 (en) Document tagging and retrieval using per-subject dictionaries including subject-determining-power scores for entries
US20090083266A1 (en) Techniques for tokenizing urls
US8234584B2 (en) Computer system, information collection support device, and method for supporting information collection
CN103942268A (en) Method and device for combining search and application and application interface
Gasparetti et al. Exploiting web browsing activities for user needs identification
US11941073B2 (en) Generating and implementing keyword clusters
CN102063454A (en) Method and equipment combining search and application
US20130031075A1 (en) Action-based deeplinks for search results

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant