Summary of the invention
In view of the above problems in the prior art, embodiments of the present invention provide a method and a server for expanding a user's search results, which can intelligently provide the user with richer search results.
To this end, embodiments of the present invention provide the following technical solutions:
A method for expanding a user's search results, comprising:
Obtaining a search keyword entered by a user in a search interface;
Obtaining an associated word that is associated with the search keyword;
Querying an index database separately with the search keyword, with the associated word, and with the combination of the search keyword and the associated word, to obtain search results;
Deduplicating and sorting the search results;
Sending the sorted search results to a client, so that the client presents the received search results to the user.
Preferably, obtaining the associated word that is associated with the search keyword comprises:
Looking up an association rule database according to the search keyword;
If the association rule database contains an association rule that includes the search keyword, obtaining from that association rule the associated word that is associated with the search keyword.
Preferably, the method further comprises:
Setting an association rule that comprises a keyword and an associated word associated with the keyword; and/or generating, according to a plurality of search keywords entered by the user, an association rule that comprises a keyword and an associated word associated with the keyword;
Saving the association rule in the association rule database.
Preferably, the method further comprises:
Collecting statistics on the search behavior and/or search results of all users;
Determining, according to the statistics, the strength of the association between the keyword and the corresponding associated word in the association rule;
Maintaining the association rules in the association rule database according to the determination result.
Preferably, determining, according to the statistics, the strength of the association between the keyword and the corresponding associated word in the association rule comprises:
Calculating the support and/or confidence of the association rule according to the statistics;
If the support is greater than a set support threshold and/or the confidence is greater than a preset confidence threshold, determining that the association rule is a strong association; otherwise, determining that it is a weak association.
A server, comprising:
A keyword obtaining unit, configured to obtain a search keyword entered by a user in a search interface;
An associated word obtaining unit, configured to obtain an associated word that is associated with the search keyword;
A query unit, configured to query an index database separately with the search keyword, with the associated word, and with the combination of the search keyword and the associated word, to obtain search results;
A sorting unit, configured to deduplicate and sort the search results;
A sending unit, configured to send the sorted search results to a client, so that the client presents the received search results to the user.
Preferably, the associated word obtaining unit is specifically configured to look up an association rule database according to the search keyword and, if the association rule database contains an association rule that includes the search keyword, to obtain from that association rule the associated word that is associated with the search keyword.
Preferably, the server further comprises: a rule setting unit and/or a rule generation unit, and a saving unit;
The rule setting unit is configured to set an association rule that comprises a keyword and an associated word associated with the keyword;
The rule generation unit is configured to generate, according to a plurality of search keywords entered by the user, an association rule that comprises a keyword and an associated word associated with the keyword;
The saving unit is configured to save the association rule in the association rule database.
Preferably, the server further comprises:
A statistics unit, configured to collect statistics on the search behavior and/or search results of all users;
An association degree determining unit, configured to determine, according to the statistics, the strength of the association between the keyword and the corresponding associated word in the association rule;
A rule maintenance unit, configured to maintain the association rules in the association rule database according to the determination result.
Preferably, the association degree determining unit comprises:
A calculation subunit, configured to calculate the support and/or confidence of the association rule according to the statistics;
An analysis subunit, configured to determine that the association rule is a strong association when the support is greater than a set support threshold and/or the confidence is greater than a preset confidence threshold, and otherwise to determine that it is a weak association.
With the method and server for expanding a user's search results according to the embodiments of the present invention, associated words that have an association with the search keyword entered by the user are mined, and the index database is queried separately with the search keyword, with the associated word, and with the combination of the two, to obtain the corresponding search results. The search results are thereby expanded: documents associated with the search keyword entered by the user are also provided to the user, giving the user richer search results.
Embodiment
To enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the accompanying drawings and specific implementations.
With the method and server for expanding a user's search results according to the embodiments of the present invention, associated words that have an association with the search keyword entered by the user are mined; that is, the user's search behavior and expectations are intelligently predicted. The index database is queried separately with the search keyword, with the associated word, and with the combination of the two, to obtain the corresponding search results. The search results are thereby expanded: documents associated with the search keyword entered by the user are also provided to the user, giving the user richer search results.
Figure 1 is a flowchart of the method for expanding a user's search results according to an embodiment of the present invention, which comprises the following basic steps:
Step 101: obtain the search keyword entered by the user in the search interface.
The search keyword may be any Chinese or English text, and may be a single word or a phrase; the user may enter one keyword or several.
In addition, the user's input may be a phrase containing one or more keywords. For example, if the user enters "the great war between 360 and QQ", the keywords "360", "QQ" and "great war" can be extracted from it. The specific extraction method may follow the prior art and is not limited by the embodiments of the present invention.
In this case, the server may search separately for documents matching each keyword to obtain the corresponding search results.
Step 102: obtain the associated word that is associated with the search keyword.
In the embodiments of the present invention, various association rules may be established in advance, each comprising a keyword and an associated word associated with that keyword. To make these association rules easier to maintain, they may be saved in an association rule database, so that rules can be updated, added or deleted when needed.
For example, some information is highly time-sensitive. As time passes, it ceases to be a hot topic and people pay less attention to it. The association rules related to such information then need to be updated or deleted, to avoid giving the user search results that are not wanted.
Accordingly, after receiving the search keyword sent by the client, the server can look up the association rule database according to the search keyword. If the association rule database contains an association rule that includes the search keyword, the associated word that is associated with the search keyword is obtained from that rule.
It should be noted that the association rules may be established in several ways, for example:
(1) An association rule comprising a keyword and an associated word associated with that keyword is established by manual setting, that is, by human editing.
(2) As mentioned above, the user may enter several keywords. When several keywords are entered, an association may exist between them, so the server may also automatically generate, from the plurality of search keywords entered by the user, an association rule comprising a keyword and an associated word associated with that keyword. It should be noted that the server may be a search engine server, in which case the users concerned are all users of that search engine.
Of course, the two approaches above may be used together to establish association rules, and may also coexist with other approaches; the embodiments of the present invention do not limit this.
For example, the association rule database contains the association rules shown in Table 1:
Table 1:
ID | Rule
1 | Potato => dietary function
2 | QQ => 360
3 | {Zhang San, Li Si} => lawsuit
4 | Law of conservation of mass => Lomonosov
5 | Einstein => relativity
6 | Einstein => Nobel Prize in Physics
As mentioned above, the user may enter one keyword or several. When there is only one keyword, looking up the association rule database may return one or more associated words for that keyword. For example, if the user enters the search keyword "Einstein" in the search interface, looking up the association rule database yields two associated words for "Einstein", namely "relativity" and "Nobel Prize in Physics". When there are several keywords, the association rule database can be looked up using all of them together. For example, if the user enters the search keywords "Zhang San" and "Li Si" in the search interface, looking up the association rule database yields the associated word "lawsuit", which is associated with the keywords "Zhang San" and "Li Si".
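The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the association rule database is modeled as an in-memory dictionary filled with the rules of Table 1, a rule matches when every keyword on its left-hand side was entered by the user, and the names `RULES` and `lookup_associated_words` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical in-memory stand-in for the association rule database (Table 1).
# Key: the keyword(s) on the left of "=>"; value: the associated words.
RULES: Dict[FrozenSet[str], List[str]] = {
    frozenset({"potato"}): ["dietary function"],
    frozenset({"QQ"}): ["360"],
    frozenset({"Zhang San", "Li Si"}): ["lawsuit"],
    frozenset({"law of conservation of mass"}): ["Lomonosov"],
    frozenset({"Einstein"}): ["relativity", "Nobel Prize in Physics"],
}


def lookup_associated_words(
    keywords: Iterable[str],
    rules: Dict[FrozenSet[str], List[str]] = RULES,
) -> List[str]:
    """Return the associated words of every rule whose left-hand side
    is fully contained in the entered keywords."""
    entered = frozenset(keywords)
    found: List[str] = []
    for left_side, words in rules.items():
        if left_side <= entered:  # all left-hand keywords were entered
            for word in words:
                # Skip duplicates and words the user already typed.
                if word not in found and word not in entered:
                    found.append(word)
    return found
```

With this sketch, a query for "Einstein" returns both associated words, while "Zhang San" alone returns nothing because rule 3 requires both names. Exact, case-sensitive matching is a simplification made here, not something the embodiment prescribes.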
In addition, when the user enters several keywords, these keywords usually have some association with one another. The server can therefore extract these keywords and generate a corresponding association rule; if the association rule database has no record of this rule, the generated rule is saved in the association rule database. For example, if the user enters the search keywords "360" and "network security" in the search interface, the server generates the association rule {360=>network security} from the entered keywords; if the association rule database does not yet contain this rule, the server adds the generated rule {360=>network security} to the association rule database.
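A minimal sketch of this rule generation step follows. The text does not say how the direction of a generated rule is chosen, so this sketch assumes that the first keyword becomes the left-hand side and each later keyword becomes an associated word, which reproduces the {360=>network security} example. The function name `add_rules_from_query` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def add_rules_from_query(
    keywords: Sequence[str],
    rules: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Generate keyword => associated-word rules from one multi-keyword
    query and store only those that are not recorded yet.

    Returns the list of rules that were newly added."""
    added: List[Tuple[str, str]] = []
    if len(keywords) < 2:
        return added  # a single keyword carries no association
    head = keywords[0]  # assumption: first keyword is the left-hand side
    for other in keywords[1:]:
        if other == head:
            continue
        known = rules.setdefault(head, [])
        if other not in known:
            known.append(other)
            added.append((head, other))
    return added
```

Calling the function twice with the same query adds the rule only once, which corresponds to the check for an existing record described above.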
Step 103: query the index database separately with the search keyword, with the associated word, and with the combination of the search keyword and the associated word, to obtain search results.
For example, if the user enters QQ and the server, by looking up the association rule database, obtains the associated word 360 that is associated with QQ, then {QQ}, {QQ, 360} and {360} are searched separately to obtain the corresponding search results.
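The three kinds of query can be sketched as below. This is a hedged illustration only: `build_queries` and the labels "keyword", "combined" and "associated" are hypothetical names, and the sketch merely builds the term sets; how each set is matched against the index database is left to the search engine.

```python
from typing import List, Sequence, Tuple

Query = Tuple[str, Tuple[str, ...]]  # (kind of query, search terms)


def build_queries(keywords: Sequence[str], associated_words: Sequence[str]) -> List[Query]:
    """Build the term sets to run against the index database:
    the keyword(s) alone, keyword(s) plus each associated word,
    and each associated word alone."""
    base = tuple(keywords)
    queries: List[Query] = [("keyword", base)]
    for word in associated_words:
        queries.append(("combined", base + (word,)))
    for word in associated_words:
        queries.append(("associated", (word,)))
    return queries
```

For the keyword QQ with the associated word 360, this yields the term sets (QQ), (QQ, 360) and (360), matching the example above. Keeping the label with each term set lets a later sorting step know which query produced a given result.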
The search results may include a summary or partial content of the relevant documents, and may further include the URL of each document, so that the user can click the URL to open the relevant document.
It should be noted that the index database stores internet web page information that is collected periodically. The collection may use the prior art, for example a web crawler program that crawls internet web pages, builds indexes for the web page information, and stores the indexes in the index database.
Step 104: deduplicate and sort the search results.
Because a single search may return many search results, deduplicating and sorting them gives the user a better experience.
Deduplication means keeping only one of several identical result documents. The specific implementation may follow the prior art and is not limited by the embodiments of the present invention.
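As a small illustration of deduplication, the sketch below keeps the first occurrence of each document. It assumes, purely for illustration, that each result is a dictionary and that two results are "identical" when their `url` fields match; the text leaves the identity criterion to the prior art.

```python
from typing import Dict, List


def deduplicate(results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the first result for each URL, preserving input order."""
    seen = set()
    unique: List[Dict[str, str]] = []
    for result in results:
        url = result["url"]  # assumption: the URL identifies a document
        if url not in seen:
            seen.add(url)
            unique.append(result)
    return unique
```

Because the first occurrence wins, running deduplication on results concatenated in the order keyword, combination, associated word keeps the copy that came from the highest-priority query.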
When the search results are sorted, the results for the search keyword may be placed first, followed by the results for the combination of the search keyword and the associated word, and finally the results for the associated word. Of course, other orders may also be used.
In addition, other factors may be taken into account when sorting. For example, the search results may be sorted by the time at which the source documents were produced, with the most recent results first; they may also be sorted by how well the source documents match the search keyword, with the best-matching results first, where the matching degree may be calculated as in the prior art. When several factors are considered together, a different weight may be set for each factor, a priority is calculated for each search result from the weights, and results with higher priority are placed first.
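One way to combine these factors is a weighted sum, sketched below. Everything numeric here is an assumption made for illustration: the scores per query type in `SOURCE_SCORE`, the recency formula `1 / (1 + age_days)`, and the default weights are not specified by the embodiment, and `Result`, `priority` and `rank` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    url: str
    source: str      # "keyword", "combined" or "associated"
    age_days: float  # time since the source document was produced
    match: float     # matching degree with the search keyword, in [0, 1]


# Assumed scores: keyword results first, then combined, then associated.
SOURCE_SCORE = {"keyword": 1.0, "combined": 0.5, "associated": 0.0}


def priority(r: Result, w_source: float = 0.5,
             w_recency: float = 0.2, w_match: float = 0.3) -> float:
    """Weighted sum of the three ranking factors for one result."""
    recency = 1.0 / (1.0 + r.age_days)  # newer documents score closer to 1
    return (w_source * SOURCE_SCORE[r.source]
            + w_recency * recency
            + w_match * r.match)


def rank(results: List[Result], **weights: float) -> List[Result]:
    """Sort results so that the highest priority comes first."""
    return sorted(results, key=lambda r: priority(r, **weights), reverse=True)
```

Changing the weights changes the order: with the defaults an older keyword result outranks a fresh associated-word result, while setting `w_recency=1.0` and the other weights to zero ranks purely by recency.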
Step 105: send the sorted search results to the client, so that the client presents the received search results to the user.
It should be noted that either all of the sorted search results or only the top-ranked part of them may be sent to the client.
It can be seen that, with the method for expanding a user's search results according to the embodiments of the present invention, associated words that have an association with the search keyword entered by the user are mined; that is, the user's search behavior and expectations are intelligently predicted. The index database is queried separately with the search keyword, with the associated word, and with the combination of the two, to obtain the corresponding search results. The search results are thereby expanded: documents associated with the search keyword entered by the user are also provided to the user, giving the user richer search results.
As mentioned above, in the embodiments of the present invention the association rules may be established in several ways, for example by manual setting, or automatically by the server according to a plurality of search keywords entered by the user. These association rules may all be saved in the same association rule database.
To further ensure that these association rules remain strongly associated, they may likewise be updated periodically by hand, or the server may analyze the search behavior and/or search results of all users and maintain the rules automatically according to the analysis result. This is described in detail below.
First, two concepts related to association rules are briefly introduced: support and confidence. Association rules, support and confidence are all concepts that originate in the field of data mining, where:
An association rule can be written as:
$$A \Rightarrow B \qquad (1)$$
Here A denotes a keyword and B denotes the associated word of A.
Support is defined as:
$$\mathrm{sup}(A \Rightarrow B) = \frac{n(A \cap B)}{N} \qquad (2)$$
Here n(A∩B) denotes the number of times A and B occur together, and N denotes the total number of transactions.
Confidence is defined as:
$$\mathrm{conf}(A \Rightarrow B) = \frac{n(A \cap B)}{n(A)} \qquad (3)$$
Here n(A) denotes the number of times A occurs.
Support and confidence can express the strength of the association between several items.
It should be noted that, as formulas (2) and (3) show, sup(A=>B) is always equal to sup(B=>A), whereas conf(A=>B) and conf(B=>A) are generally different.
Based on the above principle, the embodiments of the present invention may further comprise the following steps: collecting statistics on the search behavior and/or search results of all users; determining, according to the statistics, the strength of the association between the keyword and the corresponding associated word in the association rule; and maintaining the association rules in the association rule database according to the determination result. Specifically, the maintenance may be an operation such as updating, adding or deleting an association rule.
When the strength of the association between the keyword and the corresponding associated word in the association rule is determined according to the statistics, several implementations based on the support and/or confidence described above are possible. Examples are given below.
(1) Determining the strength of the association between the keyword and the corresponding associated word in the association rule according to the search behavior of all users
For example, suppose several users have used the search function and entered the following queries:
1. 360 great war QQ;
2. QQ sues 360;
3. QQ.
Let A be 360 and B be QQ. From these search behaviors we obtain:
n(A∩B)=2 and N=3, so sup(A=>B)=2/3≈0.667;
n(A)=2, so conf(A=>B)=2/2=1.0;
Likewise, n(B)=3, so conf(B=>A)=2/3≈0.667.
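The calculation above can be reproduced with a short sketch that treats each query as one transaction, represented as a set of extracted keywords. The function names `support` and `confidence` are hypothetical, and keyword extraction is assumed to have been done already.

```python
from typing import List, Set


def support(transactions: List[Set[str]], a: str, b: str) -> float:
    """sup(a => b) = n(a and b together) / N, as in formula (2)."""
    if not transactions:
        return 0.0
    both = sum(1 for t in transactions if a in t and b in t)
    return both / len(transactions)


def confidence(transactions: List[Set[str]], a: str, b: str) -> float:
    """conf(a => b) = n(a and b together) / n(a), as in formula (3)."""
    n_a = sum(1 for t in transactions if a in t)
    if n_a == 0:
        return 0.0  # a never occurs, so the rule has no evidence
    both = sum(1 for t in transactions if a in t and b in t)
    return both / n_a


# The three example queries, after keyword extraction.
queries = [
    {"360", "great war", "QQ"},
    {"QQ", "sues", "360"},
    {"QQ"},
]
```

On `queries`, `support(queries, "360", "QQ")` is 2/3, `confidence(queries, "360", "QQ")` is 1.0 and `confidence(queries, "QQ", "360")` is 2/3, in line with the worked figures. Returning 0.0 when there is no data is a choice made in this sketch.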
(2) Determining the strength of the association between the keyword and the corresponding associated word in the association rule according to the search results of all users
In the embodiments of the present invention, one query by a user may be called a transaction.
For an association that currently exists in the association rule database, for example 360 associated with QQ, the result documents obtained by searching for {360}, {QQ} and {360, QQ} are tracked and counted. Suppose the statistics over a certain period are as follows:
The total number of these result documents is N=100;
The number of documents that contain only 360 is n(360)=10;
The number of documents that contain only QQ is n(QQ)=20;
The number of documents that contain both 360 and QQ is n(360∩QQ)=70;
The calculation then gives:
sup(360=>QQ)=70/100=0.7;
conf(360=>QQ)=70/(70+10)=0.875;
conf(QQ=>360)=70/(70+20)≈0.778.
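The same figures follow from a small helper that takes the document counts directly. Note that here the counts for 360 and QQ are "only" counts, so the number of documents containing a keyword is the "only" count plus the "both" count. The name `rule_metrics` and its parameters are hypothetical.

```python
from typing import Tuple


def rule_metrics(n_only_a: int, n_only_b: int, n_both: int,
                 n_total: int) -> Tuple[float, float, float]:
    """Return (sup(A=>B), conf(A=>B), conf(B=>A)) from document counts.

    n_only_a: documents containing A but not B
    n_only_b: documents containing B but not A
    n_both:   documents containing both A and B
    n_total:  all result documents that were tracked
    """
    sup = n_both / n_total
    conf_ab = n_both / (n_both + n_only_a)  # n(A) = both + only A
    conf_ba = n_both / (n_both + n_only_b)  # n(B) = both + only B
    return sup, conf_ab, conf_ba
```

For the counts above, `rule_metrics(10, 20, 70, 100)` gives 0.7, 0.875 and about 0.778. The sketch assumes non-zero denominators, which holds whenever at least one tracked document contains both terms.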
These three values change dynamically. If all three values are decreasing over some period, the association between 360 and QQ is weakening; otherwise, the association is strengthening.
(3) Combining the two kinds of statistics above, that is, taking both the users' search behavior and the search results into account to determine the strength of the association between the keyword and the corresponding associated word in the association rule
For example, specific weights may be assigned to the statistical values from the search behavior and from the search results respectively, and the support and confidence are then calculated as a weighted average according to these weights. The two weights may be equal or different.
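A weighted average of the two statistics might look like the sketch below. The function name `combine` and the default equal weights are assumptions; the text only says that each source receives a weight and that the weights may be equal or different.

```python
def combine(behavior_value: float, result_value: float,
            w_behavior: float = 0.5, w_result: float = 0.5) -> float:
    """Weighted average of one metric (support or confidence) computed
    from search behavior and from search results."""
    total = w_behavior + w_result
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_behavior * behavior_value + w_result * result_value) / total
```

Using the earlier figures for 360=>QQ, equal weights combine the supports 2/3 and 0.7 into roughly 0.683, while weights of 1 and 3 combine the confidences 1.0 and 0.875 into 0.90625. Dividing by the sum of the weights means they do not have to add up to 1.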
It should be noted that, with statistical approach (1) above, the search keywords entered in a single search may include three or more keywords. For example, if the user enters "the lawsuit between 360 and QQ", the keyword set is {360, QQ, lawsuit}. In this case, the support and confidence of each combination can be calculated separately, including:
sup(360=>{QQ, lawsuit}), conf(360=>{QQ, lawsuit});
sup(QQ=>{360, lawsuit}), conf(QQ=>{360, lawsuit});
sup(lawsuit=>{QQ, 360}), conf(lawsuit=>{QQ, 360});
conf({QQ, lawsuit}=>360);
conf({360, lawsuit}=>QQ);
conf({QQ, 360}=>lawsuit);
sup(360=>QQ), conf(360=>QQ);
sup(360=>lawsuit), conf(360=>lawsuit);
conf(lawsuit=>360);
conf(lawsuit=>QQ);
conf(QQ=>360);
sup(QQ=>lawsuit), conf(QQ=>lawsuit);
As can be seen, when a single search transaction contains many keywords, the amount of calculation becomes very large. In practical applications, some restrictions can be applied, for example calculating confidence and support only between pairs of keywords.
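The growth in the number of rules can be illustrated with a short sketch that enumerates every rule X=>Y over disjoint, non-empty subsets of the keywords and compares it with the pairwise restriction. The function names are hypothetical and the keywords are assumed to be distinct.

```python
from itertools import combinations, permutations
from typing import List, Sequence, Tuple

Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (left side, right side)


def all_rules(keywords: Sequence[str]) -> List[Rule]:
    """Every rule X => Y where X and Y are non-empty, disjoint subsets."""
    items = list(keywords)
    rules: List[Rule] = []
    for size_x in range(1, len(items)):
        for x in combinations(items, size_x):
            rest = [k for k in items if k not in x]
            for size_y in range(1, len(rest) + 1):
                for y in combinations(rest, size_y):
                    rules.append((x, y))
    return rules


def pairwise_rules(keywords: Sequence[str]) -> List[Tuple[str, str]]:
    """Only single-keyword rules a => b, the restriction suggested above."""
    return list(permutations(keywords, 2))
```

For the three keywords 360, QQ and lawsuit, `all_rules` yields the 12 rules listed above while `pairwise_rules` yields 6. With four keywords the counts are 50 and 12, which shows why restricting the calculation to pairs keeps the work manageable.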
It should be noted that, once the support and confidence have been calculated, the strength of the association between the keyword and the corresponding associated word in the association rule may be determined from either one of them. For example, a support threshold and a confidence threshold are set separately; if the calculated support exceeds the support threshold, the association is considered strong, and otherwise weak. Likewise, if the calculated confidence exceeds the confidence threshold, the association is considered strong. Of course, the two values may also be considered together, so that the association is considered strong only when the calculated support and confidence both exceed their corresponding thresholds.
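The threshold test can be sketched as a single function with a switch between the "either value" and "both values" variants. The threshold values 0.5 and 0.7 are placeholders chosen for illustration, and `is_strong` is a hypothetical name.

```python
def is_strong(sup: float, conf: float,
              sup_threshold: float = 0.5,   # placeholder value
              conf_threshold: float = 0.7,  # placeholder value
              require_both: bool = True) -> bool:
    """Classify a rule as a strong association (True) or weak (False).

    require_both=True:  support AND confidence must exceed their thresholds.
    require_both=False: exceeding either threshold is enough.
    """
    sup_ok = sup > sup_threshold
    conf_ok = conf > conf_threshold
    return (sup_ok and conf_ok) if require_both else (sup_ok or conf_ok)
```

With these placeholder thresholds, the rule 360=>QQ from the document statistics (support 0.7, confidence 0.875) is strong under both variants, while a rule with support 0.4 and confidence 0.875 is strong only when `require_both=False`.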
In addition, when the association rules in the association rule database are maintained according to the determination result, whether a rule in the database needs to be deleted, added or modified can be decided from the strength of its association. For example, once the association in a certain rule is determined to be weak, that rule is deleted.
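A maintenance pass that deletes weak rules could look like the sketch below. It is an assumption-laden illustration: rules are held in a dictionary from keyword to associated words, the measured (support, confidence) pairs are supplied in a second dictionary, both thresholds must be exceeded for a rule to count as strong, and rules without measurements are kept. The name `prune_weak_rules` and the threshold values are hypothetical.

```python
from typing import Dict, List, Tuple


def prune_weak_rules(
    rules: Dict[str, List[str]],
    metrics: Dict[Tuple[str, str], Tuple[float, float]],
    sup_threshold: float = 0.5,   # placeholder value
    conf_threshold: float = 0.7,  # placeholder value
) -> List[Tuple[str, str]]:
    """Delete rules whose measured association is weak; return them."""
    removed: List[Tuple[str, str]] = []
    for keyword in list(rules):  # copy the keys: entries may be deleted
        kept: List[str] = []
        for word in rules[keyword]:
            measured = metrics.get((keyword, word))
            if measured is None:
                kept.append(word)  # no statistics yet: keep the rule
                continue
            sup, conf = measured
            if sup > sup_threshold and conf > conf_threshold:
                kept.append(word)
            else:
                removed.append((keyword, word))
        if kept:
            rules[keyword] = kept
        else:
            del rules[keyword]
    return removed
```

Returning the removed rules makes the maintenance step easy to log or review by hand before the deletion is made permanent.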
It should be noted that the above is only a specific example of using support and/or confidence to judge the strength of the association in an association rule in the embodiments of the present invention. In practical applications, the strength of the association may also be judged in other ways, and the embodiments of the present invention do not limit this.
It can be seen that the method for expanding a user's search results according to the embodiments of the present invention not only intelligently predicts the user's search behavior and expectations and provides the user with documents associated with the entered search keyword, giving the user richer search results, but also, through automatic maintenance of the association rules, ensures the validity and accuracy of the expanded search results.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc.
Correspondingly, the embodiments of the present invention also provide a server. Figure 2 is a schematic structural diagram of this server.
In this embodiment, the server comprises:
A keyword obtaining unit 201, configured to obtain a search keyword entered by a user in a search interface;
An associated word obtaining unit 202, configured to obtain an associated word that is associated with the search keyword;
A query unit 203, configured to query an index database separately with the search keyword, with the associated word, and with the combination of the search keyword and the associated word, to obtain search results;
A sorting unit 204, configured to deduplicate and sort the search results;
A sending unit 205, configured to send the sorted search results to a client, so that the client presents the received search results to the user.
In the embodiments of the present invention, various association rules may be established in advance, each comprising a keyword and an associated word associated with that keyword. To make these association rules easier to maintain, they may be saved in an association rule database, so that rules can be updated, added or deleted when needed.
Accordingly, the associated word obtaining unit 202 is specifically configured to look up the association rule database 205 according to the search keyword and, if the association rule database contains an association rule that includes the search keyword, to obtain from that association rule the associated word that is associated with the search keyword.
It should be noted that the association rule database 205 may be located inside the server, or may be independent of the server.
In addition, in the embodiments of the present invention, the server may further comprise: a rule setting unit and/or a rule generation unit, and a saving unit, wherein:
The rule setting unit is configured to set an association rule that comprises a keyword and an associated word associated with the keyword;
The rule generation unit is configured to generate, according to a plurality of search keywords entered by the user, an association rule that comprises a keyword and an associated word associated with the keyword;
The saving unit is configured to save the association rule in the association rule database.
That is, the association rules may be generated in several ways. For example, some association rules may be set manually through the rule setting unit, and others may be generated automatically by the rule generation unit. In practical applications, the server may include only one of the rule setting unit and the rule generation unit, or may include both. Of course, the embodiments of the present invention are not limited to these implementations; the association rules may also be generated in other ways, or by the above ways coexisting with other ways, which are not enumerated here.
It can be seen that the server of the embodiments of the present invention intelligently predicts the user's search behavior and expectations from the search keyword entered by the user, and queries the index database separately with the search keyword, with the associated word, and with the combination of the two, to obtain the corresponding search results. The search results are thereby expanded: documents associated with the search keyword entered by the user are also provided to the user, giving the user richer search results.
Figure 3 is another schematic structural diagram of the server according to an embodiment of the present invention.
The difference from the embodiment shown in Figure 2 is that, in this embodiment, the server further comprises:
A statistics unit 206, configured to collect statistics on the search behavior and/or search results of all users;
An association degree determining unit 207, configured to determine, according to the statistics, the strength of the association between the keyword and the corresponding associated word in the association rule;
A rule maintenance unit 208, configured to maintain the association rules in the association rule database according to the determination result; specifically, the maintenance may be deleting, adding or modifying association rules in the association rule database.
In the embodiments of the present invention, the association degree determining unit 207 may determine the strength of the association between the keyword and the corresponding associated word in the association rule in several ways, for example according to support and/or confidence.
Accordingly, the association degree determining unit 207 comprises:
A calculation subunit, configured to calculate the support and/or confidence of the association rule according to the statistics;
An analysis subunit, configured to determine that the association rule is a strong association when the support is greater than a set support threshold and/or the confidence is greater than a preset confidence threshold, and otherwise to determine that it is a weak association.
Of course, the embodiments of the present invention are not limited to this implementation. In practical applications, the association degree determining unit 207 may also determine the strength of the association in other ways, and the embodiments of the present invention do not limit this.
The server of the embodiments of the present invention not only intelligently predicts the user's search behavior and expectations and provides the user with documents associated with the entered search keyword, giving the user richer search results, but also, through automatic maintenance of the association rules, ensures the validity and accuracy of the expanded search results.
For parts that are the same or similar across the embodiments in this specification, the embodiments may be referred to one another; each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments. The system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
The above discloses only preferred implementations of the present invention, but the present invention is not limited thereto. Any non-inventive variation that a person skilled in the art can conceive of, and any improvements and refinements made without departing from the principle of the present invention, shall fall within the protection scope of the present invention.