CN102446174B - Method and device for determining key sub-word weights in a network device - Google Patents


Info

Publication number
CN102446174B
Authority
CN
China
Prior art keywords
word
sub
keyword
weight
long
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201010501398.9A
Other languages
Chinese (zh)
Other versions
CN102446174A (en
Inventor
林赛群
何仁清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201010501398.9A
Publication of CN102446174A
Application granted
Publication of CN102446174B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method and a device for determining the weights of key sub-words in a network device. The invention obtains a long-tail keyword from a user and, according to a first predetermined rule and based on associated keywords, determines the weights of the multiple key sub-words contained in the long-tail keyword, where each associated keyword is associated with at least one of the key sub-words. Compared with the prior art, the invention has the following advantages: 1) whereas the prior art mostly extracts keywords from long passages of text or from queries, the scheme provided by the invention targets long-tail keywords and can effectively judge the weights of the key sub-words they contain; 2) by horizontally analyzing the closeness among the user's own keywords and/or vertically analyzing the associations between the user's long-tail keyword and the keywords of other users, the invention further improves the accuracy of the weight judgment.

Description

Method and device for determining key sub-word weights in a network device
Technical field
The present invention relates to computer network technology, and in particular to a method and a device for determining the weights of key sub-words in a network device.
Background technology
A keyword that is not a target keyword of a website but can still bring in search traffic is called a long-tail keyword. A long-tail keyword usually consists of two or three words, or even a phrase, and often appears in the page content as well as in the page title. Although long-tail keywords have a low and rather unstable search volume, the visitors they bring are far more likely than those brought by target keywords to convert into customers of the website's products.
Therefore, in paid-search bidding, many users choose to bid on a certain number of long-tail keywords. However, the query entered by an ordinary search user is often close in meaning to, but not identical with, the long-tail keyword that was bid on. To improve search accuracy and efficiency, it is therefore necessary to determine the core words in a long-tail keyword and their weights, yet the prior art still lacks a scheme for determining the core words of a long-tail keyword and their weights.
Summary of the invention
The object of the present invention is to provide a method and a device for determining the weights of key sub-words in a network device.
According to one aspect of the present invention, a method for determining the weights of key sub-words in a network device is provided, wherein the method comprises the following steps:
A. obtaining a long-tail keyword from a user;
B. determining, according to a first predetermined rule and based on associated keywords, the weights of the multiple key sub-words contained in the long-tail keyword, wherein each associated keyword is associated with at least one of the key sub-words.
According to another aspect of the present invention, a network device for determining the weights of key sub-words is also provided, wherein the network device comprises:
a first acquisition device for obtaining a long-tail keyword from a user;
a weight analysis device for determining, according to a first predetermined rule and based on associated keywords, the weights of the multiple key sub-words contained in the long-tail keyword, wherein each associated keyword is associated with at least one of the key sub-words.
Compared with the prior art, the present invention has the following advantages: 1) whereas the prior art mostly extracts keywords from long passages of text or from queries, the scheme provided by the invention targets long-tail keywords and can effectively judge the weights of the key sub-words they contain; 2) by horizontally analyzing the closeness among the user's own keywords and/or vertically analyzing the associations between the user's long-tail keyword and the keywords of other users, the invention further improves the accuracy of the weight judgment.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a flowchart of a method for determining key sub-word weights in a network device according to one aspect of the invention;
Fig. 2 is a flowchart of a method for determining key sub-word weights in a network device according to a preferred embodiment of the invention;
Fig. 3 is a flowchart of a method for determining key sub-word weights in a network device according to another preferred embodiment of the invention;
Fig. 4 is a flowchart of a method for determining key sub-word weights in a network device according to a further preferred embodiment of the invention;
Fig. 5 is a schematic diagram of the tree structure of the association relation set according to a preferred embodiment of the invention;
Fig. 6 is a flowchart of a method for determining key sub-word weights in a network device according to yet another preferred embodiment of the invention;
Fig. 7 is a schematic structural diagram of a network device for determining key sub-word weights according to one aspect of the invention;
Fig. 8 is a schematic structural diagram of a network device for determining key sub-word weights according to a preferred embodiment of the invention;
Fig. 9 is a schematic structural diagram of a network device for determining key sub-word weights according to another preferred embodiment of the invention;
Fig. 10 is a schematic structural diagram of a network device for determining key sub-word weights according to a further preferred embodiment of the invention;
Fig. 11 is a schematic structural diagram of a network device for determining key sub-word weights according to yet another preferred embodiment of the invention.
In the drawings, identical or similar reference numerals denote identical or similar components.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a flowchart of a method for determining key sub-word weights in a network device according to one aspect of the invention. The network device includes, but is not limited to, a single network server, a server group formed by multiple network servers, or a cloud formed by a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer formed by a set of loosely coupled computers.
In step S1, the network device obtains a long-tail keyword from a user. The long-tail keyword contains multiple key sub-words. A key sub-word is a word or phrase that can be separated out of the long-tail keyword, and the key sub-words belonging to the same long-tail keyword may be associated with one another. For example, for the long-tail keyword "Beijing fresh flower express delivery", the key sub-words it contains are "Beijing fresh flower", "fresh flower express delivery", "Beijing", "fresh flower" and "express delivery". Specifically, the ways in which the network device obtains the long-tail keyword include, but are not limited to:
1) The network device obtains the long-tail keyword from the user's keyword library;
The network device searches the user's keyword library and judges whether each keyword found semantically contains multiple words, thereby identifying and obtaining the long-tail keywords; alternatively, if the network device has already classified the keywords, it can obtain the long-tail keywords directly from the classified keyword library;
The user may enter the long-tail keyword through any user equipment connected to the network device, so that the network device records the keyword in the user's keyword library; the user equipment includes, but is not limited to, a computer, a smartphone, a PDA or an IPTV. Alternatively, network-side operators may enter the long-tail keyword directly into the user's keyword library;
2) The network device directly obtains the long-tail keyword that the user enters through user equipment;
The user provides the long-tail keyword to be analyzed to the network device through a client installed on the user equipment, or by connecting to the network device through a browser.
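Step S1 can be illustrated with a minimal sketch. It assumes that every keyword in the library has already been segmented into words and is stored as a tuple of words; the helper names `select_long_tail` and `key_sub_words` are hypothetical and do not come from the patent. Key sub-words are modelled here as the proper contiguous runs of words, which reproduces the five sub-words listed for "Beijing fresh flower express delivery".

```python
# Assumption: each keyword is a tuple of already-segmented words.

def select_long_tail(keyword_library):
    """Keep the keywords that semantically contain more than one word."""
    return [kw for kw in keyword_library if len(kw) > 1]


def key_sub_words(long_tail):
    """List every proper contiguous run of words, longest runs first."""
    n = len(long_tail)
    return [
        long_tail[start:start + length]
        for length in range(n - 1, 0, -1)
        for start in range(n - length + 1)
    ]


library = [
    ("fresh flower",),
    ("fresh flower", "express delivery"),
    ("Beijing", "fresh flower", "express delivery"),
]
long_tail_keywords = select_long_tail(library)
sub_words = key_sub_words(("Beijing", "fresh flower", "express delivery"))
```

A real system would need a word segmenter in front of this step; the tuple representation only stands in for its output.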
Then, in step S2, the network device determines, according to a first predetermined rule and based on associated keywords, the weights of the multiple key sub-words contained in the long-tail keyword. An associated keyword is associated with at least one of the key sub-words; it may be a keyword that semantically contains only one word, or it may be another long-tail keyword. When at least one of the key sub-words of the long-tail keyword is identical or similar to a keyword that semantically contains one word, that keyword can be considered associated with the long-tail keyword. When at least one of the key sub-words of the long-tail keyword appears in another long-tail keyword, or is similar to at least one key sub-word of that other long-tail keyword, that other long-tail keyword can be considered associated with the long-tail keyword.
In determining the weights of the key sub-words according to the first predetermined rule and the associated keywords, the network device refers to at least one of the following factors:
- The number of times each key sub-word, or a word similar to it, occurs;
Specifically, the network device searches for associated keywords in this user's keyword library and/or in the keyword libraries of other users, and records the number of times each key sub-word appears in the associated keywords, that is, the number of associated keywords associated with each key sub-word; the larger the recorded count, the higher the weight of that key sub-word;
- The semantic similarity between each key sub-word and its associated keywords;
Specifically, the network device searches for associated keywords in this user's keyword library and/or in the keyword libraries of other users and, for each key sub-word, analyzes the similarity between that key sub-word and each associated keyword associated with it. For example, when the key sub-word is identical to an associated keyword, a first-grade similarity evaluation is given; when the key sub-word is semantically similar to an associated keyword, a second-grade similarity evaluation is given; when the key sub-word is identical or similar to part of an associated keyword, a third-grade evaluation is given; and so on. The similarities between the key sub-word and each of its associated keywords are then analyzed together, for example by averaging them, to obtain the weight.
In the present invention, the two factors above may also be combined to obtain the weight. For example, after a weight has been obtained from the number of times each key sub-word or a similar word appears in the associated keywords, that weight is adjusted according to the semantic similarity between the key sub-word and its associated keywords.
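The graded similarity evaluation and its averaging can be sketched as follows. The numeric scores in `GRADE_SCORE` and the similar-word table `SIMILAR` are assumptions chosen for illustration; the patent only names the grades and gives averaging as one possible way of combining them.

```python
# Assumed numeric value for each similarity grade (not specified in the source).
GRADE_SCORE = {1: 1.0, 2: 0.6, 3: 0.3}
# Assumed table of semantically similar word pairs.
SIMILAR = {frozenset(("fresh flower", "flower"))}


def grade(sub_word, associated):
    """Grade one key sub-word against one associated keyword (a tuple of words)."""
    if associated == (sub_word,):
        return 1  # identical to the whole associated keyword
    if len(associated) == 1 and frozenset((sub_word, associated[0])) in SIMILAR:
        return 2  # semantically similar to the whole associated keyword
    if any(w == sub_word or frozenset((sub_word, w)) in SIMILAR for w in associated):
        return 3  # identical or similar to part of the associated keyword
    return None   # not associated at all


def similarity_weight(sub_word, associated_keywords):
    """Average the grade scores over the keywords associated with sub_word."""
    grades = (grade(sub_word, kw) for kw in associated_keywords)
    scores = [GRADE_SCORE[g] for g in grades if g is not None]
    return sum(scores) / len(scores) if scores else 0.0


associated_keywords = [
    ("fresh flower",),
    ("flower",),
    ("Beijing", "fresh flower", "express delivery"),
]
w_flower = similarity_weight("fresh flower", associated_keywords)
w_delivery = similarity_weight("express delivery", associated_keywords)
```

With these assumed scores, "fresh flower" averages one evaluation of each grade, while "express delivery" receives a single third-grade evaluation.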
It should be noted that, as those skilled in the art will understand, the above examples serve only to better illustrate the technical scheme of the invention and do not limit it; any scheme that determines the weights of the key sub-words according to the first predetermined rule and associated keywords falls within the scope of the invention and is incorporated herein by reference.
Fig. 2 shows a flowchart of a method for determining key sub-word weights in a network device according to a preferred embodiment of the invention. In this embodiment, step S2 further comprises step S211 and step S212.
In step S1, the network device obtains a long-tail keyword from a user. This step has been described in detail in the embodiment of Fig. 1; it is incorporated here by reference and is not repeated.
Then, in step S211, based on the long-tail keyword, the network device obtains the associated keywords from this user's other keywords.
Specifically, the network device performs matching against this user's other keywords. When it judges that all or part of the content of the long-tail keyword matches all or part of the content of another keyword, it considers the long-tail keyword and that other keyword to be associated.
For example, the long-tail keyword is "fresh flower express delivery", and the user's keywords also include "fresh flower", "Beijing fresh flower express delivery" and "Beijing fresh flower". When matching "fresh flower express delivery" against "Beijing fresh flower express delivery", the network device judges that the entire content of the long-tail keyword matches the partial content "fresh flower express delivery" of "Beijing fresh flower express delivery", so "Beijing fresh flower express delivery" is an associated keyword of "fresh flower express delivery". When matching "fresh flower express delivery" against "fresh flower", it judges that part of the content of the long-tail keyword matches the entire content of "fresh flower", so "fresh flower" is an associated keyword of "fresh flower express delivery". When matching "fresh flower express delivery" against "Beijing fresh flower", it judges that the partial content "fresh flower" of the long-tail keyword matches the partial content "fresh flower" of "Beijing fresh flower", so "Beijing fresh flower" is an associated keyword of "fresh flower express delivery".
Alternatively, the network device obtains the multiple key sub-words contained in the long-tail keyword by semantic analysis and, when it judges that one or more of these key sub-words match all or part of the content of another keyword from this user, considers the long-tail keyword and that other keyword to be associated.
For example, analyzing "fresh flower express delivery" yields the two key sub-words "fresh flower" and "express delivery". The network device then judges whether each of the two key sub-words matches all or part of the content of other keywords from this user; for instance, "fresh flower" matches the entire content of "fresh flower", and "express delivery" matches part of the content of "Beijing fresh flower express delivery", which yields the associated keywords.
It should be noted that, although the above examples all treat identical words as matching, those skilled in the art will understand that similar words, such as "fresh flower" and "flower", can also be considered to match.
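The matching in step S211 can be sketched as a shared-word test. This is a simplified illustration that assumes keywords are tuples of segmented words and that similar words are resolved through a small lookup table; `SIMILAR_WORDS`, `canonical`, `is_associated` and `find_associated` are hypothetical names.

```python
# Assumed lookup that maps a similar word onto a canonical form.
SIMILAR_WORDS = {"flower": "fresh flower"}


def canonical(word):
    return SIMILAR_WORDS.get(word, word)


def is_associated(long_tail, other):
    """True when the two keywords share at least one identical or similar word."""
    return bool({canonical(w) for w in long_tail} & {canonical(w) for w in other})


def find_associated(long_tail, other_keywords):
    """Return the user's other keywords that are associated with long_tail."""
    return [
        kw for kw in other_keywords
        if kw != long_tail and is_associated(long_tail, kw)
    ]


long_tail = ("fresh flower", "express delivery")
others = [
    ("fresh flower",),
    ("Beijing", "fresh flower", "express delivery"),
    ("Beijing", "fresh flower"),
    ("flower",),
    ("car", "rental"),
]
associated = find_associated(long_tail, others)
```

Here the unrelated keyword ("car", "rental") is filtered out, and ("flower",) is kept only because of the assumed similar-word table.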
Finally, in step S212, the network device determines the weights of the key sub-words according to the first predetermined rule and based on the associated keywords from this user.
In determining the weights of the key sub-words according to the first predetermined rule and the associated keywords from this user, the network device refers to at least one of the following factors:
- The number of times each key sub-word, or a word similar to it, appears in the associated keywords;
Specifically, the network device records the number of times each key sub-word appears in the associated keywords obtained, that is, the number of associated keywords associated with each key sub-word; the larger the recorded count, the higher the weight of that key sub-word;
For example, the key sub-word "fresh flower" appears in three associated keywords, "fresh flower", "Beijing fresh flower express delivery" and "Beijing fresh flower", whereas "express delivery" appears in only one associated keyword, "Beijing fresh flower express delivery"; the weight of "fresh flower" is therefore higher and the weight of "express delivery" lower;
- The semantic similarity between each key sub-word and the associated keywords;
Specifically, for each key sub-word, the network device analyzes the similarity between that key sub-word and each associated keyword associated with it. For example, when the key sub-word is identical to an associated keyword, a first-grade similarity evaluation is given; when the key sub-word is semantically similar to an associated keyword, a third-grade similarity evaluation is given; when the key sub-word is identical or similar to part of the content of an associated keyword, a second-grade evaluation is given; and so on. The similarities between the key sub-word and each of its associated keywords are then analyzed together, for example by statistical processing, to obtain the weight;
For example, the key sub-word "fresh flower" is identical to the associated keyword "fresh flower" and receives a first-grade similarity evaluation; "fresh flower" is semantically similar to the associated keyword "flower" and receives a third-grade similarity evaluation; "fresh flower" is identical to part of the content of "Beijing fresh flower express delivery" and receives a second-grade evaluation. Finally, the similarities obtained for "fresh flower" are processed statistically, and the weight resulting from the overall evaluation is determined to be the second grade;
As another example, the key sub-word "express delivery" is identical only to part of the content of "Beijing fresh flower express delivery", so the second-grade evaluation is taken directly as the evaluation of "express delivery".
In the present invention, the two factors above may also be combined to obtain the weight. Specifically, after a weight has been obtained from the number of times each key sub-word or a similar word appears in the associated keywords, that weight is adjusted according to the semantic similarity between the key sub-word and the associated keywords. For example, "fresh flower" and its similar words appear three times in the associated keywords and "express delivery" appears once, so "fresh flower" receives a higher weight and "express delivery" a lower one; because the grade obtained for "fresh flower" from the similarity evaluation is the same as the grade obtained for "express delivery", the ratio between the two weights is not adjusted.
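The combination of the two factors can be sketched in two passes: a count-based weight, followed by an adjustment by similarity grade. The multipliers in `GRADE_FACTOR` are assumed values, and normalising the weights so that they sum to 1 is a choice made for this illustration, not something the patent prescribes.

```python
# Assumed multiplier per overall similarity grade (not specified in the source).
GRADE_FACTOR = {1: 1.2, 2: 1.0, 3: 0.8}


def count_weights(sub_words, associated_keywords):
    """Weight each key sub-word by its share of appearances in associated keywords."""
    counts = {w: sum(1 for kw in associated_keywords if w in kw) for w in sub_words}
    total = sum(counts.values())
    return {w: (c / total if total else 0.0) for w, c in counts.items()}


def adjust_by_grade(weights, grades):
    """Scale each weight by its grade factor, then renormalise."""
    scaled = {w: weights[w] * GRADE_FACTOR[grades[w]] for w in weights}
    total = sum(scaled.values())
    return {w: (v / total if total else 0.0) for w, v in scaled.items()}


associated_keywords = [
    ("fresh flower",),
    ("Beijing", "fresh flower", "express delivery"),
    ("Beijing", "fresh flower"),
]
weights = count_weights(["fresh flower", "express delivery"], associated_keywords)
# Both sub-words end up with the second grade in the example of the text.
adjusted = adjust_by_grade(weights, {"fresh flower": 2, "express delivery": 2})
```

Because both key sub-words carry the same grade, the adjustment leaves the 3:1 ratio between them unchanged, as in the example above.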
It should be noted that step S211 and step S212 may be carried out simultaneously: each time step S211 obtains an associated keyword, step S212 can establish or adjust the weights of the key sub-words accordingly.
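Running the two steps together amounts to an incremental update. The sketch below keeps running counts and recomputes the weights each time a new associated keyword arrives; the class name `IncrementalWeights` and the count-share weighting are assumptions for illustration only.

```python
class IncrementalWeights:
    """Keep running counts so weights can be updated per associated keyword."""

    def __init__(self, sub_words):
        self.counts = {w: 0 for w in sub_words}

    def update(self, associated_keyword):
        """Record one newly obtained associated keyword and return fresh weights."""
        for w in self.counts:
            if w in associated_keyword:
                self.counts[w] += 1
        return self.weights()

    def weights(self):
        total = sum(self.counts.values())
        return {w: (c / total if total else 0.0) for w, c in self.counts.items()}


tracker = IncrementalWeights(["fresh flower", "express delivery"])
after_first = tracker.update(("fresh flower",))
after_second = tracker.update(("Beijing", "fresh flower", "express delivery"))
after_third = tracker.update(("Beijing", "fresh flower"))
```

Each call to `update` plays the role of one pass through step S211 followed immediately by step S212.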
It should further be noted that, as those skilled in the art will understand, the above examples serve only to better illustrate the technical scheme of the invention and do not limit it; any scheme that determines the weights of the key sub-words according to the first predetermined rule and associated keywords falls within the scope of the invention and is incorporated herein by reference.
Fig. 3 shows a flowchart of a method for determining key sub-word weights in a network device according to another preferred embodiment of the invention. In this embodiment, step S2 further comprises step S221 and step S222.
In step S1, the network device obtains a long-tail keyword from a user. This step has been described in detail in the embodiment of Fig. 1; it is incorporated here by reference and is not repeated.
Then, in step S221, based on the long-tail keyword, the network device obtains the associated keywords from the keywords of other users. This step differs from step S211 above in that the network device obtains the associated keywords from the keywords of other users rather than from the keywords of this user; the method of obtaining the associated keywords is identical or similar to that of step S211 and is therefore incorporated here by reference and not repeated.
Finally, in step S222, the network device determines the weights of the key sub-words according to the first predetermined rule and based on the associated keywords from other users. This step differs from step S212 above in that the associated keywords come from the keywords of other users rather than from the keywords of this user; the method of determining the weights is identical or similar to that of step S212 and is therefore incorporated here by reference and not repeated.
It should be noted that step S221 and step S222 may be carried out simultaneously: each time step S221 obtains an associated keyword, step S222 can establish or adjust the weights of the key sub-words accordingly.
Fig. 4 shows a flowchart of a method for determining key sub-word weights in a network device according to a further preferred embodiment of the invention. In this embodiment, step S2 further comprises steps S231, S232, S233 and S234.
In step S1, the network device obtains a long-tail keyword from a user. This step has been described in detail in the embodiment of Fig. 1; it is incorporated here by reference and is not repeated.
Then, in step S231, based on the long-tail keyword, the network device obtains one or more associated keywords from this user's other keywords, so as to establish from those associated keywords a first association relation between the long-tail keyword and one or more of its key sub-words. The way the associated keywords are obtained in this step is identical or similar to step S211 above and is therefore incorporated here by reference and not repeated.
While obtaining the associated keywords, or after all associated keywords have been obtained, the network device establishes the first association relation between the long-tail keyword and the one or more key sub-words to which the associated keywords relate. The first association relation represents the relation, established on the basis of a single user, between a long-tail keyword and the key sub-words that can be obtained by analysis in that user's keyword library.
Specifically, each time the network device obtains an associated keyword, it takes the long-tail keyword and the one or more key sub-words related to that associated keyword as nodes and the correlation between the long-tail keyword and those key sub-words as edges, and establishes the association relation between the long-tail keyword and those key sub-words. It continues until all associated keywords have been obtained, at which point the first association relation between the long-tail keyword and one or more key sub-words is complete.
For example, when the network device has obtained, from this user's other keywords, the associated keyword "fresh flower" of the long-tail keyword "fresh flower express delivery", the key sub-word related to this associated keyword is "fresh flower"; taking "fresh flower express delivery" and "fresh flower" as nodes and their correlation as an edge, it establishes the association relation between "fresh flower express delivery" and "fresh flower". Subsequently, the network device obtains another associated keyword, "fresh flower express delivery", from this user's other keywords; the key sub-word related to this associated keyword is again "fresh flower". Because the association relation between "fresh flower express delivery" and "fresh flower" has already been established, the network device does not establish it again. If the network device then finds no further associated keywords among this user's other keywords, it judges that the first association relation established for the long-tail keyword "fresh flower express delivery" on the basis of this user is the relation between "fresh flower express delivery" and "fresh flower".
Alternatively, after all associated keywords have been obtained, the network device takes the long-tail keyword and the multiple key sub-words related to those associated keywords as nodes and the correlations between the long-tail keyword and all of those key sub-words as edges, and establishes the first association relation.
For example, the network device searches this user's other keywords and obtains the associated keywords "fresh flower" and "fresh flower express delivery"; it then establishes that the first association relation of the long-tail keyword "fresh flower express delivery" on the basis of this user is the relation between "fresh flower express delivery" and "fresh flower".
It should be noted that the structure of the above association relation includes, but is not limited to: 1) a tree structure; 2) a linked structure; 3) a correspondence table; and so on. Those skilled in the art will understand that the association relation referred to in the invention is not limited to these structures; any scheme that can establish the association relation between the long-tail keyword and its associated keywords falls within the scope of the invention and is incorporated herein by reference.
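A first association relation can be sketched as a small correspondence-table structure: the long-tail keyword as the root node, plus the list of key sub-words it has an edge to. The function name `build_first_relation` and the dictionary layout are assumptions, and the associated keywords in the example are illustrative inputs rather than the exact ones used in the text.

```python
def build_first_relation(long_tail, associated_keywords):
    """Link a long-tail keyword to each key sub-word that an associated keyword touches.

    Keywords are tuples of segmented words (an assumption of this sketch).
    An edge that already exists is not created a second time.
    """
    related = []
    for keyword in associated_keywords:
        for sub_word in long_tail:
            if sub_word in keyword and sub_word not in related:
                related.append(sub_word)
    return {"keyword": long_tail, "sub_words": related}


long_tail = ("fresh flower", "express delivery")
first_relation = build_first_relation(
    long_tail,
    [("fresh flower",), ("Beijing", "fresh flower")],
)
```

Both associated keywords relate to "fresh flower", so only one edge is recorded, mirroring the rule that an existing relation is not established again.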
Then, in step S232, the network device looks up the first association relations, established on the basis of other users, between the long-tail keyword and one or more of its key sub-words; these relations are established from the long-tail keyword and from one or more associated keywords obtained from the keyword libraries of those other users.
For example, for the long-tail keyword "fresh flower express delivery", the network device looks up the first association relations between this long-tail keyword and its key sub-words obtained on the basis of two other users: the first association relation based on one of these users is between "fresh flower express delivery" and "fresh flower", and the first association relation based on the other user is between "fresh flower express delivery" and "express delivery".
Then, in step S233, the network device merges the first association relation between the long-tail keyword and one or more of its key sub-words established on the basis of this user with the first association relations established on the basis of the other users, to obtain a second association relation between the long-tail keyword and one or more of its key sub-words. The second association relation represents the relation, established on the basis of multiple users, between a long-tail keyword and one or more of its key sub-words.
For example, the network device merges the first association relation based on this user, between the long-tail keyword "fresh flower express delivery" and the key sub-word "fresh flower", with the two first association relations based on the two other users, namely "fresh flower express delivery" with "fresh flower" and "fresh flower express delivery" with "express delivery", and obtains the second association relation between the long-tail keyword "fresh flower express delivery" and the two key sub-words "fresh flower" and "express delivery": [fresh flower express delivery, (fresh flower, express delivery)].
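The merge in step S233 can be sketched as a union of the per-user sub-word lists that also keeps track of how often each key sub-word occurred. The function name `merge_first_relations` and the returned dictionary layout are assumptions made for this illustration.

```python
from collections import Counter


def merge_first_relations(long_tail, first_relations):
    """Merge per-user first relations into one second relation.

    first_relations holds one list of key sub-words per user (assumed layout).
    The occurrence counts are kept because the weighting step uses them later.
    """
    counts = Counter()
    for sub_words in first_relations:
        counts.update(sub_words)
    return {
        "keyword": long_tail,
        "sub_words": list(counts),  # insertion order: first seen, first listed
        "counts": dict(counts),
    }


second_relation = merge_first_relations(
    "fresh flower express delivery",
    [
        ["fresh flower"],       # this user
        ["fresh flower"],       # first other user
        ["express delivery"],   # second other user
    ],
)
```

The result corresponds to the relation [fresh flower express delivery, (fresh flower, express delivery)], with "fresh flower" seen twice and "express delivery" once.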
Finally, in step S234, the network equipment determines the weights of the multiple crucial sub-words according to the first pre-defined rule and based on the second incidence relation.
The first pre-defined rule determines the weight of each crucial sub-word according to at least one of the following factors:
- the number of times the multiple crucial sub-words or their approximate words occur;
For example, for the second incidence relation [fresh flower express delivery, (fresh flower, express delivery)], the network equipment recorded, while establishing the first and second incidence relations, that "fresh flower" occurred twice and "express delivery" occurred once; the weight of the crucial sub-word "fresh flower" is therefore considered higher than that of the crucial sub-word "express delivery";
- the semantic similarity between the multiple crucial sub-words and the long-tail keyword;
The semantic similarity is judged by how close the crucial sub-word is to the long-tail keyword in the second incidence relation. For example, in the second incidence relation [fresh flower express delivery, (fresh flower, express delivery)], "fresh flower" and "express delivery" are both directly connected to "fresh flower express delivery", so the weights of "fresh flower" and "express delivery" are determined to be identical.
However, those skilled in the art should understand that the present invention can also combine the above two parameters to obtain the weight. Specifically, after a weight is obtained from the number of occurrences of the multiple crucial sub-words, that weight is adjusted according to the semantic similarity between the multiple crucial sub-words and the long-tail keyword.
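The combination of the two factors can be sketched as follows. This is a minimal sketch under stated assumptions: turning occurrence counts into shares, measuring proximity as a hop count, and the `decay` parameter are all hypothetical choices, since the description only says that the count-based weight is adjusted by semantic similarity.

```python
def weights_from_counts(counts):
    """Occurrence factor: each sub-word's share of all recorded occurrences."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def adjust_by_proximity(weights, hops, decay=0.5):
    """Proximity factor: scale a weight by `decay` for every hop beyond a direct link."""
    return {word: w * decay ** (hops[word] - 1) for word, w in weights.items()}

counts = {"fresh flower": 2, "express delivery": 1}
hops = {"fresh flower": 1, "express delivery": 1}  # both directly connected
weights = adjust_by_proximity(weights_from_counts(counts), hops)
```

Because both crucial sub-words are directly connected to the long-tail keyword, the proximity step leaves the count-based ratio unchanged, which agrees with the example above.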
It should be noted that the examples given above merely illustrate the solution of the present invention and do not limit it. Those skilled in the art should understand that any scheme that establishes the second incidence relation from the first incidence relations and determines crucial sub-word weights falls within the scope of the present invention and is incorporated herein by reference.
Preferably, the present embodiment also comprises steps S4, S5 and S6 (not shown in the figures).
In step S4, the network equipment searches for one or more second incidence relations between other long-tail keywords and their crucial sub-words.
Then, in step S5, the network equipment obtains an incidence relation set from the one or more second incidence relations between the long-tail keyword and its crucial sub-words and the one or more second incidence relations between the other long-tail keywords and their crucial sub-words.
Specifically, the network equipment judges whether a found other long-tail keyword contains the long-tail keyword or is identical to one of its crucial sub-words, and thereby decides whether the second incidence relation of the long-tail keyword can be merged with that of the other long-tail keyword. If so, the two are merged to obtain the incidence relation set.
For example, when the network equipment finds the long-tail keyword "Beijing fresh flower express delivery", it considers that "Beijing fresh flower express delivery" contains the long-tail keyword "fresh flower express delivery" and merges the second incidence relations of the two. If the second incidence relation of "Beijing fresh flower express delivery" is [Beijing fresh flower express delivery, (Beijing fresh flower, fresh flower express delivery)] and that of "fresh flower express delivery" is [fresh flower express delivery, (fresh flower, express delivery)], merging yields {Beijing fresh flower express delivery, [Beijing fresh flower, fresh flower express delivery (fresh flower, express delivery)]}, whose tree diagram is shown in Fig. 5.
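The merge into an incidence relation set can be illustrated with a small Python sketch. The `{parent: [children]}` dictionary layout and the function names are assumptions, and the mergeability test is simplified to "the two relations share at least one node", which covers the containment case of the example but is not claimed to be the exact test used by the network equipment.

```python
def nodes(relation):
    """All keywords appearing in a relation stored as {parent: [children]}."""
    found = set(relation)
    for children in relation.values():
        found.update(children)
    return found

def merge_relations(first, second):
    """Merge two second incidence relations, or return None if they share no node."""
    if not nodes(first) & nodes(second):
        return None
    merged = {parent: list(children) for parent, children in first.items()}
    for parent, children in second.items():
        bucket = merged.setdefault(parent, [])
        for child in children:
            if child not in bucket:  # never duplicate an existing edge
                bucket.append(child)
    return merged

beijing = {"Beijing fresh flower express delivery":
           ["Beijing fresh flower", "fresh flower express delivery"]}
flower = {"fresh flower express delivery": ["fresh flower", "express delivery"]}
relation_set = merge_relations(beijing, flower)
```

The resulting dictionary encodes the same two-level tree as the merged relation in the example: the root has two children, one of which has two children of its own.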
Finally, in step S6, the weights of the multiple crucial sub-words contained in the long-tail keyword and the other long-tail keywords are determined according to the first pre-defined rule.
The first pre-defined rule determines the weight of each crucial sub-word according to at least one of the following factors:
- the number of times the multiple crucial sub-words or their approximate words occur;
For example, for the incidence relation set {Beijing fresh flower express delivery, [Beijing fresh flower, fresh flower express delivery (fresh flower, express delivery)]}, the network equipment recorded, while establishing the first incidence relations, the second incidence relations and the incidence relation set, that "fresh flower express delivery" occurred three times, "fresh flower" twice, "express delivery" once and "Beijing fresh flower" once; the crucial sub-words are therefore ordered by weight as "fresh flower express delivery" > "fresh flower" > "express delivery" = "Beijing fresh flower";
- the semantic similarity between the multiple crucial sub-words and the long-tail keyword;
The semantic similarity is judged by how close the crucial sub-word is to the long-tail keyword in the second incidence relation. For example, in the incidence relation set {Beijing fresh flower express delivery, [Beijing fresh flower, fresh flower express delivery (fresh flower, express delivery)]}, "fresh flower" is connected to "Beijing fresh flower express delivery" only through "fresh flower express delivery", whereas "Beijing fresh flower" is connected to "Beijing fresh flower express delivery" directly, so the weight of "fresh flower" is determined to be lower than that of "Beijing fresh flower".
However, those skilled in the art should understand that the present invention can also combine the above two parameters to obtain the weight. Specifically, after a weight is obtained from the number of occurrences of the multiple crucial sub-words, that weight is adjusted according to the semantic similarity between the multiple crucial sub-words and the long-tail keyword.
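Both factors of step S6 can be read off the merged tree. The sketch below is illustrative only: the dictionary layout, the breadth-first hop count used as the measure of proximity, and the function names are assumptions; the occurrence counts are the ones recorded in the example above.

```python
from collections import deque

TREE = {
    "Beijing fresh flower express delivery":
        ["Beijing fresh flower", "fresh flower express delivery"],
    "fresh flower express delivery": ["fresh flower", "express delivery"],
}
COUNTS = {
    "fresh flower express delivery": 3,
    "fresh flower": 2,
    "express delivery": 1,
    "Beijing fresh flower": 1,
}

def hops_from_root(tree, root):
    """Breadth-first search: number of edges between the root and each node."""
    hops = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in tree.get(node, []):
            if child not in hops:
                hops[child] = hops[node] + 1
                queue.append(child)
    return hops

def rank_by_count(counts):
    """Order sub-words from most to fewest recorded occurrences (ties keep input order)."""
    return sorted(counts, key=lambda word: -counts[word])

hops = hops_from_root(TREE, "Beijing fresh flower express delivery")
ranking = rank_by_count(COUNTS)
```

A sub-word with a smaller hop count is closer to the long-tail keyword; in this tree "Beijing fresh flower" is one hop away and "fresh flower" two, matching the proximity example.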
It should further be noted that if several incidence relation sets contain crucial sub-words or long-tail keywords that can be merged, those incidence relation sets can be merged further.
Preferably, when in step S231 one or more keywords of this user cannot establish a first incidence relation with any other keyword from this user, those keywords are regarded as abnormal keywords and are not processed.
Fig. 6 shows a flow chart of a method for determining crucial sub-word weights in the network equipment according to yet another preferred embodiment of the present invention.
In step S1, the network equipment obtains the long-tail keyword from the user. This step is described in detail with reference to the embodiment of Fig. 1; it is incorporated here by reference and is not repeated.
Then, in step S31, the network equipment performs semantic analysis on the long-tail keyword to obtain the multiple crucial sub-words.
For example, semantic analysis of the long-tail keyword "Beijing fresh flower express delivery" yields three crucial sub-words: "Beijing", "fresh flower" and "express delivery".
Then, in step S32, the network equipment obtains the initial weights of the multiple crucial sub-words according to the second pre-defined rule.
The second pre-defined rule determines the initial weight of each crucial sub-word according to at least one of the following factors:
- the distribution of the multiple crucial sub-words in the total keywords database;
The total keywords database is the lexicon containing the keywords of all users. The distribution includes but is not limited to: 1) the number of times a crucial sub-word appears in the total keywords database; 2) the density of a crucial sub-word in the total keywords database;
For example, among the crucial sub-words "Beijing", "fresh flower" and "express delivery", "fresh flower" and "express delivery" occur more often and receive higher initial weights, while "Beijing" occurs less often and receives a lower initial weight;
As another example, among the crucial sub-words "Beijing", "fresh flower" and "express delivery", a crucial sub-word that occurs more densely receives a higher initial weight;
- whether each of the multiple crucial sub-words is an entity;
Whether a crucial sub-word is an entity can be judged from an entity dictionary: if it is an entity, its initial weight is higher; if it is not, its initial weight is lower.
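The two factors of the second pre-defined rule can be sketched together in Python. Everything concrete here is an assumption made for illustration: the toy contents of `TOTAL_KEYWORDS` and `ENTITIES`, the use of the occurrence share as the distribution measure, and the multiplicative `entity_factor`; the description only states that more frequent sub-words and entities receive higher initial weights.

```python
def initial_weights(sub_words, total_keywords, entity_dictionary, entity_factor=2.0):
    """Initial weight = share of keywords containing the sub-word, raised for entities."""
    weights = {}
    for word in sub_words:
        hits = sum(1 for keyword in total_keywords if word in keyword)
        share = hits / len(total_keywords) if total_keywords else 0.0
        factor = entity_factor if word in entity_dictionary else 1.0
        weights[word] = share * factor
    return weights

# Hypothetical toy data standing in for the total keywords database and entity dictionary.
TOTAL_KEYWORDS = [
    "fresh flower",
    "fresh flower express delivery",
    "express delivery",
    "Beijing fresh flower express delivery",
    "purchase mobile phone",
]
ENTITIES = {"Beijing", "fresh flower", "express delivery", "mobile phone"}

weights = initial_weights(
    ["Beijing", "fresh flower", "express delivery"], TOTAL_KEYWORDS, ENTITIES
)
```

With this toy data "fresh flower" and "express delivery" each appear in three keywords and "Beijing" in one, so "Beijing" receives the lowest initial weight, in line with the example above.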
Finally, in step S2', the network equipment adjusts the initial weights according to the first pre-defined rule and based on the association keywords, to obtain the weights of the multiple crucial sub-words.
Specifically, the network equipment first obtains an estimated weight according to the first pre-defined rule and based on the association keywords. The method of obtaining this estimated weight is identical or similar to the method of determining crucial sub-word weights in the embodiments shown in Figs. 1 to 4; it is incorporated here by reference and is not repeated. The network equipment then adjusts the initial weight according to the estimated weight to obtain the weights of the multiple crucial sub-words. The adjustment methods include but are not limited to: 1) averaging the estimated weight and the initial weight; 2) performing a variance calculation on the estimated weight and the initial weight; 3) combining the estimated weight and the initial weight after weighting each of them. Those skilled in the art will understand that the methods of obtaining the weights of the multiple crucial sub-words from the estimated weight and the initial weight are not limited to the above examples.
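Two of the listed adjustment methods, averaging and weighted combination, can be sketched as follows. The function names and the `trust_in_estimate` coefficient are assumptions; the variance-based method is omitted because the description does not say how a variance is turned into a weight.

```python
def adjust_by_average(initial, estimated):
    """Method 1: plain average of the initial and estimated weights."""
    return (initial + estimated) / 2

def adjust_by_weighting(initial, estimated, trust_in_estimate=0.75):
    """Method 3: convex combination; trust_in_estimate is a hypothetical coefficient."""
    return trust_in_estimate * estimated + (1 - trust_in_estimate) * initial

averaged = adjust_by_average(0.25, 0.75)
weighted = adjust_by_weighting(0.0, 1.0)
```

With a coefficient of 0.5 the weighted combination reduces to the plain average, so the first method can be regarded as a special case of the third.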
Preferably, the present invention also comprises a step (not shown) of keeping the initial weight of a crucial sub-word when that initial weight is less than a first predetermined threshold.
In fact, determining crucial sub-word weights is also the process of determining the core words in a long-tail keyword; a crucial sub-word whose weight is lower than the first predetermined threshold can be regarded as not being a core word.
In step S32, which determines the initial weights of the crucial sub-words, when the entity dictionary shows that a crucial sub-word is not an entity, for example the crucial sub-word "purchase" in the long-tail keyword "purchase mobile phone", the weight of that crucial sub-word can be directly determined to be lower than the first predetermined threshold. In subsequent steps the initial weight of that crucial sub-word is kept and the crucial sub-word is not processed again, which saves system resources.
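The saving can be seen in a short sketch. The function `final_weights`, the callback `toy_estimate` and the numeric values are hypothetical; averaging is used for the adjustment purely as one of the permitted options.

```python
def final_weights(initial, estimate, threshold):
    """Keep initial weights below the threshold; adjust the rest with an estimate."""
    result = {}
    for word, weight in initial.items():
        if weight < threshold:
            result[word] = weight  # not a core word: skip the costly estimate
        else:
            result[word] = (weight + estimate(word)) / 2
    return result

calls = []

def toy_estimate(word):
    """Stand-in for the association-keyword analysis; records which words it processed."""
    calls.append(word)
    return 1.0

result = final_weights(
    {"purchase": 0.125, "mobile phone": 0.5}, toy_estimate, threshold=0.25
)
```

Only "mobile phone" reaches the estimation step; "purchase" keeps its initial weight and triggers no further analysis.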
As a preferred embodiment, the present invention also comprises a step (not shown) of adjusting the weights of the multiple crucial sub-words according to user related information of the user.
The user related information comprises at least one of the following:
- the attributes of the user;
The attributes of the user include but are not limited to: the industry to which the user belongs, the characteristics of the user, the number of keywords the user has purchased, etc.
For example, if the user belongs to the fresh flower sales industry, then of the crucial sub-words "fresh flower" and "express delivery", the weight of "fresh flower" is raised;
- the search effect preference set by the user;
Different crucial sub-words often bring different search effect tendencies. According to the search effect preference set by the user, the network equipment can, for example, raise the weight of the crucial sub-words whose search effect tendency matches that preference.
Those skilled in the art should understand that the methods of adjusting crucial sub-word weights according to user related information are not limited to the above examples.
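As a minimal sketch of the industry example, assuming a hypothetical set of industry terms per user and a multiplicative `boost` (the description only says that the weight is raised, not by how much):

```python
def adjust_for_industry(weights, industry_terms, boost=1.5):
    """Raise the weight of crucial sub-words that belong to the user's industry."""
    return {
        word: weight * boost if word in industry_terms else weight
        for word, weight in weights.items()
    }

adjusted = adjust_for_industry(
    {"fresh flower": 0.5, "express delivery": 0.5},
    industry_terms={"fresh flower"},  # user in the fresh flower sales industry
)
```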
Fig. 7 is a schematic structural diagram of network equipment for determining crucial sub-word weights according to one aspect of the present invention. In this embodiment, the network equipment comprises a first acquisition device 1 and a weight analysis device 2.
The first acquisition device 1 obtains the long-tail keyword from the user. The long-tail keyword comprises multiple crucial sub-words; a crucial sub-word is a word that can be separated out of the long-tail keyword, and the crucial sub-words belonging to the same long-tail keyword can be associated with one another. For example, in the long-tail keyword "Beijing fresh flower express delivery", "Beijing fresh flower", "fresh flower express delivery", "Beijing", "fresh flower" and "express delivery" are all crucial sub-words of this long-tail keyword. Specifically, the ways in which the first acquisition device 1 obtains the long-tail keyword include but are not limited to:
1) the first acquisition device 1 obtains the long-tail keyword from the keywords database of this user;
The first acquisition device 1 searches the keywords database of this user and judges, according to whether a found keyword semantically contains multiple words, whether it is a long-tail keyword, obtaining it if so. Alternatively, the network equipment classifies the keywords, and the first acquisition device 1 obtains the long-tail keyword directly from the classified keywords database;
The user can input the long-tail keyword through any user equipment connected to the network equipment, so that the network equipment records the keyword in the keywords database of this user; the user equipment includes but is not limited to a computer, smart phone, PDA or IPTV. Alternatively, network-side operating personnel can input the long-tail keyword directly into the keywords database of this user;
2) the first acquisition device 1 directly obtains the long-tail keyword that the user inputs through user equipment;
The user provides the long-tail keyword to be analysed to the network equipment through a client installed on the user equipment, or through a browser connected to the network equipment.
The weight analysis device 2 determines the weights of the multiple crucial sub-words contained in the long-tail keyword according to the first pre-defined rule and based on association keywords. An association keyword is associated with at least one of the multiple crucial sub-words; it can be a keyword that semantically contains only one word, or it can be another long-tail keyword. When at least one of the crucial sub-words of the long-tail keyword is identical or similar to a keyword that semantically contains one word, that keyword can be considered associated with the long-tail keyword. When at least one of the crucial sub-words of the long-tail keyword appears in another long-tail keyword, or is similar to at least one crucial sub-word of that other long-tail keyword, that other long-tail keyword can be considered associated with the long-tail keyword.
In determining the weights of the multiple crucial sub-words according to the first pre-defined rule and the association keywords, the weight analysis device 2 refers to at least one of the following factors:
- the number of times each of the multiple crucial sub-words, or its approximate words, appears in the association keywords;
Specifically, the weight analysis device 2 searches for association keywords in the keywords database of this user and/or in the keywords databases of other users, and records the number of times each crucial sub-word appears in the association keywords, that is, the number of association keywords associated with each crucial sub-word. The larger the recorded number, the higher the weight of that crucial sub-word;
- the semantic similarity between each of the multiple crucial sub-words and the association keywords;
Specifically, the weight analysis device 2 searches for association keywords in the keywords database of this user and/or in the keywords databases of other users and, for each crucial sub-word, analyses the similarity between that crucial sub-word and each association keyword associated with it. For example, when the crucial sub-word is identical to an association keyword, a first-grade similarity evaluation is given; when the crucial sub-word is semantically similar to an association keyword, a second-grade similarity evaluation is given; when the crucial sub-word is identical or similar to part of an association keyword, a third-grade evaluation is given, and so on. The similarities between the crucial sub-word and each of its association keywords are then analysed together, for example by averaging them, to obtain the weight.
In the present invention, the above two parameters can also be combined to obtain the weight. For example, after a weight is obtained from the number of times each of the multiple crucial sub-words or its approximate words appears in the association keywords, that weight is adjusted according to the semantic similarity between the multiple crucial sub-words and the association keywords.
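The occurrence factor can be sketched as a simple count over the association keywords. The function name, the substring test used to decide that a sub-word "appears in" a keyword, and the optional `approximate_words` table are assumptions made for this illustration.

```python
def association_counts(sub_words, association_keywords, approximate_words=None):
    """Count, per crucial sub-word, the association keywords that contain it.

    approximate_words optionally maps a sub-word to words treated as equivalent
    (a hypothetical table; the description does not define how it is built).
    """
    approximate_words = approximate_words or {}
    counts = {}
    for word in sub_words:
        variants = [word, *approximate_words.get(word, [])]
        counts[word] = sum(
            1 for keyword in association_keywords
            if any(variant in keyword for variant in variants)
        )
    return counts

counts = association_counts(
    ["fresh flower", "express delivery"],
    ["fresh flower", "Beijing fresh flower express delivery", "Beijing fresh flower"],
)
```

The sub-word with the larger count receives the higher weight; a similarity-based adjustment could then be applied on top of these counts.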
It should be noted that, as those skilled in the art will understand, the above examples merely illustrate the technical solution of the present invention and do not limit it; any scheme that determines the weights of the multiple crucial sub-words according to the first pre-defined rule and the association keywords falls within the scope of the present invention and is incorporated herein by reference.
Fig. 8 shows a schematic structural diagram of network equipment for determining crucial sub-word weights according to a preferred embodiment of the present invention. In this embodiment, the network equipment comprises the first acquisition device 1 and the weight analysis device 2, wherein the weight analysis device 2 comprises a second acquisition device 211 and a first sub-analysis device 212.
The first acquisition device 1 obtains the long-tail keyword from the user. Because the first acquisition device 1 is described in detail with reference to the embodiment of Fig. 6, it is incorporated here by reference and is not repeated.
The second acquisition device 211 obtains the association keywords from the other keywords of this user, based on the long-tail keyword.
Specifically, the second acquisition device 211 performs matching among the other keywords of this user; when all or part of the long-tail keyword matches all or part of another keyword, that other keyword is considered associated with the long-tail keyword.
For example, the long-tail keyword is "fresh flower express delivery", and the keywords of this user also include "fresh flower", "Beijing fresh flower express delivery" and "Beijing fresh flower". When matching "fresh flower express delivery" against "Beijing fresh flower express delivery", the second acquisition device 211 judges that the full content of the long-tail keyword, "fresh flower express delivery", matches part of "Beijing fresh flower express delivery", so "Beijing fresh flower express delivery" is an association keyword of "fresh flower express delivery". When matching "fresh flower express delivery" against "fresh flower", it judges that part of the long-tail keyword matches the full content of "fresh flower", so "fresh flower" is an association keyword of "fresh flower express delivery". When matching "fresh flower express delivery" against "Beijing fresh flower", it judges that the part "fresh flower" of the long-tail keyword matches the part "fresh flower" of "Beijing fresh flower", so "Beijing fresh flower" is an association keyword of "fresh flower express delivery".
Alternatively, the second acquisition device 211 obtains the multiple crucial sub-words contained in the long-tail keyword by semantic analysis, and when one or more of those crucial sub-words match all or part of another keyword from this user, that other keyword is considered associated with the long-tail keyword.
For example, analysing "fresh flower express delivery" yields the two crucial sub-words "fresh flower" and "express delivery". Association keywords are then obtained by judging whether either crucial sub-word matches all or part of another keyword from this user: "fresh flower" matches the full content of "fresh flower", and "express delivery" matches part of "Beijing fresh flower express delivery".
It should be noted that although the above examples all treat identical words as matching, those skilled in the art should understand that similar words, such as "fresh flower" and "flower", can also be considered to match.
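The sub-word based matching can be sketched as follows. Keywords are represented as tuples of already-segmented sub-words, which sidesteps the semantic analysis step; that representation, the `SIMILAR` table and the function names are assumptions made for illustration.

```python
# Hypothetical table mapping a similar word to the sub-word it is treated as.
SIMILAR = {"flower": "fresh flower"}

def canonical(sub_words):
    """Replace similar words by their canonical sub-word and return a set."""
    return {SIMILAR.get(word, word) for word in sub_words}

def find_association_keywords(long_tail, other_keywords):
    """Return the other keywords sharing at least one (canonical) sub-word."""
    target = canonical(long_tail)
    return [keyword for keyword in other_keywords if canonical(keyword) & target]

LONG_TAIL = ("fresh flower", "express delivery")
OTHERS = [
    ("fresh flower",),
    ("Beijing", "fresh flower", "express delivery"),
    ("Beijing", "fresh flower"),
    ("flower",),
    ("mobile phone",),
]
association_keywords = find_association_keywords(LONG_TAIL, OTHERS)
```

The first four keywords are returned as association keywords, the fourth only through the similar-word table, while "mobile phone" shares nothing with the long-tail keyword and is left out.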
The first sub-analysis device 212 determines the weights of the multiple crucial sub-words according to the first pre-defined rule and based on the association keywords from this user.
In determining the weights of the multiple crucial sub-words according to the first pre-defined rule and the association keywords from this user, the first sub-analysis device 212 refers to at least one of the following factors:
- the number of times each of the multiple crucial sub-words, or its approximate words, appears in the association keywords;
Specifically, the first sub-analysis device 212 records the number of times each crucial sub-word appears in the obtained association keywords, that is, the number of association keywords associated with each crucial sub-word. The larger the recorded number, the higher the weight of that crucial sub-word;
For example, the crucial sub-word "fresh flower" appears in the three association keywords "fresh flower", "Beijing fresh flower express delivery" and "Beijing fresh flower", whereas "express delivery" appears only in the one association keyword "Beijing fresh flower express delivery"; the weight of "fresh flower" is therefore higher and the weight of "express delivery" lower;
- the semantic similarity between each of the multiple crucial sub-words and the association keywords;
Specifically, for each crucial sub-word, the first sub-analysis device 212 analyses the similarity between that crucial sub-word and each association keyword associated with it. For example, when the crucial sub-word is identical to an association keyword, a first-grade similarity evaluation is given; when the crucial sub-word is semantically similar to an association keyword, a third-grade similarity evaluation is given; when the crucial sub-word is identical or similar to part of an association keyword, a second-grade evaluation is given, and so on. The similarities between the crucial sub-word and each of its association keywords are then analysed together, for example by statistical processing, to obtain the weight;
For example, the crucial sub-word "fresh flower" is identical to the association keyword "fresh flower", which gives a first-grade similarity evaluation; "fresh flower" is semantically similar to the association keyword "flower", which gives a third-grade similarity evaluation; and "fresh flower" is identical to part of "Beijing fresh flower express delivery", which gives a second-grade evaluation. Finally, the similarity evaluations obtained for "fresh flower" are processed statistically, and the weight resulting from the overall evaluation is determined to be of the second grade;
As another example, the crucial sub-word "express delivery" is identical only to part of "Beijing fresh flower express delivery", so the second-grade evaluation is given directly as the evaluation of "express delivery".
In the present invention, the above two parameters can also be combined to obtain the weight. Specifically, after a weight is obtained from the number of times each of the multiple crucial sub-words or its approximate words appears in the association keywords, that weight is adjusted according to the semantic similarity between the multiple crucial sub-words and the association keywords. For example, "fresh flower" and its approximate words occur three times in the association keywords and "express delivery" occurs once, so the weight of "fresh flower" is higher and the weight of "express delivery" lower; because the grade obtained for "fresh flower" from the similarity evaluation is the same as the grade obtained for "express delivery", the ratio between the two weights is not adjusted.
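The graded similarity evaluation of this embodiment can be sketched in Python. The grade numbers follow the example above (identical: first grade, partial match: second grade, semantically similar: third grade); the `SIMILAR_PAIRS` table, the substring test for a partial match and the choice of the median as the "statistical processing" are assumptions, since the description leaves the aggregation open.

```python
from statistics import median

# Hypothetical table of semantically similar word pairs.
SIMILAR_PAIRS = {frozenset(("fresh flower", "flower"))}

def grade(sub_word, keyword):
    """Similarity grade of one sub-word against one association keyword."""
    if sub_word == keyword:
        return 1                      # identical
    if sub_word in keyword:
        return 2                      # identical to part of the keyword
    if frozenset((sub_word, keyword)) in SIMILAR_PAIRS:
        return 3                      # semantically similar
    return None                       # not associated

def overall_grade(sub_word, association_keywords):
    """Aggregate the individual grades; the median is an assumed choice."""
    grades = [g for g in (grade(sub_word, k) for k in association_keywords)
              if g is not None]
    return median(grades) if grades else None

flower_grade = overall_grade(
    "fresh flower", ["fresh flower", "flower", "Beijing fresh flower express delivery"]
)
delivery_grade = overall_grade(
    "express delivery", ["Beijing fresh flower express delivery"]
)
```

Both sub-words end up with the second grade, so under the combined scheme the count-based ratio between their weights would be left unadjusted.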
It should be noted that the second acquisition device 211 and the first sub-analysis device 212 can operate simultaneously: each time the second acquisition device 211 obtains an association keyword, the first sub-analysis device 212 can establish or adjust the weights of the crucial sub-words according to that association keyword.
It should further be noted that, as those skilled in the art will understand, the above examples merely illustrate the technical solution of the present invention and do not limit it; any scheme that determines the weights of the multiple crucial sub-words according to the first pre-defined rule and the association keywords falls within the scope of the present invention and is incorporated herein by reference.
Fig. 9 shows a schematic structural diagram of network equipment for determining crucial sub-word weights according to another preferred embodiment of the present invention. In this embodiment, the network equipment comprises the first acquisition device 1 and the weight analysis device 2, wherein the weight analysis device 2 comprises a third acquisition device 221 and a second sub-analysis device 222.
The first acquisition device 1 obtains the long-tail keyword from the user. Because the first acquisition device 1 is described in detail with reference to the embodiment of Fig. 6, it is incorporated here by reference and is not repeated.
The third acquisition device 221 obtains the association keywords from the keywords of other users, based on the long-tail keyword. The third acquisition device 221 differs from the second acquisition device 211 in that it obtains association keywords from the keywords of other users rather than from the keywords of this user. The method by which it obtains the association keywords is identical or similar to that of the second acquisition device 211; it is therefore incorporated here by reference and is not repeated.
The second sub-analysis device 222 determines the weights of the multiple crucial sub-words according to the first pre-defined rule and based on the association keywords from other users. The second sub-analysis device 222 differs from the first sub-analysis device 212 in that the association keywords come from the keywords of other users rather than from the keywords of this user. The process by which it determines the weights is identical or similar to that of the first sub-analysis device 212; it is therefore incorporated here by reference and is not repeated.
It should be noted that the third acquisition device 221 and the second sub-analysis device 222 can operate simultaneously: each time the third acquisition device 221 obtains an association keyword, the second sub-analysis device 222 can establish or adjust the weights of the crucial sub-words according to that association keyword.
Figure 10 shows the network equipment infrastructure schematic diagram for determining crucial sub-word weight of another preferred embodiment of the present invention.In the present embodiment, the network equipment comprises the first acquisition device 1 and weight analysis device 2, and wherein, weight analysis device 2 comprises the 4th acquisition device 231, first and searches device 232, first merging device 233 and the 3rd sub-analytical equipment 234.
First acquisition device 1 obtains the long-tail keyword from user.Because the first acquisition device 1 is described in detail in reference to the embodiment described in Fig. 6, therefore, comprise by reference at this, repeat no more.
4th acquisition device 231 is based on described long-tail keyword, one or more association keyword is obtained, to set up the first one or more incidence relation in described long-tail keyword and the sub-word of described key according to described one or more association keyword from from other keywords of this user.4th acquisition device 231 how to obtain the process of one or more association keyword and the second acquisition device 211 same or similar, therefore, be contained in this by reference, repeat no more.
4th acquisition device 231 is in the process obtaining described association keyword, or, obtain all after association keyword, according to described association keyword the sub-word of one or more keys of being correlated with, set up the incidence relation of first of described long-tail keyword and the sub-word of described one or more key.Wherein, described first incidence relation represents the long-tail keyword set up based on a user and the incidence relation can analyzing the sub-word of the key obtained in the keywords database of this user.
Specifically, each time the fourth acquisition device 231 obtains an association keyword, it takes the long-tail keyword and the one or more key sub-words related to that association keyword as nodes, takes the correlation between the long-tail keyword and those key sub-words as edges, and establishes the association relation between the long-tail keyword and the one or more key sub-words of that association keyword. Once all association keywords have been obtained, the first association relation between the long-tail keyword and the one or more key sub-words is complete.
For example, when the fourth acquisition device 231 obtains, from this user's other keywords, the association keyword "fresh flower" for the long-tail keyword "fresh flower express delivery", the key sub-word related to this association keyword is "fresh flower". The device takes "fresh flower express delivery" and "fresh flower" as nodes and their correlation as the edge, and establishes the association relation between "fresh flower express delivery" and "fresh flower". The fourth acquisition device 231 then obtains another association keyword, "fresh flower express delivery", from this user's other keywords; the key sub-word related to it is again "fresh flower". Because the association relation between "fresh flower express delivery" and "fresh flower" has already been established, the fourth acquisition device 231 does not establish it again. When the fourth acquisition device 231 finds no further association keywords among this user's other keywords, it determines that the first association relation established for the long-tail keyword "fresh flower express delivery" based on this user is the association relation between "fresh flower express delivery" and "fresh flower".
Alternatively, after obtaining all association keywords, the fourth acquisition device 231 takes the long-tail keyword and the multiple key sub-words related to those association keywords as nodes, takes the correlations between the long-tail keyword and all of those key sub-words as edges, and establishes the first association relation.
For example, the fourth acquisition device 231 searches this user's other keywords and obtains the association keywords "fresh flower" and "fresh flower express delivery", and establishes the first association relation of the long-tail keyword "fresh flower express delivery" based on this user as the association relation between "fresh flower express delivery" and "fresh flower".
It should be noted that the structure of the above association relation includes, but is not limited to: 1) a tree structure; 2) a linked structure; 3) a correspondence table. Those skilled in the art should understand that the association relation referred to in the present invention is not limited to these structures; any scheme capable of establishing the association relation between the long-tail keyword and its association keywords falls within the scope of the present invention and is incorporated herein by reference.
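As a non-limiting illustration, the incremental procedure described above can be sketched in Python. The function name `build_first_relation` and the representation of the relation as a list of (long-tail keyword, key sub-word) edges are assumptions made for this sketch only; the mapping from each association keyword to its related key sub-words is taken as given input, as in the example above.

```python
def build_first_relation(long_tail, matches):
    """Build a first association relation as a list of edges.

    matches: list of (association_keyword, related_sub_words) pairs, in the
    order in which the association keywords are obtained (hypothetical input).
    """
    edges = []
    for _association_keyword, sub_words in matches:
        for sub_word in sub_words:
            edge = (long_tail, sub_word)  # nodes: long-tail keyword and sub-word
            if edge not in edges:         # an existing relation is not rebuilt
                edges.append(edge)
    return edges


first_relation = build_first_relation(
    "fresh flower express delivery",
    [
        ("fresh flower", ["fresh flower"]),
        ("fresh flower express delivery", ["fresh flower"]),
    ],
)
print(first_relation)
```

With the two association keywords of the example, the second one adds nothing because its edge already exists, so a single edge remains.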
The first search device 232 looks up the one or more first association relations between the long-tail keyword and the key sub-words that were established, based on other users, from one or more association keywords obtained from those users' keyword libraries based on the long-tail keyword.
For example, for the long-tail keyword "fresh flower express delivery", the first search device 232 finds the first association relations between this long-tail keyword and its key sub-words that were obtained based on two other users: the first association relation based on one of them is between "fresh flower express delivery" and "fresh flower", and the one based on the other is between "fresh flower express delivery" and "express delivery".
The first merging device 233 merges the one or more first association relations between the long-tail keyword and the key sub-words based on this user with the one or more first association relations between the long-tail keyword and the key sub-words based on the other users, to obtain one or more second association relations between the long-tail keyword and the key sub-words. Here, a second association relation represents the association relation, established based on multiple users, between a long-tail keyword and its one or more key sub-words.
For example, the first merging device 233 merges the first association relation based on this user, between the long-tail keyword "fresh flower express delivery" and the key sub-word "fresh flower", with the two first association relations based on the two other users, namely "fresh flower express delivery" with "fresh flower" and "fresh flower express delivery" with "express delivery", and obtains the second association relation between the long-tail keyword "fresh flower express delivery" and the two key sub-words "fresh flower" and "express delivery": [fresh flower express delivery, (fresh flower, express delivery)].
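The merging step can be sketched as follows. This is only an illustrative sketch: the function name `merge_first_relations`, the tuple used for the second association relation, and the choice to count each key sub-word once per user-based first association relation are assumptions, not details fixed by the description.

```python
from collections import Counter


def merge_first_relations(long_tail, relations_per_user):
    """Merge per-user first relations into one second association relation.

    relations_per_user: one list of key sub-words per user-based first
    association relation (hypothetical representation).
    """
    counts = Counter()
    for sub_words in relations_per_user:
        # Count each sub-word at most once per first association relation.
        for sub_word in dict.fromkeys(sub_words):
            counts[sub_word] += 1
    second_relation = (long_tail, list(counts))
    return second_relation, counts


second_relation, occurrences = merge_first_relations(
    "fresh flower express delivery",
    [["fresh flower"], ["fresh flower"], ["express delivery"]],
)
print(second_relation, dict(occurrences))
```

The occurrence counts recorded during merging are kept because a later step may use them when weights are determined.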
The third sub-analysis device 234 determines the weights of the multiple key sub-words according to the first predetermined rule, based on the second association relation.
The first predetermined rule determines the weights of the key sub-words according to at least one of the following factors:
- the number of times the multiple key sub-words or their approximate words occur;
For example, for the second association relation [fresh flower express delivery, (fresh flower, express delivery)], the process of establishing the first and second association relations recorded that "fresh flower" occurred twice and "express delivery" occurred once, so the third sub-analysis device 234 considers the weight of the key sub-word "fresh flower" to be higher than that of the key sub-word "express delivery";
- the semantic similarity between the multiple key sub-words and the long-tail keyword;
The semantic similarity is judged by the proximity of the key sub-word to the long-tail keyword in the second association relation. For example, for the second association relation [fresh flower express delivery, (fresh flower, express delivery)], because "fresh flower" and "express delivery" are both directly connected to "fresh flower express delivery", the third sub-analysis device 234 determines that the weights of "fresh flower" and "express delivery" are the same.
However, those skilled in the art should understand that the present invention may also combine the two factors above to obtain the weight. Specifically, after a weight is obtained from the number of occurrences of the multiple key sub-words, that weight is further adjusted according to the semantic similarity between the key sub-words and the long-tail keyword.
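One possible way to combine the two factors is sketched below. The description does not specify a formula, so the normalisation by total occurrences and the multiplicative proximity factor `1 / distance` are assumptions chosen purely for illustration, as is the function name `combined_weights`.

```python
def combined_weights(occurrences, distances):
    """Occurrence-based weight, then adjusted by proximity (assumed formula).

    occurrences: key sub-word -> recorded number of occurrences
    distances:   key sub-word -> number of edges to the long-tail keyword
    """
    total = sum(occurrences.values())
    weights = {}
    for sub_word, count in occurrences.items():
        count_weight = count / total           # weight from occurrences
        proximity = 1.0 / distances[sub_word]  # 1.0 when directly connected
        weights[sub_word] = count_weight * proximity
    return weights


weights = combined_weights(
    {"fresh flower": 2, "express delivery": 1},
    {"fresh flower": 1, "express delivery": 1},
)
print(weights)
```

Because both key sub-words are directly connected in this example, the proximity factor leaves the occurrence-based ordering unchanged.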
It should be noted that the above examples are given only to better illustrate the solution of the present invention and do not limit it. Those skilled in the art should understand that any scheme that establishes the second association relation from the first association relations and determines the weights of the key sub-words falls within the scope of the present invention and is incorporated herein by reference.
Preferably, this embodiment further comprises a second search device (not shown), a second merging device (not shown) and a fourth sub-analysis device.
The second search device looks up one or more second association relations between other long-tail keywords and their key sub-words.
The second merging device obtains an association relation set from the one or more second association relations between the long-tail keyword and its key sub-words and the one or more second association relations between the other long-tail keywords and their key sub-words.
Specifically, the second merging device judges whether a found other long-tail keyword contains the long-tail keyword, or is identical to a key sub-word of the long-tail keyword, in order to decide whether the second association relation of the long-tail keyword can be merged with that of the other long-tail keyword; if so, it merges them to obtain the association relation set.
For example, when the second search device finds the long-tail keyword "Beijing fresh flower express delivery", it considers that "Beijing fresh flower express delivery" contains the long-tail keyword "fresh flower express delivery", and merges the second association relations of "Beijing fresh flower express delivery" and "fresh flower express delivery". For instance, if the second association relation of "Beijing fresh flower express delivery" is [Beijing fresh flower express delivery, (Beijing fresh flower, fresh flower express delivery)] and that of "fresh flower express delivery" is [fresh flower express delivery, (fresh flower, express delivery)], merging yields {Beijing fresh flower express delivery, [Beijing fresh flower, fresh flower express delivery (fresh flower, express delivery)]}, whose tree structure is shown in Fig. 5.
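The merge test and the resulting tree can be sketched as below. This is a minimal sketch under stated assumptions: the names `can_merge` and `build_relation_set`, the dictionary mapping each long-tail keyword to its key sub-words, and the nested-dictionary tree are all hypothetical, and the recursion assumes the relations contain no cycles.

```python
def can_merge(long_tail, sub_words, other_long_tail):
    """True if the other long-tail keyword contains this one, or equals one
    of this long-tail keyword's key sub-words."""
    return long_tail in other_long_tail or other_long_tail in sub_words


def build_relation_set(root, second_relations):
    """Nest second association relations into a tree below `root`."""
    return {
        child: build_relation_set(child, second_relations)
        for child in second_relations.get(root, [])
    }


second_relations = {
    "Beijing fresh flower express delivery": [
        "Beijing fresh flower",
        "fresh flower express delivery",
    ],
    "fresh flower express delivery": ["fresh flower", "express delivery"],
}
mergeable = can_merge(
    "fresh flower express delivery",
    second_relations["fresh flower express delivery"],
    "Beijing fresh flower express delivery",
)
root = "Beijing fresh flower express delivery"
relation_set = {root: build_relation_set(root, second_relations)}
print(mergeable, relation_set)
```

In the nested result, "fresh flower" and "express delivery" sit one level below "fresh flower express delivery", which matches the bracketed notation of the example.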
The fourth sub-analysis device determines, according to the first predetermined rule, the weights of the multiple key sub-words contained in the long-tail keyword and the other long-tail keywords.
The first predetermined rule determines the weights of the key sub-words according to at least one of the following factors:
- the number of times the multiple key sub-words or their approximate words occur;
For example, for the association relation set {Beijing fresh flower express delivery, [Beijing fresh flower, fresh flower express delivery (fresh flower, express delivery)]}, the process of establishing the first association relations, the second association relations and the association relation set recorded that "fresh flower express delivery" occurred three times, "fresh flower" twice, "express delivery" once and "Beijing fresh flower" once, so the fourth sub-analysis device determines the weight order of the key sub-words to be "fresh flower express delivery" > "fresh flower" > "express delivery" = "Beijing fresh flower";
- the semantic similarity between the multiple key sub-words and the long-tail keyword;
The semantic similarity is judged by the proximity of the key sub-word to the long-tail keyword in the second association relation. For example, for the association relation set {Beijing fresh flower express delivery, [Beijing fresh flower, fresh flower express delivery (fresh flower, express delivery)]}, because "fresh flower" is connected to "Beijing fresh flower express delivery" only through "fresh flower express delivery", whereas "Beijing fresh flower" is directly connected to "Beijing fresh flower express delivery", the fourth sub-analysis device determines that the weight of "fresh flower" is lower than that of "Beijing fresh flower".
However, those skilled in the art should understand that the present invention may also combine the two factors above to obtain the weight. Specifically, after a weight is obtained from the number of occurrences of the multiple key sub-words, that weight is further adjusted according to the semantic similarity between the key sub-words and the long-tail keyword.
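The proximity used for the semantic-similarity factor can be read as the number of edges between a key sub-word and the root long-tail keyword in the tree. The sketch below computes that distance with a breadth-first traversal; the function name `distances_from_root` and the adjacency-dictionary representation are assumptions for illustration only.

```python
from collections import deque


def distances_from_root(children, root):
    """Return the number of edges from `root` to every reachable node."""
    distance = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in distance:  # visit each node once
                distance[child] = distance[node] + 1
                queue.append(child)
    return distance


children = {
    "Beijing fresh flower express delivery": [
        "Beijing fresh flower",
        "fresh flower express delivery",
    ],
    "fresh flower express delivery": ["fresh flower", "express delivery"],
}
distance = distances_from_root(children, "Beijing fresh flower express delivery")
print(distance)
```

A smaller distance corresponds to a closer, and therefore more heavily weighted, key sub-word under this factor.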
It should be further noted that if multiple association relation sets contain key sub-words or long-tail keywords that can be merged, those association relation sets can be merged further.
Preferably, when the fourth acquisition device 231 cannot establish a first association relation between one or more keywords of this user and the other keywords from this user, those keywords are regarded as abnormal keywords and are not processed.
Fig. 11 shows a schematic structural diagram of a network device for determining key sub-word weights according to yet another preferred embodiment of the present invention. In this embodiment, the network device comprises a first acquisition device 1, a semantic analysis device 31, an initial weight analysis device 32 and a weight analysis device 2.
The first acquisition device 1 obtains the long-tail keyword from the user. Because the first acquisition device 1 has been described in detail in the embodiment described with reference to Fig. 6, that description is incorporated here by reference and is not repeated.
The semantic analysis device 31 performs semantic analysis on the long-tail keyword to obtain the multiple key sub-words.
For example, the semantic analysis device 31 performs semantic analysis on the long-tail keyword "Beijing fresh flower express delivery" and obtains three key sub-words: "Beijing", "fresh flower" and "express delivery".
The initial weight analysis device 32 obtains the initial weights of the multiple key sub-words according to a second predetermined rule.
The second predetermined rule comprises determining the initial weights of the key sub-words according to at least one of the following factors:
- the distribution of the multiple key sub-words in the total keyword library;
The total keyword library refers to a library containing the keywords of all users, and the distribution includes, but is not limited to: 1) the number of times a key sub-word appears in the total keyword library; 2) the density of a key sub-word in the total keyword library;
For example, for the key sub-words "Beijing", "fresh flower" and "express delivery", the initial weight analysis device 32 determines that "fresh flower" and "express delivery" occur more often and therefore have higher initial weights, while "Beijing" occurs less often and therefore has a lower initial weight;
As another example, for the key sub-words "Beijing", "fresh flower" and "express delivery", the initial weight analysis device 32 determines that the key sub-words that occur more densely have higher initial weights;
- whether the multiple key sub-words are entities;
Using an entity dictionary, the initial weight analysis device 32 can judge whether each of the key sub-words is an entity; if it is an entity, its initial weight is higher, and if not, its initial weight is lower.
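The two factors of the second predetermined rule can be sketched as follows. The description gives no formula, so this sketch makes several assumptions: density is taken as the fraction of library keywords containing the key sub-word, being an entity adds a fixed bonus `entity_bonus`, and the keyword library and entity dictionary shown are invented sample data.

```python
def initial_weight(sub_word, keyword_library, entity_dictionary, entity_bonus=0.5):
    """Initial weight from library density plus an entity bonus (assumed formula)."""
    if not keyword_library:
        density = 0.0
    else:
        occurrences = sum(1 for keyword in keyword_library if sub_word in keyword)
        density = occurrences / len(keyword_library)
    bonus = entity_bonus if sub_word in entity_dictionary else 0.0
    return density + bonus


# Hypothetical total keyword library and entity dictionary.
library = [
    "fresh flower express delivery",
    "fresh flower shop",
    "express delivery company",
    "Beijing fresh flower express delivery",
]
entities = {"Beijing", "fresh flower"}

initial = {
    word: initial_weight(word, library, entities)
    for word in ["Beijing", "fresh flower", "express delivery"]
}
print(initial)
```

In this sample data "fresh flower" is both frequent and an entity, so it receives the highest initial weight, while a word that is absent from the library and not an entity receives zero.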
The weight analysis device 2 further adjusts the initial weights according to the first predetermined rule and based on the association keywords, to obtain the weights of the multiple key sub-words.
Specifically, the weight analysis device 2 first obtains an estimated weight according to the first predetermined rule and based on the association keywords. The process of obtaining this estimated weight is the same as or similar to the process of determining key sub-word weights in the embodiments described with reference to Figs. 6 to 9; it is incorporated here by reference and is not repeated. The weight analysis device 2 then adjusts the initial weight according to the estimated weight to obtain the weights of the multiple key sub-words. Methods by which the weight analysis device 2 adjusts the initial weight include, but are not limited to: 1) averaging the estimated weight and the initial weight; 2) performing a variance calculation on the estimated weight and the initial weight; 3) performing a calculation after weighting the estimated weight and the initial weight. Those skilled in the art will understand that the method of obtaining the weights of the multiple key sub-words from the estimated weight and the initial weight is not limited to the above examples.
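Two of the listed adjustment methods, averaging and weighted combination, can be sketched as below. The function names and the mixing coefficient `alpha` are assumptions; the description does not say how the weighting is chosen, and the variance-based method is omitted because its exact form is not given.

```python
def adjust_by_mean(initial, estimated):
    """Method 1: average the initial weight and the estimated weight."""
    return (initial + estimated) / 2


def adjust_by_weighting(initial, estimated, alpha=0.75):
    """Method 3: weighted combination; `alpha` (assumed) favours the estimate."""
    return alpha * estimated + (1 - alpha) * initial


mean_weight = adjust_by_mean(0.5, 1.0)
mixed_weight = adjust_by_weighting(0.5, 1.0)
print(mean_weight, mixed_weight)
```

A larger `alpha` lets the association-keyword evidence dominate, while a smaller one keeps the result closer to the library-based initial weight.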
Preferably, when the initial weight of a key sub-word is less than a first predetermined threshold, the weight analysis device 2 keeps the initial weight of that key sub-word.
In fact, the process of determining key sub-word weights is also the process of determining the core words in the long-tail keyword; a key sub-word whose weight is lower than the first predetermined threshold can be considered not to be a core word.
In the process of determining the initial weights of the key sub-words, when the initial weight analysis device 32 judges from the entity dictionary that a key sub-word is not an entity, for example the key sub-word "purchase" in the long-tail keyword "purchase mobile phone", it can directly determine that the weight of this key sub-word is lower than the first predetermined threshold. In subsequent processing, the weight analysis device 2 then keeps the initial weight of this key sub-word and does not process it again, which saves system resources.
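The threshold shortcut can be sketched as follows. The numeric weights, the threshold value and the use of averaging for the remaining key sub-words are assumptions made only to keep the sketch concrete; `final_weights` is a hypothetical name.

```python
def final_weights(initial_weights, estimated_weights, threshold):
    """Keep low initial weights unchanged; adjust the others (here by averaging)."""
    result = {}
    for sub_word, initial in initial_weights.items():
        if initial < threshold:
            # Not a core word: keep the initial weight and skip further work.
            result[sub_word] = initial
        else:
            result[sub_word] = (initial + estimated_weights[sub_word]) / 2
    return result


# Hypothetical values for the long-tail keyword "purchase mobile phone".
weights = final_weights(
    {"purchase": 0.125, "mobile phone": 0.5},
    {"mobile phone": 1.0},  # no estimate is ever computed for "purchase"
    threshold=0.25,
)
print(weights)
```

Note that no estimated weight needs to exist for a skipped key sub-word, which is where the saving in system resources comes from.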
As a preferred embodiment of the present invention, the weight analysis device 2 is further configured to adjust the weights of the multiple key sub-words according to user-related information of the user.
The user-related information comprises at least one of the following:
- the attributes of the user;
The attributes of the user include, but are not limited to: the industry the user belongs to, the characteristics of the user, the number of keywords the user has purchased, and so on.
For example, if the user's industry is fresh flower sales, then for the key sub-words "fresh flower" and "express delivery", the weight analysis device 2 raises the weight of "fresh flower";
- the search effect preference set by the user;
Different key sub-words often bring different search effect tendencies. According to the search effect preference set by the user, the weight analysis device 2 can, for example, raise the weight of the key sub-words whose search effect tendency matches the user's search effect preference.
Those skilled in the art should understand that the method of adjusting key sub-word weights according to user-related information in the present invention is not limited to the above examples.
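An adjustment based on the user's industry might look like the sketch below. The multiplicative factor `boost`, the set of industry terms and the function name `adjust_for_user` are all assumptions; the description only states that the weight of a matching key sub-word is raised.

```python
def adjust_for_user(weights, industry_terms, boost=1.5):
    """Raise the weight of key sub-words that match the user's industry."""
    return {
        sub_word: weight * boost if sub_word in industry_terms else weight
        for sub_word, weight in weights.items()
    }


# Hypothetical user in the fresh flower sales industry.
adjusted = adjust_for_user(
    {"fresh flower": 0.5, "express delivery": 0.5},
    industry_terms={"fresh flower"},
)
print(adjusted)
```

A preference-based adjustment could follow the same pattern, with the set of matching key sub-words derived from the search effect preference instead of the industry.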
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are therefore intended to be embraced by the present invention. Any reference numeral in a claim should not be regarded as limiting the claim concerned. In addition, the word "comprising" obviously does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (20)

1. A method for determining key sub-word weights in a network device, wherein the method comprises the following steps:
a. obtaining a long-tail keyword from a user;
b. determining, according to a first predetermined rule and based on association keywords, the weight of each of the multiple key sub-words contained in the long-tail keyword, wherein the association keywords comprise association keywords from this user and/or other users, and each association keyword is associated with at least one of the multiple key sub-words;
wherein the first predetermined rule determines the weights of the key sub-words according to at least one of the following factors:
- the number of times the multiple key sub-words or their approximate words occur in the association keywords, wherein the more occurrences, the higher the weight of the key sub-word;
- the semantic similarity between the multiple key sub-words and the association keywords associated with them or the long-tail keyword, wherein the semantic similarity between a key sub-word and the long-tail keyword is judged by the proximity of that key sub-word to the long-tail keyword in a second association relation;
wherein step b comprises the following steps for determining the second association relation:
based on the long-tail keyword, obtaining one or more association keywords from the other keywords of this user, so as to establish, from the one or more association keywords, one or more first association relations between the long-tail keyword and the key sub-words based on this user;
looking up one or more first association relations between the long-tail keyword and the key sub-words based on other users, which were established from one or more association keywords obtained from the keyword libraries of the other users based on the long-tail keyword;
merging the one or more first association relations between the long-tail keyword and the key sub-words based on this user with the one or more first association relations between the long-tail keyword and the key sub-words based on the other users, to obtain one or more second association relations between the long-tail keyword and the key sub-words.
2. The method according to claim 1, wherein the association keywords comprise association keywords from this user, and wherein step b further comprises the following steps:
- based on the long-tail keyword, obtaining one or more of the association keywords from the other keywords of this user;
- determining the weights of the multiple key sub-words according to the first predetermined rule, based on the one or more association keywords from this user.
3. The method according to claim 1, wherein the association keywords comprise association keywords from other users, and wherein step b further comprises the following steps:
- based on the long-tail keyword, obtaining one or more of the association keywords from the keywords of other users;
- determining the weights of the multiple key sub-words according to the first predetermined rule, based on the one or more association keywords from other users.
4. The method according to claim 1, wherein the method further comprises the following steps:
- looking up one or more second association relations between other long-tail keywords and their key sub-words;
- obtaining an association relation set from the one or more second association relations between the long-tail keyword and its key sub-words and the one or more second association relations between the other long-tail keywords and their key sub-words;
- determining, according to the first predetermined rule, the weights of the multiple key sub-words contained in the long-tail keyword and the other long-tail keywords.
5. The method according to claim 1, wherein the method further comprises the following steps:
- performing semantic analysis on the long-tail keyword to obtain the multiple key sub-words;
- obtaining the initial weights of the multiple key sub-words according to a second predetermined rule;
wherein step b further comprises the following step:
- adjusting the initial weights according to the first predetermined rule and based on the association keywords, to obtain the weights of the multiple key sub-words.
6. The method according to claim 5, wherein the second predetermined rule comprises determining the initial weights of the key sub-words according to at least one of the following factors:
- the distribution of the multiple key sub-words in the total keyword library;
- whether the multiple key sub-words are entities.
7. The method according to claim 5, wherein step b further comprises the following step:
- when the initial weight of a key sub-word is less than a first predetermined threshold, keeping the initial weight of that key sub-word.
8. The method according to claim 1, wherein the step of adjusting the key sub-word weights in step b further comprises the following step:
- adjusting the weights of the multiple key sub-words according to user-related information of the user.
9. The method according to claim 8, wherein the user-related information comprises at least one of the following:
- the attributes of the user;
- the search effect preference set by the user.
10. The method according to any one of claims 1 to 9, wherein the network device comprises: a single network server, a server group composed of multiple network servers, or a cloud composed of a collection of computers.
11. A network device for determining key sub-word weights, wherein the network device comprises:
a first acquisition device, for obtaining a long-tail keyword from a user;
a weight analysis device, for determining, according to a first predetermined rule and based on association keywords, the weight of each of the multiple key sub-words contained in the long-tail keyword, wherein the association keywords comprise association keywords from this user and/or other users, and each association keyword is associated with at least one of the multiple key sub-words;
wherein the first predetermined rule determines the weights of the key sub-words according to at least one of the following factors:
- the number of times the multiple key sub-words or their approximate words occur in the association keywords, wherein the more occurrences, the higher the weight of the key sub-word;
- the semantic similarity between the multiple key sub-words and the association keywords associated with them or the long-tail keyword, wherein the semantic similarity between a key sub-word and the long-tail keyword is judged by the proximity of that key sub-word to the long-tail keyword in a second association relation;
wherein the weight analysis device comprises the following devices for determining the second association relation:
a fourth acquisition device, for obtaining, based on the long-tail keyword, one or more association keywords from the other keywords of this user, so as to establish, from the one or more association keywords, one or more first association relations between the long-tail keyword and the key sub-words based on this user;
a first search device, for looking up one or more first association relations between the long-tail keyword and the key sub-words based on other users, which were established from one or more association keywords obtained from the keyword libraries of the other users based on the long-tail keyword;
a first merging device, for merging the one or more first association relations between the long-tail keyword and the key sub-words based on this user with the one or more first association relations between the long-tail keyword and the key sub-words based on the other users, to obtain one or more second association relations between the long-tail keyword and the key sub-words.
12. The network device according to claim 11, wherein the association keywords comprise association keywords from this user, and wherein the weight analysis device further comprises:
a second acquisition device, for obtaining, based on the long-tail keyword, one or more of the association keywords from the other keywords of this user;
a first sub-analysis device, for determining the weights of the multiple key sub-words according to the first predetermined rule, based on the one or more association keywords from this user.
13. The network device according to claim 11, wherein the association keywords comprise association keywords from other users, and wherein the weight analysis device further comprises:
a third acquisition device, for obtaining, based on the long-tail keyword, one or more of the association keywords from the keywords of other users;
a second sub-analysis device, for determining the weights of the multiple key sub-words according to the first predetermined rule, based on the one or more association keywords from other users.
14. The network device according to claim 11, wherein the network device further comprises:
- a second search device, for looking up one or more second association relations between other long-tail keywords and their key sub-words;
- a second merging device, for obtaining an association relation set from the one or more second association relations between the long-tail keyword and its key sub-words and the one or more second association relations between the other long-tail keywords and their key sub-words;
- a fourth sub-analysis device, for determining, according to the first predetermined rule, the weights of the multiple key sub-words contained in the long-tail keyword and the other long-tail keywords.
15. The network device according to claim 11, wherein the network device further comprises:
a semantic analysis device, for performing semantic analysis on the long-tail keyword to obtain the multiple key sub-words;
an initial weight analysis device, for obtaining the initial weights of the multiple key sub-words according to a second predetermined rule;
wherein the weight analysis device is further configured to:
adjust the initial weights according to the first predetermined rule and based on the association keywords, to obtain the weights of the multiple key sub-words.
16. The network device according to claim 15, wherein the second predetermined rule comprises determining the initial weights of the key sub-words according to at least one of the following factors:
- the distribution of the multiple key sub-words in the total keyword library;
- whether the multiple key sub-words are entities.
17. The network device according to claim 15, wherein the weight analysis device is further configured to:
keep the initial weight of a key sub-word when that initial weight is less than a first predetermined threshold.
18. The network device according to claim 11, wherein the weight analysis device is further configured to:
adjust the weights of the multiple key sub-words according to user-related information of the user.
19. The network device according to claim 18, wherein the user-related information comprises at least one of the following:
- the attributes of the user;
- the search effect preference set by the user.
20. The network device according to any one of claims 11 to 19, wherein the network device comprises: a single network server, a server group composed of multiple network servers, or a cloud composed of a collection of computers.
CN201010501398.9A 2010-10-09 2010-10-09 A kind of in the network device for determining the method and apparatus of crucial sub-word weight Active CN102446174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010501398.9A CN102446174B (en) 2010-10-09 2010-10-09 A kind of in the network device for determining the method and apparatus of crucial sub-word weight

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010501398.9A CN102446174B (en) 2010-10-09 2010-10-09 A kind of in the network device for determining the method and apparatus of crucial sub-word weight

Publications (2)

Publication Number Publication Date
CN102446174A CN102446174A (en) 2012-05-09
CN102446174B true CN102446174B (en) 2015-11-25

Family

ID=46008678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010501398.9A Active CN102446174B (en) 2010-10-09 2010-10-09 A kind of in the network device for determining the method and apparatus of crucial sub-word weight

Country Status (1)

Country Link
CN (1) CN102446174B (en)

Also Published As

Publication number Publication date
CN102446174A (en) 2012-05-09

Similar Documents

Publication Publication Date Title
KR101700585B1 (en) On-line product search method and system
CN107609152B (en) Method and apparatus for expanding query expressions
TWI640878B (en) Query word fusion method, product information publishing method, search method and system
JP5778255B2 (en) Method, system, and apparatus for query based on vertical search
US9473587B2 (en) Relevance-based aggregated social feeds
US7949643B2 (en) Method and apparatus for rating user generated content in search results
Qian et al. Social media based event summarization by user–text–image co-clustering
CN107180093B (en) Information searching method and device and timeliness query word identification method and device
CN101984420B (en) Method and equipment for searching pictures based on word segmentation processing
CN101957834A (en) Content recommending method and device based on user characteristics
JP6428795B2 (en) Model generation method, word weighting method, model generation device, word weighting device, device, computer program, and computer storage medium
JP2013531289A (en) Use of model information group in search
CN105389590B (en) Video clustering recommendation method and device
CN103324645A (en) Method and device for recommending webpage
CN103279504B (en) Searching method and device based on ambiguity resolution
US10127322B2 (en) Efficient retrieval of fresh internet content
CN111651678A (en) Knowledge graph-based personalized recommendation method
WO2017143703A1 (en) Offline resource mining method and device
US9424338B2 (en) Clustering queries for image search
CN102446174B (en) Method and apparatus for determining key sub-word weights in a network device
JP6434954B2 (en) Information processing apparatus, information processing method, and program
KR101091991B1 (en) Apparatus and method for providing advertisement
CN102314422A (en) Method and equipment for preferably selecting open type interactive forum based on user interests
CN103312584A (en) Method and apparatus for releasing information in network community
CN111222918B (en) Keyword mining method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant