CN103136213A - Method and device for providing related words - Google Patents

Info

Publication number
CN103136213A
Authority
CN
China
Prior art keywords
keyword
related term
search results
training sample
search
Legal status
Granted
Application number
CN2011103768404A
Other languages
Chinese (zh)
Other versions
CN103136213B (en)
Inventor
钟灵
周祥军
申月
杨洁
蒋龙
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201110376840.4A
Publication of CN103136213A
Application granted
Publication of CN103136213B
Status: Active

Abstract

The invention discloses a method and device for providing related words, aimed at solving the problem that the related words provided in the prior art are not accurate enough. In the method, for each candidate related word of a keyword input by a user, the feature scores of the keyword and the candidate related word on each preset feature are input into a relevance score calculation model to obtain the relevance score of the keyword and the candidate related word, and related words are provided accordingly. The relevance score calculation model is determined from a preset number of keywords and related words whose relevance scores have already been calculated. With this method, even if the keyword input by the user is not recorded in the search log, the feature scores of the keyword and its candidate related words can still be input into the model to obtain their relevance scores. Accurate related words are therefore provided to the user, the user does not need to search again, and server resources are saved.

Description

Method and device for providing related words
Technical field
The present application relates to the field of communication technology, and in particular to a method and device for providing related words.
Background
At present, the servers of many shopping websites provide a product search function. The user inputs a keyword for the product to be searched, and the server searches for corresponding results according to the keyword and returns them to the user, that is, returns the product information found.
Because keywords input by users are often non-standard, searching with the raw keyword alone may fail to find the results the user actually wants. To provide accurate search results, the server therefore generally normalizes the keyword input by the user so that it is more standard, searches with the normalized keyword, and provides the search results.
In addition, to achieve a better search effect, the server generally also looks up the related words corresponding to the keyword input by the user and provides them to the user. With related words available, a user who is not satisfied with the search results can directly click a related word to search with it, which further improves search accuracy.
In the prior art, the server provides related words by the method shown in Fig. 1. Fig. 1 shows the process of providing related words in the prior art, which specifically comprises the following steps:
S101: Split the keyword input by the user into several segmented words using a preset word segmentation algorithm.
S102: According to the attributes of each segmented word, determine the candidate related words of the keyword from the stored related words.
S103: Look up, in the search log, the first search results that were clicked among the results obtained by searching with the keyword, and the number of times each was clicked.
S104: For each candidate related word determined, look up, in the search log, the second search results that were clicked among the results obtained by searching with that candidate related word, and the number of times each was clicked.
S105: Determine the search results common to the first and second search results, and calculate the relevance score of the keyword and the candidate related word from the number of times each common search result was clicked under the keyword and under the candidate related word.
S106: According to the determined relevance scores of the keyword and each candidate related word, select the candidate related words with higher relevance scores and provide them to the user as related words of the keyword.
However, the prior-art method cannot calculate the relevance score of the keyword and a candidate related word in two cases: if the keyword input by the user is a new keyword that is not recorded in the search log, or if the first search results for the keyword and the second search results for the candidate related word found in the search log have no search result in common. The related words provided are therefore not accurate enough, so the user needs to input other similar words and search again, which consumes a large amount of server resources.
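To make the failure mode concrete, here is a minimal Python sketch of steps S103 to S105 under stated assumptions. The patent does not give the S105 formula, so the weighted Jaccard overlap of click counts used here is a hypothetical stand-in. The click-log layout (`query -> {result id -> click count}`) and the name `prior_art_score` are also illustrative. The two `None` branches correspond to the two cases above in which the prior-art method has no score to offer.

```python
from typing import Dict, Optional

# Hypothetical click-log layout: query -> {clicked result id -> click count}.
ClickLog = Dict[str, Dict[str, int]]


def prior_art_score(log: ClickLog, keyword: str, candidate: str) -> Optional[float]:
    """Sketch of S103-S105: score a candidate from clicks on shared results.

    Returns None when no score can be computed, mirroring the prior-art gaps.
    """
    first = log.get(keyword, {})     # S103: clicked results for the keyword
    second = log.get(candidate, {})  # S104: clicked results for the candidate
    if not first or not second:
        return None  # keyword (or candidate) is not recorded in the search log
    shared = first.keys() & second.keys()  # S105: common search results
    if not shared:
        return None  # no common clicked result, so nothing to compare
    # Assumed formula: weighted Jaccard, sum of min clicks / sum of max clicks.
    overlap = sum(min(first[r], second[r]) for r in shared)
    return overlap / (sum(first.values()) + sum(second.values()) - overlap)


log = {
    "A brand a model mobile phone": {"phone 1": 5, "phone 2": 3},
    "A brand b model mobile phone": {"phone 1": 1, "phone 4": 4},
    "phone case": {"case 9": 2},
}
print(prior_art_score(log, "A brand a model mobile phone", "A brand b model mobile phone"))  # 1/12
print(prior_art_score(log, "brand new query", "phone case"))  # None: keyword not in log
print(prior_art_score(log, "A brand a model mobile phone", "phone case"))  # None: no overlap
```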
Summary of the invention
The embodiments of the present application provide a method and device for providing related words. They solve the prior-art problem that the related words provided are not accurate enough, which forces users to input other similar words and search again and thereby consumes a large amount of server resources.
A method for providing related words provided by an embodiment of the present application comprises:
determining candidate related words of a keyword according to the keyword input by a user;
for each candidate related word determined, determining the feature scores of the keyword and the candidate related word on each preset feature, and inputting the determined feature scores as input parameter values into a relevance score calculation model to obtain the relevance score of the keyword and the candidate related word, wherein the relevance score calculation model is determined from a preset number of keywords and related words whose relevance scores have already been calculated; and
selecting, from the candidate related words, the related words to be provided to the user according to the obtained relevance scores of the keyword and each candidate related word.
A device for providing related words provided by an embodiment of the present application comprises:
a candidate related word determination module, configured to determine candidate related words of a keyword according to the keyword input by a user;
a relevance score determination module, configured to, for each candidate related word determined, determine the feature scores of the keyword and the candidate related word on each preset feature, and input the determined feature scores as input parameter values into a relevance score calculation model to obtain the relevance score of the keyword and the candidate related word, wherein the relevance score calculation model is determined from a preset number of keywords and related words whose relevance scores have already been calculated; and
a related word providing module, configured to select, from the candidate related words, the related words to be provided to the user according to the obtained relevance scores of the keyword and each candidate related word.
The embodiments of the present application provide a method and device for providing related words. For each candidate related word of a keyword input by a user, the method inputs the feature scores of the keyword and the candidate related word on each preset feature into a relevance score calculation model and obtains their relevance score. Related words are then provided accordingly. The model is determined from a preset number of keywords and related words whose relevance scores have already been calculated. With this method, even if the keyword input by the user is not recorded in the search log, the relevance score of the keyword and each candidate related word can still be obtained by inputting their feature scores into the model. Accurate related words are therefore provided to the user, the user does not need to search again, and server resources are saved.
Brief description of the drawings
Fig. 1 shows the process of providing related words in the prior art;
Fig. 2 shows the process of providing related words according to an embodiment of the present application;
Fig. 3 shows the process of determining the relevance score calculation model according to an embodiment of the present application;
Fig. 4 is a schematic diagram of the search results obtained by searching separately with the keyword and the related word in a training sample, according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of the device for providing related words according to an embodiment of the present application.
Detailed description of the embodiments
When providing related words in the prior art, the server relies mainly on the search log. It finds the search results common to the recorded searches made with the user's keyword and with a candidate related word. It then determines their relevance score from the number of times each common result was clicked under the keyword and under the candidate related word. Therefore, if the search log contains no record of the keyword, or the recorded searches with the keyword and with the candidate related word share no search result, the prior art cannot calculate the relevance score of the keyword and the candidate related word, and the related words provided are not accurate enough.
To improve the accuracy of the related words provided, in the embodiments of the present application the server determines a relevance score calculation model in advance from a preset number of keywords and related words whose relevance scores have already been calculated. When providing related words, the server determines the feature scores of the keyword input by the user and a candidate related word on each preset feature, and inputs these feature scores into the model as input parameter values. The output is the relevance score of the keyword and the candidate related word. Thus, even if the search log contains no record of the keyword, or the recorded searches with the keyword and with the candidate related word share no search result, the server can still determine their relevance score from their feature scores and thereby provide accurate related words to the user.
The embodiments of the present application are described in detail below with reference to the drawings.
Fig. 2 shows the process of providing related words according to an embodiment of the present application, which specifically comprises the following steps:
S201: Determine the candidate related words of a keyword according to the keyword input by the user.
In the embodiments of the present application, the server may determine the candidate related words of the keyword input by the user using a method similar to that of the prior art.
S202: For each candidate related word determined, determine the feature scores of the keyword and the candidate related word on each preset feature, and input the determined feature scores as input parameter values into the relevance score calculation model to obtain the relevance score of the keyword and the candidate related word.
The relevance score calculation model is determined from a preset number of keywords and related words whose relevance scores have already been calculated.
In the embodiments of the present application, when providing related words to the user, the server determines the feature score of the keyword input by the user and the candidate related word on each preset feature. It then inputs these feature scores as input parameter values into the previously determined relevance score calculation model. The result is the relevance score of the keyword and the candidate related word.
S203: Select, from the candidate related words, the related words to be provided to the user according to the obtained relevance scores of the keyword and each candidate related word.
Specifically, all candidate related words whose relevance score exceeds a certain threshold may be provided to the user as related words of the keyword. Alternatively, a preset number of candidate related words with the highest relevance scores may be provided.
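Steps S202 and S203 can be sketched in Python as follows. The sketch assumes the feature extractor and the trained relevance score calculation model are supplied as plain callables; `feature_fn`, `model` and `provide_related_words` are hypothetical names. Both selection rules from S203 are supported. Applying both at once is an extension the patent does not describe.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical interfaces: a feature extractor for (keyword, candidate) pairs and
# a trained relevance score calculation model mapping feature scores to a score.
FeatureFn = Callable[[str, str], Sequence[float]]
ScoreModel = Callable[[Sequence[float]], float]


def provide_related_words(
    keyword: str,
    candidates: Sequence[str],
    feature_fn: FeatureFn,
    model: ScoreModel,
    threshold: Optional[float] = None,
    top_n: Optional[int] = None,
) -> List[str]:
    """Sketch of S202-S203: score every candidate, then select related words."""
    # S202: feature scores -> model -> relevance score, for each candidate.
    scores: Dict[str, float] = {c: model(feature_fn(keyword, c)) for c in candidates}
    # Rank candidates from most to least relevant.
    ranked = sorted(scores, key=lambda c: scores[c], reverse=True)
    # S203, option 1: keep candidates whose score exceeds the threshold.
    if threshold is not None:
        ranked = [c for c in ranked if scores[c] > threshold]
    # S203, option 2: keep the preset number of highest-scoring candidates.
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


# Toy usage: one made-up feature per candidate, and a "model" that returns it.
features = {"word b": [0.9], "word c": [0.2], "word d": [0.6]}
by_threshold = provide_related_words(
    "keyword", list(features), lambda k, c: features[c], lambda f: f[0], threshold=0.5
)
by_top_n = provide_related_words(
    "keyword", list(features), lambda k, c: features[c], lambda f: f[0], top_n=1
)
```

Keeping the model behind a callable lets the same selection code work with any regression model produced in step S303.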
In the embodiments of the present application, the process by which the server determines the relevance score calculation model is shown in Fig. 3. Fig. 3 shows the process of determining the relevance score calculation model according to an embodiment of the present application, which specifically comprises the following steps:
S301: Take keywords and related words whose relevance scores have already been calculated as training samples, and select a preset number of training samples.
In the embodiments of the present application, the server looks up keywords and related words whose relevance scores have already been calculated, that is, keywords and related words with known relevance scores. These relevance scores can be calculated with prior-art relevance algorithms, for example the SimRank or CosRank algorithm.
The server takes each keyword and related word with a known relevance score as a training sample. That is, a training sample contains one keyword and one related word, and their relevance score is known. The server selects a preset number of training samples for obtaining the relevance score calculation model in the subsequent steps. This number can be set as required; the larger it is, the more accurate the resulting relevance score calculation model.
S302: For each selected training sample, determine the feature scores of the keyword and the related word in the training sample on each preset feature. Take the already-calculated relevance score of the keyword and the related word in the training sample as the target value, and take the determined feature scores as the input parameter values.
In the embodiments of the present application, for each training sample the server quantifies the similarity of the keyword and the related word on each preset feature, and uses the result as their feature score on that feature. The features can be set as required; the more features, the more accurate the resulting relevance score calculation model. In the embodiments of the present application, the preset features include: the similarity, in search result category, of the search results obtained by searching separately with the keyword and the related word in the training sample; the similarity, in search result attributes, of those search results; the edit similarity of the keyword and the related word in the training sample; and the similarity, in search result clicks, of those search results. Any one of these four features, or any combination of several of them, may also be used.
The following description takes the search for product information as an example.
When searching for product information, the keyword and the related word in each training sample are both used to search for some product, and the above features are specifically as follows:
The similarity in search result category is specifically the similarity, in product category, of the product information obtained by searching with the keyword and with the related word. For example, suppose the keyword in the training sample is "A brand a model mobile phone" and the related word is "A brand b model mobile phone". Searching with the keyword yields mobile phone 1, mobile phone 2 and mobile phone 3, and searching with the related word yields mobile phone 4, mobile phone 5 and mobile phone 1. This feature is then the similarity of these two sets of search results in product category.
The similarity in search result attributes is specifically the similarity, in product attributes, of the product information obtained by searching with the keyword and with the related word. Continuing the example, this feature is the similarity of the two sets of search results in product attributes.
The edit similarity of the keyword and the related word in the training sample is specifically their edit distance. Continuing the example, this feature is the edit distance between the keyword ("A brand a model mobile phone") and the related word ("A brand b model mobile phone").
The similarity in search result clicks is specifically the similarity between the number of times each common product found by both searches was clicked under the keyword and under the related word. Continuing the example, the common search result of the two searches is mobile phone 1. This feature is the similarity between the number of times mobile phone 1 was clicked when found by searching with the keyword ("A brand a model mobile phone") and the number of times it was clicked when found by searching with the related word ("A brand b model mobile phone").
Quantifying the above similarities yields the feature scores of the keyword and the related word in the training sample on each feature. These feature scores are taken as the input parameter values, and their known relevance score is taken as the target value.
S303: According to the target values and input parameter values determined for the training samples, perform a regression operation with a preset algorithm to obtain the relevance score calculation model.
In the embodiments of the present application, the relevance score calculation model obtained by the regression operation satisfies the following condition. For any training sample, when the feature scores determined for its keyword and related word are input into the model as input parameter values, the output is the known relevance score of that keyword and related word. In other words, a model is fitted to the target values and input parameter values of the training samples so that each sample's input parameter values map to its target value, and this model is used as the relevance score calculation model. The regression may be performed with the support vector machine (SVM) algorithm, with the Logit (logistic regression) algorithm, or with other regression algorithms.
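The patent names SVM or Logit regression for S303. As a dependency-free stand-in for "other regression algorithms", the sketch below fits an ordinary least-squares linear model (an intercept plus one weight per feature) by solving the normal equations. The function names and the toy data are hypothetical. Least squares reproduces every training target exactly only when the targets are consistent with a linear model of the features, as in the toy data here. With real data it minimises the squared error instead.

```python
from typing import List, Sequence


def fit_linear_model(features: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Least-squares fit; returns [bias, w1, ..., wk] via the normal equations."""
    # Prepend a constant 1.0 to every row so weights[0] acts as the intercept.
    rows = [[1.0, *map(float, x)] for x in features]
    d = len(rows[0])
    # Normal equations: (X^T X) w = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more, or more varied, training samples")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for i in range(d):
            if i != col:
                factor = a[i][col] / a[col][col]
                for k in range(col, d):
                    a[i][k] -= factor * a[col][k]
                b[i] -= factor * b[col]
    # The matrix is now diagonal, so each weight is a single division.
    return [b[i] / a[i][i] for i in range(d)]


def predict(weights: Sequence[float], feature_scores: Sequence[float]) -> float:
    """Relevance score for one (keyword, candidate) pair from its feature scores."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_scores))


# Toy training samples: four feature scores each (category, attribute, edit, click).
samples = [
    [0.9, 0.8, 0.5, 0.7],
    [0.2, 0.1, 0.9, 0.0],
    [0.5, 0.5, 0.5, 0.5],
    [0.1, 0.9, 0.3, 0.2],
    [0.7, 0.3, 0.8, 0.9],
    [0.4, 0.6, 0.1, 0.3],
]
# Made-up target values generated from a known linear rule, for illustration only.
targets = [0.1 + 0.4 * a + 0.3 * b + 0.2 * c - 0.05 * d for a, b, c, d in samples]
weights = fit_linear_model(samples, targets)
```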
In the above process, the server first selects a preset number of keywords and related words with known relevance scores as training samples. For each training sample, it quantifies the similarity of the keyword and the related word on each preset feature, that is, it determines their feature scores on each feature. It takes the known relevance score of the sample as the target value and the determined feature scores as the input parameter values. It then performs a regression operation with a preset algorithm on the target values and input parameter values of all training samples to obtain the relevance score calculation model. The resulting model is such that, when the input parameter values determined for any training sample are input into it, the output equals the known target value of that sample. When providing related words, the feature scores of the keyword input by the user and a candidate related word on each feature can be input into the model to obtain their relevance score, and related words are provided accordingly. Consider the case where the keyword input by the user is not recorded in the search log, or the recorded searches with the keyword and with the candidate related word share no search result. Even then, the method provided by the embodiments of the present application can still accurately determine the relevance score of the keyword and the candidate related word and provide accurate related words to the user. The user therefore does not need to search again, and server resources are saved.
In step S302 shown in Fig. 3, the server needs to quantify the similarity of the keyword and the related word in the training sample on each feature to obtain the corresponding feature scores, that is, to determine the following feature scores:
the similarity score, in search result category, of the search results obtained by searching separately with the keyword and the related word in the training sample; and
the similarity score, in search result attributes, of the search results obtained by searching separately with the keyword and the related word in the training sample; and
the edit distance between the keyword and the related word in the training sample, as the edit distance score; and
the similarity score, in search result clicks, of the search results obtained by searching separately with the keyword and the related word in the training sample.
The similarity score in search result category is determined as follows. First, search with the keyword in the training sample. For each search result category, determine the number of results obtained that belong to that category and the total number of results obtained, and calculate the ratio of the former to the latter. The vector formed by the ratios for all search result categories is taken as the keyword category vector. Likewise, search with the related word in the training sample and calculate, for each search result category, the ratio of the number of results belonging to that category to the total number of results. The vector formed by these ratios is taken as the related word category vector. The cosine of the keyword category vector and the related word category vector is taken as the similarity score, in search result category, of the search results obtained by searching separately with the keyword and the related word in the training sample.
For example, suppose there are four search result categories: category 1, category 2, category 3 and category 4. Searching with the keyword in the training sample yields N search results in total. Of these, n1 belong to category 1, n2 to category 2, n3 to category 3 and n4 to category 4, so the ratios for the four categories are n1/N, n2/N, n3/N and n4/N. Here n1 + n2 + n3 + n4 = N, and each of n1, n2, n3 and n4 is an integer not less than 0 and not greater than N. The keyword category vector is therefore (n1/N, n2/N, n3/N, n4/N). Correspondingly, searching with the related word in the training sample yields M search results in total, of which m1, m2, m3 and m4 belong to categories 1 to 4 respectively. The related word category vector is therefore (m1/M, m2/M, m3/M, m4/M). The cosine of the vectors (n1/N, n2/N, n3/N, n4/N) and (m1/M, m2/M, m3/M, m4/M) is taken as the similarity score, in search result category, of the search results obtained by searching separately with the keyword and the related word in the training sample.
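A minimal Python sketch of the category similarity score follows. It assumes each search result is represented only by its category label; that representation, the function names, and the zero-vector fallbacks for empty inputs are assumptions not stated in the text. Only categories that occur in either result list need to be considered, because the others contribute zero to both vectors. Dividing by N and M rescales each vector without changing the cosine, so raw counts would give the same score.

```python
import math
from collections import Counter
from typing import List, Sequence


def category_vector(result_categories: Sequence[str], categories: Sequence[str]) -> List[float]:
    """Share of search results falling in each category (n_i / N in the text)."""
    total = len(result_categories)
    if total == 0:
        return [0.0] * len(categories)  # assumption: empty result list -> zero vector
    counts = Counter(result_categories)
    return [counts[c] / total for c in categories]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of two equal-length vectors; 0.0 if either is all zeros (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def category_similarity(keyword_results: Sequence[str], related_results: Sequence[str]) -> float:
    """Cosine of the keyword category vector and the related word category vector."""
    categories = sorted(set(keyword_results) | set(related_results))
    return cosine(category_vector(keyword_results, categories),
                  category_vector(related_results, categories))


# Each list holds the category of one search result (made-up example data).
kw = ["mobile phone", "mobile phone", "phone case", "mobile phone"]
rel = ["mobile phone", "phone case", "phone case", "mobile phone"]
score = category_similarity(kw, rel)  # 2 / sqrt(5), about 0.894
```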
The similarity score in search result attributes is determined as follows. Determine the attributes corresponding to the search results obtained by searching with the keyword in the training sample, and form a first set with these attributes as elements. Likewise, determine the attributes corresponding to the search results obtained by searching with the related word in the training sample, and form a second set with these attributes as elements. Determine the intersection and the union of the first and second sets. The ratio of the number of elements in the intersection to the number of elements in the union is taken as the similarity score, in search result attributes, of the search results obtained by searching separately with the keyword and the related word in the training sample.
For example, suppose the search results obtained by searching with the keyword in the training sample correspond to two attributes, attribute 1 and attribute 2, so the first set is {attribute 1, attribute 2}. Correspondingly, the search results obtained by searching with the related word also correspond to two attributes, attribute 2 and attribute 3, so the second set is {attribute 2, attribute 3}. The intersection of the two sets is {attribute 2} and their union is {attribute 1, attribute 2, attribute 3}. The intersection contains 1 element and the union contains 3, so the ratio 1/3 is taken as the similarity score, in search result attributes, of the search results obtained by searching separately with the keyword and the related word in the training sample.
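The intersection-over-union ratio described above is the Jaccard index of the two attribute sets. The following minimal sketch assumes each search result carries a list of attribute strings. That representation, the function names, and the 0.0 fallback when neither side has attributes are hypothetical.

```python
from typing import Iterable, Set


def attribute_set(results: Iterable[Iterable[str]]) -> Set[str]:
    """Collect every attribute attached to any search result into one set."""
    return {attr for result_attrs in results for attr in result_attrs}


def attribute_similarity(keyword_results: Iterable[Iterable[str]],
                         related_results: Iterable[Iterable[str]]) -> float:
    """|first & second| / |first | second| (Jaccard index of the attribute sets)."""
    first = attribute_set(keyword_results)   # the "first set"
    second = attribute_set(related_results)  # the "second set"
    union = first | second
    if not union:
        return 0.0  # assumption: no attributes on either side -> similarity 0
    return len(first & second) / len(union)


# Attributes per search result, mirroring the example above.
score = attribute_similarity([["attribute 1"], ["attribute 2"]],
                             [["attribute 2", "attribute 3"]])  # 1/3
```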
The edit distance score is determined as follows. The number of operations needed to change the keyword into the related word is taken as the edit distance between the keyword and the related word. Deleting one character from the keyword counts as one operation, and adding one character to the keyword counts as one operation.
For example, suppose the keyword in the training sample is "A brand a model mobile phone" and the related word is "A brand b model mobile phone". Changing the keyword into the related word requires deleting the character "a" from the keyword and adding the character "b" to it. The number of operations is therefore 2, so the edit distance between the keyword and the related word in the training sample is 2, and the edit distance score is 2.
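Because the text counts only deletions and additions, a changed character costs 2, which matches the example's distance of 2. The standard Levenshtein distance also allows substitution and would give 1 here. Under this definition the distance equals len(a) + len(b) - 2 * LCS(a, b), where LCS is the longest common subsequence. The sketch below computes it with dynamic programming; the function name is hypothetical.

```python
def insert_delete_distance(a: str, b: str) -> int:
    """Edit distance where only single-character deletions and additions count."""
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    lcs = prev[-1]
    # Every character outside the LCS must be deleted from a or added from b.
    return len(a) + len(b) - 2 * lcs


distance = insert_delete_distance("A brand a model mobile phone", "A brand b model mobile phone")  # 2
```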
Determine to adopt respectively keyword and related term in this training sample to search for, the method of the similarity score of resulting Search Results on Search Results is clicked is specially: keyword and related term in determining to adopt respectively this training sample are searched for, resulting each identical Search Results; For each identical Search Results, according to the record in the search daily record, the clicked number of times of this Search Results when determining to search for by the keyword in this training sample, the clicked number of times of this Search Results when determining to search for by the related term in this training sample; In the time of searching for by the keyword in this training sample, the vector that each clicked number of times of determining for each identical Search Results consists of is defined as keyword and clicks vector; In the time of searching for by the related term in this training sample, the vector that each clicked number of times of determining for each identical Search Results consists of is defined as related term and clicks vector; Determine that vector clicked in keyword and related term is clicked vectorial cosine value, be defined as adopting respectively keyword and related term in this training sample to search for this cosine value, the similarity score of resulting Search Results on Search Results is clicked.
For example, Fig. 4 is a schematic diagram, provided by an embodiment of the present application, of the search results obtained by searching with the keyword and with the related term of the training sample respectively. In Fig. 4, searching with the keyword yields result 1, result 2, result 3 and result 4, while searching with the related term yields result 2, result 3, result 5 and result 6. The common search results are therefore result 2 and result 3. For result 2, the search log gives the number of times i2 that result 2 was clicked when retrieved via the keyword, and the number of times j2 that it was clicked when retrieved via the related term. For result 3, the search log likewise gives i3 (clicks when retrieved via the keyword) and j3 (clicks when retrieved via the related term). The keyword click vector is thus (i2, i3), and the related term click vector is (j2, j3). The cosine of the keyword click vector (i2, i3) and the related term click vector (j2, j3), namely (i2·j2 + i3·j3) / (√(i2² + i3²) · √(j2² + j3²)), is taken as the similarity score on search result clicks for this training sample.
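A minimal sketch of the click similarity score for the Fig. 4 layout follows, assuming the search log has already been aggregated into per-result click counts for each query. The input format (a dict from a hypothetical result ID to its click count, including results with zero clicks), the function name and the click counts are assumptions for illustration.

```python
import math
from typing import Dict


def click_similarity(keyword_clicks: Dict[str, int], related_clicks: Dict[str, int]) -> float:
    """Cosine of the keyword and related-term click vectors over common results.

    Each dict maps every result returned for that query (hypothetical result IDs)
    to the number of times it was clicked when reached through that query.
    """
    # Common search results: returned both for the keyword and for the related term.
    common = sorted(set(keyword_clicks) & set(related_clicks))
    k_vec = [keyword_clicks[r] for r in common]  # keyword click vector, e.g. (i2, i3)
    r_vec = [related_clicks[r] for r in common]  # related term click vector, e.g. (j2, j3)
    dot = sum(a * b for a, b in zip(k_vec, r_vec))
    norm = math.sqrt(sum(a * a for a in k_vec)) * math.sqrt(sum(b * b for b in r_vec))
    # Assumption: with no common results (or no clicks on them) the score is 0.
    return dot / norm if norm else 0.0


# Fig. 4 layout with made-up click counts: i2=3, i3=4, j2=6, j3=8.
keyword_clicks = {"result1": 5, "result2": 3, "result3": 4, "result4": 1}
related_clicks = {"result2": 6, "result3": 8, "result5": 2, "result6": 0}
print(click_similarity(keyword_clicks, related_clicks))  # 1.0
```

Because the cosine ignores vector length, (3, 4) and (6, 8) score 1.0: both queries lead to clicks on the shared results in the same proportions, even though the related term attracts more clicks overall.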
The steps above determine each feature score of the keyword and related term in a training sample. In the subsequent steps, the feature scores determined for each training sample, together with each training sample's known relevance score, are used to perform regression with a preset algorithm, which yields the relevance score calculation model. Based on this model, the relevance score between the keyword input by the user and each candidate related term is determined, and related terms are provided to the user accordingly.
In the embodiments of the present application, further finer-grained features may be added as needed on top of the four preset features above. For example, when searching for commodity information, one may additionally use the similarity scores, on brand, on model, on commodity color, on commodity quality and so on, of the commodity information obtained by searching with the keyword and with the related term of the training sample respectively. These are not described one by one here.
Fig. 5 is a schematic structural diagram of the device for providing related terms provided by an embodiment of the present application. The device specifically comprises:
A candidate related term determination module 501, configured to determine, according to a keyword input by a user, each candidate related term of the keyword;
A relevance score determination module 502, configured to: for each determined candidate related term, determine the feature scores of the keyword and the candidate related term on each preset feature, and input the determined feature scores as input parameter values into a relevance score calculation model to obtain the relevance score of the keyword and the candidate related term, wherein the relevance score calculation model is determined according to a preset number of keywords and related terms whose relevance scores have already been calculated;
A related term providing module 503, configured to select, from the candidate related terms, the related terms to be provided to the user according to the obtained relevance scores of the keyword and each candidate related term.
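How modules 502 and 503 fit together can be sketched as below. This is a hypothetical sketch: the feature function, the trained model and the selection rule (a score threshold `min_score` plus a `top_k` cut-off) are all assumptions, since the text only states that related terms are selected according to their relevance scores.

```python
from typing import Callable, List, Sequence, Tuple

FeatureFn = Callable[[str, str], Sequence[float]]  # (keyword, candidate) -> feature scores
ModelFn = Callable[[Sequence[float]], float]        # feature scores -> relevance score


def provide_related_terms(
    keyword: str,
    candidates: List[str],
    features: FeatureFn,
    model: ModelFn,
    top_k: int = 5,
    min_score: float = 0.0,
) -> List[Tuple[str, float]]:
    """Score each candidate related term and return the best-scoring ones.

    `candidates` stands for the output of module 501; `features`, `model`,
    `top_k` and `min_score` are hypothetical parameters.
    """
    # Module 502: feature scores of (keyword, candidate) are fed into the model.
    scored = [(cand, model(features(keyword, cand))) for cand in candidates]
    # Module 503 (assumed rule): drop low scores, then keep the top_k highest.
    kept = [item for item in scored if item[1] >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


# Toy example: one precomputed feature per candidate and an identity "model".
toy_scores = {"phones": 0.9, "case": 0.2, "iphone": 0.7}
result = provide_related_terms(
    "phone",
    list(toy_scores),
    features=lambda kw, cand: [toy_scores[cand]],
    model=lambda f: f[0],
    top_k=2,
    min_score=0.5,
)
print(result)  # [('phones', 0.9), ('iphone', 0.7)]
```

Because scoring only needs feature scores, a keyword that never appears in the search log can still be scored, which is the advantage the summary of this application claims.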
The relevance score determination module 502 comprises:
A selection submodule 5021, configured to take keywords and related terms whose relevance scores have already been calculated as training samples, and to select a preset number of training samples;
A feature score determination submodule 5022, configured to: for each selected training sample, determine, according to each preset feature, the feature scores of the keyword and the related term in the training sample on each feature; take the already calculated relevance score of the keyword and the related term in the training sample as the target value; and take the determined feature scores of the keyword and the related term in the training sample on each feature as the input parameter values;
A model determination submodule 5023, configured to perform regression with a preset algorithm according to the target value and input parameter values determined for each training sample, to obtain the relevance score calculation model.
The feature score determination submodule 5022 is specifically configured to determine: the similarity score, on search result category, of the search results obtained by searching with the keyword and with the related term of the training sample respectively; the similarity score of those search results on search result attributes; the edit distance between the keyword and the related term of the training sample, as the edit distance score; and the similarity score of those search results on search result clicks.
The feature score determination submodule 5022 is specifically configured to: search with the keyword of the training sample and, for each search result category, determine the number of returned search results belonging to that category and the total number of returned search results, and compute the ratio of the former to the latter; take the vector formed by the ratios determined for all search result categories under the keyword search as the keyword category vector; search with the related term of the training sample and, for each search result category, likewise compute the ratio of the number of returned search results belonging to that category to the total number of returned search results; take the vector formed by these ratios under the related term search as the related term category vector; and compute the cosine of the keyword category vector and the related term category vector, taking this cosine value as the similarity score, on search result category, of the search results obtained by searching with the keyword and with the related term of the training sample respectively.
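A minimal sketch of the category similarity score follows, assuming each search returns a list of results already labelled with exactly one category each (the input format, names and category labels are hypothetical). A category that appears in only one of the two searches gets a ratio of 0 in the other vector, so both vectors share the same dimensions.

```python
import math
from collections import Counter
from typing import List


def category_similarity(keyword_result_cats: List[str], related_result_cats: List[str]) -> float:
    """Cosine of the keyword and related-term category-ratio vectors.

    Each input lists the category of every search result returned for that query.
    """
    k_counts, r_counts = Counter(keyword_result_cats), Counter(related_result_cats)
    k_total, r_total = len(keyword_result_cats), len(related_result_cats)
    if k_total == 0 or r_total == 0:
        return 0.0  # assumption: a search with no results gives no category evidence
    categories = sorted(set(k_counts) | set(r_counts))
    # Ratio of results in each category to the total; Counter returns 0 for absent ones.
    k_vec = [k_counts[c] / k_total for c in categories]  # keyword category vector
    r_vec = [r_counts[c] / r_total for c in categories]  # related term category vector
    dot = sum(a * b for a, b in zip(k_vec, r_vec))
    norm = math.sqrt(sum(a * a for a in k_vec)) * math.sqrt(sum(b * b for b in r_vec))
    return dot / norm


keyword_cats = ["phone", "phone", "phone", "case"]  # ratios: case 0.25, phone 0.75
related_cats = ["phone", "case", "case", "phone"]   # ratios: case 0.5, phone 0.5
print(round(category_similarity(keyword_cats, related_cats), 4))  # 0.8944
```

Dividing by the total number of results only rescales each vector and so does not change the cosine. It is kept here to mirror the description.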
The feature score determination submodule 5022 is specifically configured to: according to the keyword of the training sample, determine the attributes corresponding to the search results obtained by searching with that keyword, and form a first set whose elements are those attributes; according to the related term of the training sample, determine the attributes corresponding to the search results obtained by searching with that related term, and form a second set whose elements are those attributes; determine the intersection and the union of the first set and the second set, compute the ratio of the number of elements in the intersection to the number of elements in the union, and take this ratio as the similarity score, on search result attributes, of the search results obtained by searching with the keyword and with the related term of the training sample respectively.
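The intersection-over-union ratio described above is the Jaccard index of the two attribute sets. A minimal sketch follows, assuming the attributes of the returned results have already been extracted as strings. The function name and the attribute labels in the example are made up for illustration.

```python
from typing import Iterable


def attribute_similarity(keyword_attrs: Iterable[str], related_attrs: Iterable[str]) -> float:
    """|first ∩ second| / |first ∪ second|, the Jaccard index of the two attribute sets."""
    first = set(keyword_attrs)   # attributes of results found with the keyword
    second = set(related_attrs)  # attributes of results found with the related term
    union = first | second
    if not union:
        return 0.0  # assumption: no attributes on either side scores 0
    return len(first & second) / len(union)


print(attribute_similarity({"brand:A", "color:black", "5G"}, {"brand:A", "color:white", "5G"}))  # 0.5
```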
The feature score determination submodule 5022 is specifically configured to: determine each common search result obtained both by searching with the keyword and by searching with the related term of the training sample; for each common search result, determine, according to the records in the search log, the number of times the result was clicked when retrieved by searching with the keyword of the training sample, and the number of times it was clicked when retrieved by searching with the related term of the training sample; take the vector formed by the click counts determined for all common search results under the keyword search as the keyword click vector; take the vector formed by the click counts determined for all common search results under the related term search as the related term click vector; and compute the cosine of the keyword click vector and the related term click vector, taking this cosine value as the similarity score, on search result clicks, of the search results obtained by searching with the keyword and with the related term of the training sample respectively.
The model determination submodule 5023 is specifically configured to perform regression with a support vector machine (SVM) algorithm to obtain the relevance score calculation model, or alternatively to perform regression with a Logit model algorithm to obtain the relevance score calculation model.
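As an illustration of the Logit option, the sketch below fits score = sigmoid(w · x + b) by batch gradient descent on the cross-entropy loss. It treats each training sample's known relevance score (assumed to lie in [0, 1]) as a soft target and its feature scores as x. This is an assumption-laden sketch, not the patent's training procedure: the learning rate, epoch count and toy data are hypothetical, and the feature scores are assumed to be rescaled to [0, 1] (for the edit distance this implies a normalization the text does not specify). The SVM alternative (for example scikit-learn's `sklearn.svm.SVR`) is not shown.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(
    samples: List[Tuple[Sequence[float], float]],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit relevance = sigmoid(w . x + b) by batch gradient descent on cross-entropy.

    Each sample is (feature scores, known relevance score in [0, 1]); fractional
    targets are allowed, so this is a fractional-response logit fit.
    """
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in samples:
            p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the cross-entropy w.r.t. the linear term
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Relevance score the fitted model assigns to one feature vector."""
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy training samples: (category, attribute, edit-distance-based, click) scores,
# assumed already scaled to [0, 1], paired with a known relevance score.
train = [
    ([0.9, 0.6, 0.9, 0.8], 0.9),
    ([0.8, 0.5, 0.8, 0.7], 0.8),
    ([0.2, 0.1, 0.3, 0.1], 0.1),
    ([0.1, 0.0, 0.2, 0.0], 0.05),
]
w, b = fit_logit(train)
print(predict(w, b, [0.85, 0.55, 0.85, 0.75]) > predict(w, b, [0.15, 0.05, 0.25, 0.05]))  # True
```

The sigmoid keeps predicted relevance scores in (0, 1), matching targets on that scale. An SVM regressor instead predicts an unbounded value, which is one practical difference between the two options named in the text.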
Specifically, the above device for providing related terms may be located in a server.
The embodiments of the present application provide a method and a device for providing related terms. For each candidate related term of the keyword input by the user, the method inputs the feature scores of the keyword and the candidate related term on each preset feature into a relevance score calculation model, obtains their relevance score, and provides related terms accordingly. The relevance score calculation model is determined according to a preset number of keywords and related terms whose relevance scores have already been calculated. With this method, even if the keyword input by the user is not recorded in the search log, the relevance score of the keyword and each candidate related term can still be obtained by inputting their feature scores into the relevance score calculation model. Accurate related terms can thus be provided to the user, so that the user does not need to search again, which saves server resources.
Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its spirit and scope. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.

Claims (14)

1. A method for providing related terms, characterized by comprising:
determining, according to a keyword input by a user, each candidate related term of the keyword;
for each determined candidate related term, determining the feature scores of the keyword and the candidate related term on each preset feature, and inputting the determined feature scores as input parameter values into a relevance score calculation model to obtain the relevance score of the keyword and the candidate related term, wherein the relevance score calculation model is determined according to a preset number of keywords and related terms whose relevance scores have already been calculated;
selecting, from the candidate related terms, the related terms to be provided to the user according to the obtained relevance scores of the keyword and each candidate related term.
2. The method of claim 1, characterized in that determining the relevance score calculation model according to the preset number of keywords and related terms whose relevance scores have already been calculated specifically comprises:
taking keywords and related terms whose relevance scores have already been calculated as training samples, and selecting a preset number of training samples;
for each selected training sample, determining, according to each preset feature, the feature scores of the keyword and the related term in the training sample on each feature, taking the already calculated relevance score of the keyword and the related term in the training sample as the target value, and taking the determined feature scores of the keyword and the related term in the training sample on each feature as the input parameter values;
performing regression with a preset algorithm according to the target value and input parameter values determined for each training sample, to obtain the relevance score calculation model.
3. The method of claim 2, characterized in that determining, according to each preset feature, the feature scores of the keyword and the related term in the training sample on each feature specifically comprises:
determining the similarity score, on search result category, of the search results obtained by searching with the keyword and with the related term of the training sample respectively; and
determining the similarity score, on search result attributes, of the search results obtained by searching with the keyword and with the related term of the training sample respectively; and
determining the edit distance between the keyword and the related term of the training sample as the edit distance score; and
determining the similarity score, on search result clicks, of the search results obtained by searching with the keyword and with the related term of the training sample respectively.
4. The method of claim 3, characterized in that determining the similarity score, on search result category, of the search results obtained by searching with the keyword and with the related term of the training sample respectively specifically comprises:
searching with the keyword of the training sample and, for each search result category, determining the number of returned search results belonging to that category and the total number of returned search results, and determining the ratio of the number of search results belonging to that category to the total number of search results;
taking the vector formed by the ratios determined for all search result categories when searching with the keyword of the training sample as the keyword category vector;
searching with the related term of the training sample and, for each search result category, determining the number of returned search results belonging to that category and the total number of returned search results, and determining the ratio of the number of search results belonging to that category to the total number of search results;
taking the vector formed by the ratios determined for all search result categories when searching with the related term of the training sample as the related term category vector;
determining the cosine of the keyword category vector and the related term category vector, and taking the cosine value as the similarity score, on search result category, of the search results obtained by searching with the keyword and with the related term of the training sample respectively.
5. The method of claim 3, characterized in that determining the similarity score, on search result attributes, of the search results obtained by searching with the keyword and with the related term of the training sample respectively specifically comprises:
according to the keyword of the training sample, determining the attributes corresponding to the search results obtained by searching with the keyword of the training sample, and forming a first set whose elements are the determined attributes;
according to the related term of the training sample, determining the attributes corresponding to the search results obtained by searching with the related term of the training sample, and forming a second set whose elements are the determined attributes;
determining the intersection and the union of the first set and the second set, determining the ratio of the number of elements in the intersection to the number of elements in the union, and taking the ratio as the similarity score, on search result attributes, of the search results obtained by searching with the keyword and with the related term of the training sample respectively.
6. The method of claim 3, characterized in that determining the similarity score, on search result clicks, of the search results obtained by searching with the keyword and with the related term of the training sample respectively specifically comprises:
determining each common search result obtained both by searching with the keyword and by searching with the related term of the training sample;
for each common search result, determining, according to the records in the search log, the number of times the search result was clicked when retrieved by searching with the keyword of the training sample, and the number of times it was clicked when retrieved by searching with the related term of the training sample;
taking the vector formed by the click counts determined for all common search results when searching with the keyword of the training sample as the keyword click vector;
taking the vector formed by the click counts determined for all common search results when searching with the related term of the training sample as the related term click vector;
determining the cosine of the keyword click vector and the related term click vector, and taking the cosine value as the similarity score, on search result clicks, of the search results obtained by searching with the keyword and with the related term of the training sample respectively.
7. The method of any one of claims 2 to 6, characterized in that performing regression with a preset algorithm to obtain the relevance score calculation model specifically comprises:
performing regression with a support vector machine (SVM) algorithm to obtain the relevance score calculation model; or
performing regression with a Logit model algorithm to obtain the relevance score calculation model.
8. A device for providing related terms, characterized by comprising:
a candidate related term determination module, configured to determine, according to a keyword input by a user, each candidate related term of the keyword;
a relevance score determination module, configured to: for each determined candidate related term, determine the feature scores of the keyword and the candidate related term on each preset feature, and input the determined feature scores as input parameter values into a relevance score calculation model to obtain the relevance score of the keyword and the candidate related term, wherein the relevance score calculation model is determined according to a preset number of keywords and related terms whose relevance scores have already been calculated;
a related term providing module, configured to select, from the candidate related terms, the related terms to be provided to the user according to the obtained relevance scores of the keyword and each candidate related term.
9. The device of claim 8, characterized in that the relevance score determination module comprises:
a selection submodule, configured to take keywords and related terms whose relevance scores have already been calculated as training samples, and to select a preset number of training samples;
a feature score determination submodule, configured to: for each selected training sample, determine, according to each preset feature, the feature scores of the keyword and the related term in the training sample on each feature, take the already calculated relevance score of the keyword and the related term in the training sample as the target value, and take the determined feature scores of the keyword and the related term in the training sample on each feature as the input parameter values;
a model determination submodule, configured to perform regression with a preset algorithm according to the target value and input parameter values determined for each training sample, to obtain the relevance score calculation model.
10. The device of claim 9, characterized in that the feature score determination submodule is specifically configured to determine: the similarity score, on search result category, of the search results obtained by searching with the keyword and with the related term of the training sample respectively; the similarity score, on search result attributes, of the search results obtained by searching with the keyword and with the related term of the training sample respectively; the edit distance between the keyword and the related term of the training sample, as the edit distance score; and the similarity score, on search result clicks, of the search results obtained by searching with the keyword and with the related term of the training sample respectively.
11. The device of claim 10, characterized in that the feature score determination submodule is specifically configured to: search with the keyword of the training sample and, for each search result category, determine the number of returned search results belonging to that category and the total number of returned search results, and determine the ratio of the number of search results belonging to that category to the total number of search results; take the vector formed by the ratios determined for all search result categories when searching with the keyword of the training sample as the keyword category vector; search with the related term of the training sample and, for each search result category, determine the number of returned search results belonging to that category and the total number of returned search results, and determine the ratio of the number of search results belonging to that category to the total number of search results; take the vector formed by the ratios determined for all search result categories when searching with the related term of the training sample as the related term category vector; and determine the cosine of the keyword category vector and the related term category vector, taking the cosine value as the similarity score, on search result category, of the search results obtained by searching with the keyword and with the related term of the training sample respectively.
12. The device of claim 10, characterized in that the feature score determination submodule is specifically configured to: according to the keyword of the training sample, determine the attributes corresponding to the search results obtained by searching with the keyword of the training sample, and form a first set whose elements are the determined attributes; according to the related term of the training sample, determine the attributes corresponding to the search results obtained by searching with the related term of the training sample, and form a second set whose elements are the determined attributes; and determine the intersection and the union of the first set and the second set, determine the ratio of the number of elements in the intersection to the number of elements in the union, and take the ratio as the similarity score, on search result attributes, of the search results obtained by searching with the keyword and with the related term of the training sample respectively.
13. The device of claim 10, characterized in that the feature score determination submodule is specifically configured to: determine each common search result obtained both by searching with the keyword and by searching with the related term of the training sample; for each common search result, determine, according to the records in the search log, the number of times the search result was clicked when retrieved by searching with the keyword of the training sample, and the number of times it was clicked when retrieved by searching with the related term of the training sample; take the vector formed by the click counts determined for all common search results when searching with the keyword of the training sample as the keyword click vector; take the vector formed by the click counts determined for all common search results when searching with the related term of the training sample as the related term click vector; and determine the cosine of the keyword click vector and the related term click vector, taking the cosine value as the similarity score, on search result clicks, of the search results obtained by searching with the keyword and with the related term of the training sample respectively.
14. The device of any one of claims 9 to 13, characterized in that the model determination submodule is specifically configured to perform regression with a support vector machine (SVM) algorithm to obtain the relevance score calculation model, or to perform regression with a Logit model algorithm to obtain the relevance score calculation model.
CN201110376840.4A 2011-11-23 2011-11-23 Method and device for providing related words Active CN103136213B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110376840.4A CN103136213B (en) 2011-11-23 2011-11-23 Method and device for providing related words

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110376840.4A CN103136213B (en) 2011-11-23 2011-11-23 Method and device for providing related words

Publications (2)

Publication Number Publication Date
CN103136213A true CN103136213A (en) 2013-06-05
CN103136213B CN103136213B (en) 2017-04-12

Family

ID=48496050

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110376840.4A Active CN103136213B (en) 2011-11-23 2011-11-23 Method and device for providing related words

Country Status (1)

Country Link
CN (1) CN103136213B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063432A (en) * 2009-11-12 2011-05-18 阿里巴巴集团控股有限公司 Retrieval method and retrieval system
CN102063469A (en) * 2010-12-03 2011-05-18 百度在线网络技术(北京)有限公司 Method and device for acquiring relevant keyword message and computer equipment
CN102214169A (en) * 2010-04-02 2011-10-12 阿里巴巴集团控股有限公司 Methods and devices for providing keyword information and target information

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095474B (en) * 2015-08-11 2018-12-14 北京奇虎科技有限公司 Establish the method and device of search term and application data recommendation relationship
CN105095474A (en) * 2015-08-11 2015-11-25 北京奇虎科技有限公司 Method and device for establishing recommendation relation between searching terms and application data
CN110795628A (en) * 2017-06-29 2020-02-14 北京拉勾科技有限公司 Search term processing method and device based on correlation and computing equipment
CN110795628B (en) * 2017-06-29 2023-04-11 北京拉勾科技有限公司 Search term processing method and device based on correlation and computing equipment
CN108052568B (en) * 2017-12-07 2020-11-10 百度在线网络技术(北京)有限公司 Feature screening method, device, terminal and medium
CN108052568A (en) * 2017-12-07 2018-05-18 百度在线网络技术(北京)有限公司 A kind of Feature Selection method, apparatus, terminal and medium
CN108334631A (en) * 2018-02-24 2018-07-27 武汉斗鱼网络科技有限公司 Method, corresponding medium and the equipment of synonym for excavating direct broadcasting room search term
CN108763332A (en) * 2018-05-10 2018-11-06 北京奇艺世纪科技有限公司 A kind of generation method and device of Search Hints word
WO2020052067A1 (en) * 2018-09-12 2020-03-19 北京字节跳动网络技术有限公司 Information search method and device
CN111782912A (en) * 2019-04-04 2020-10-16 百度在线网络技术(北京)有限公司 Word recommendation method, device, server and medium
CN111782912B (en) * 2019-04-04 2023-08-15 百度在线网络技术(北京)有限公司 Word recommendation method, device, server and medium
CN113779417A (en) * 2021-11-12 2021-12-10 中国信息通信研究院 Digital asset object searching method and device, electronic equipment and storage medium
CN113779417B (en) * 2021-11-12 2022-04-01 中国信息通信研究院 Digital asset object searching method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN103136213B (en) 2017-04-12

Similar Documents

Publication Publication Date Title
CN103136213A (en) Method and device for providing related words
JP5721818B2 (en) Use of model information group in search
US8799275B2 (en) Information retrieval based on semantic patterns of queries
CN102799591B (en) Method and device for providing recommended word
CN109299383B (en) Method and device for generating recommended word, electronic equipment and storage medium
US20150278359A1 (en) Method and apparatus for generating a recommendation page
CN103425687A (en) Retrieval method and system based on queries
CN103207881B (en) Querying method and device
TW201805839A (en) Data processing method, device and system
CN106250513A (en) Personalized event sorting method and system based on event modeling
WO2021218322A1 (en) Paragraph search method and apparatus, and electronic device and storage medium
CN105701216A (en) Information pushing method and device
CN103365839A (en) Recommendation search method and device for search engines
CN103838756A (en) Method and device for determining pushed information
CN102411591A (en) Method and equipment for processing information
US20110208715A1 (en) Automatically mining intents of a group of queries
CN104933100A (en) Keyword recommendation method and device
CN104008186A (en) Method and device for determining keywords in target text
CN102968417A (en) Searching method and system applied to computer network
CN103823900A (en) Information point significance determining method and device
CN104077286A (en) Commodity information search method and system
CN104978368A (en) Method and device used for providing recommendation information
US20120254148A1 (en) Serving multiple search indexes
CN103885971A (en) Data pushing method and data pushing device
CN104965918B (en) Search method and device based on search keywords

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1181487

Country of ref document: HK

GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1181487

Country of ref document: HK