Category-based search method and device
Technical field
The present application relates to search technology, and in particular to a category-based search method and device.
Background
Network resources are abundant, and users can find information on virtually every industry and topic online.
Network resources can be organized by category, which makes it easier for users to store and search for them.
A comprehensive website typically has dozens of top-level categories, and once these are subdivided there may be thousands of specific categories. When browsing such a website, a user may publish resources as well as search for and download them. The website therefore provides a category navigation bar to help users find the resources they want and to help users place a resource in a suitable category.
However, it is very difficult for a user to find a suitable category among the thousands in the navigation bar by drilling down from broad to narrow. When shopping, for example, the user's search path may be: clothing → women's wear → chiffon shirts → short sleeve ... → crew neck → pullover → slim fit, and so on. Likewise, when uploading a video, the path may be: video → TV series → Hong Kong and Taiwan ... → crime drama → 2012, and so on. Looking up a category this way is cumbersome, time-consuming, and inefficient. In addition, as the user drills down from broad to narrow, the user's client must repeatedly send query requests to the server, and when there are many users this places considerable access pressure on the server. For example, with the search path clothing → women's wear → chiffon shirts → short sleeve ... → crew neck → pullover → slim fit, when the user clicks "clothing" the client sends a query request to the server, which computes and returns "women's wear"; when the user clicks "women's wear" the client sends another query request and the server returns "chiffon shirts"; and so on. This access pattern clearly puts considerable pressure on the server.
Therefore, a technical problem that those skilled in the art urgently need to solve is to provide a category-based search method that overcomes the drawbacks of existing category search: cumbersome lookup, long search times, low efficiency, and heavy access pressure on the server.
Summary of the invention
The present application provides a category-based search method and device to solve the technical problems of existing category search: cumbersome lookup, long search times, low efficiency, and heavy access pressure on the server.
To solve the above problems, the present application discloses a category-based search method, comprising:
receiving a search request sent by a user on a platform, wherein the search request includes a search keyword;
matching the search keyword against the global categories in a global category library to obtain first matching categories, and calculating a first similarity between the search keyword and each first matching category, wherein the categories defined on the platform are stored in the global category library as global categories;
obtaining personalized information of the user, performing secondary matching on the first matching categories based on the personalized information to obtain second matching categories, and calculating a second similarity between each second matching category and the search keyword; and
sorting the first matching categories according to the first similarity and the second similarity, and feeding back the result.
In an embodiment of the present application, after receiving the user's search request, the method further comprises: processing the search keyword in the search request to obtain at least one of the following search terms: center words, words, center-word phrases, and ordinary phrases.
In an embodiment of the present application, matching the search keyword against the global categories in the global category library to obtain first matching categories, and calculating the first similarity between the search keyword and each first matching category, comprises: matching each search term against the global categories in the global category library, taking the global categories matched by each search term as first matching categories, and calculating the probability value of each match; and obtaining the global weight of each search term, weighting each probability value by the corresponding global weight, and calculating the first similarity between the search keyword and the matched global category.
In an embodiment of the present application, obtaining the personalized information of the user, performing secondary matching on the first matching categories based on the personalized information to obtain second matching categories, and calculating the second similarity between each second matching category and the search keyword, comprises: obtaining the personalized information of the user and using it to perform secondary matching on the first matching categories to obtain the corresponding second matching categories; obtaining the center words and/or words among the search terms and calculating the probability value of each obtained search term with respect to the second matching categories; and obtaining the personalized weight of each search term, weighting each probability value by the corresponding personalized weight, and calculating the second similarity between the search keyword and the second matching category.
In an embodiment of the present application, after calculating the first similarity between the search keyword and each first matching category, the method further comprises: screening the first matching categories by their respective first similarities against a preset screening threshold to obtain the screened first matching categories.
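As a rough illustration of this screening step, the sketch below drops candidates whose first similarity falls under a preset threshold. The function name, the sample scores, and the choice to keep scores equal to the threshold are assumptions made for illustration; the application does not specify them.

```python
def screen_categories(first_similarity, threshold):
    """Keep only candidates whose first similarity reaches the threshold.

    first_similarity: dict mapping category name -> first similarity.
    Keeping scores equal to the threshold is an assumption of this sketch.
    """
    return {cat: sim for cat, sim in first_similarity.items() if sim >= threshold}


# Hypothetical candidates and scores.
candidates = {"mobile phones": 0.45, "tablets": 0.30, "fruit": 0.20, "toys": 0.05}
screened = screen_categories(candidates, threshold=0.1)
```

Only the surviving candidates would then go on to secondary matching and sorting, so fewer low-similarity categories are fed back.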
In an embodiment of the present application, sorting the first matching categories according to the first similarity and the second similarity and feeding them back to the user comprises: obtaining the category weight of each first matching category and weighting its first similarity by that category weight; for each first matching category that underwent secondary matching based on the personalized information, summing the weighted first similarity with the second similarity of the second matching category obtained by the secondary matching; and sorting the first matching categories and feeding them back.
In an embodiment of the present application, in the e-commerce field, the global category library is built by processing product titles to obtain product processing items and then calculating the probability of each product processing item within the category corresponding to the product title.
In an embodiment of the present application, in the e-commerce field, if the user is a seller, the user's personalized information is built by processing the product titles published by the seller to obtain seller processing items and then calculating the probability of each seller processing item within the categories corresponding to the seller.
In an embodiment of the present application, in the e-commerce field, if the user is a buyer, the user's personalized information is built by processing the product titles browsed by the buyer to obtain buyer processing items and then calculating the probability of each buyer processing item within the categories corresponding to those product titles.
Accordingly, the present application also discloses a category-based search device, comprising:
a receiving module, configured to receive a search request sent by a user on a platform, wherein the search request includes a search keyword;
a global search module, configured to match the search keyword against the global categories in a global category library to obtain first matching categories, and to calculate a first similarity between the search keyword and each first matching category, wherein the categories defined on the platform are stored in the global category library as global categories;
a personalized search module, configured to obtain personalized information of the user, perform secondary matching on the first matching categories based on the personalized information to obtain second matching categories, and calculate a second similarity between each second matching category and the search keyword; and
a sorting and feedback module, configured to sort the first matching categories according to the first similarity and the second similarity and feed them back to the user.
Compared with the prior art, the present application has the following advantages:
First, in the prior art, category search proceeds level by level from broad to narrow, which is cumbersome, time-consuming, and inefficient. The present application obtains the search keyword from the user's search request and then performs matching against the global category library followed by secondary matching based on personalized information. The user only needs to enter a search keyword instead of mechanically checking categories one by one, which saves time and improves efficiency. It also avoids the access pressure caused in the prior art by repeatedly sending query requests to the server while drilling down from broad to narrow. Matching against the global category library covers all global categories on the platform and yields the first similarity, so the first matching categories obtained are comprehensive. On that basis, secondary matching according to the personalized information yields the second similarity, and the second matching categories obtained agree with the user's historical behavior and so fit the user's needs more closely. The first matching categories are then sorted by their respective similarities and fed back to the user. The categories found are thus comprehensive and meet the user's needs, giving the user category search results that are accurate, comprehensive, and relevant.
Second, the present application can process the search keyword in the search request to obtain refined search terms, match each search term against the global categories in the global category library to obtain the first matching categories, and then perform secondary matching based on the personalized information to obtain the second matching categories. This yields more accurate matches, so both the first similarity between the search keyword and the first matching categories and the second similarity between the search keyword and the second matching categories are more accurate, and the user receives accurate and complete results.
Third, the present application can preset a screening threshold and screen the first matching categories, which reduces the feedback of first matching categories with low similarity and avoids wasting resources.
Fourth, the present application can be applied in the e-commerce field, recommending suitable categories both when a buyer searches for products and when a seller publishes products. It is thus widely applicable and fits users' needs.
Brief description of the drawings
Fig. 1 is a flowchart of a category-based search method according to an embodiment of the present application;
Fig. 2 is a flowchart of matching against the global category library in a category-based search method according to an embodiment of the present application;
Fig. 3 is a flowchart of matching against personalized information in a category-based search method according to an embodiment of the present application;
Fig. 4 is a flowchart of sorting and feedback in a category-based search method according to a preferred embodiment of the present application;
Fig. 5 is a schematic diagram of a category-based search method according to a preferred embodiment of the present application;
Fig. 6 is a structural diagram of a category-based search device according to an embodiment of the present application;
Fig. 7 is a structural diagram of a category-based search device according to a preferred embodiment of the present application.
Detailed description
To make the above objects, features, and advantages of the present application clearer and easier to understand, the application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a flowchart of a category-based search method according to an embodiment of the present application is shown.
Step 101, receive the search request sent by a user on the platform;
In a comprehensive website, there are typically dozens of defined top-level categories, and after subdivision there may be thousands of specific categories. If a website is regarded as a platform, thousands of categories may be defined on it. When a user performs a category search on the platform, entering a search keyword in the search box triggers the sending of a search request. The search keyword can be carried as a parameter of the search request, so the request includes the search keyword; after the request sent by the user is received, the search keyword can be obtained from the request's parameters.
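A minimal sketch of reading the keyword out of a request follows, assuming the request arrives as a URL whose query string carries the keyword. The parameter name `q` and the example URL are hypothetical; the application only says the keyword is passed as a parameter of the request.

```python
from urllib.parse import urlparse, parse_qs


def extract_search_keyword(request_url, param="q"):
    """Return the search keyword carried in the request's query string.

    `param` is a hypothetical parameter name chosen for this sketch.
    Returns None when the parameter is absent.
    """
    query = parse_qs(urlparse(request_url).query)
    values = query.get(param, [])
    return values[0].strip() if values else None


keyword = extract_search_keyword(
    "https://platform.example/category_search?q=apple+phone"
)
```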
The search keyword in the search request can then be used for a global search and a personalized search. The categories defined on the platform serve as global categories, so the global search covers all categories defined on the platform. The personalized search performs secondary matching on the categories matched in the global search, so that the matched categories better meet the user's needs.
Step 102, in the global search, match the search keyword against the global categories in the global category library to obtain first matching categories, and calculate the first similarity between the search keyword and each first matching category;
In the present application, the categories defined on the platform are stored as global categories in the global category library. In the global search, the search keyword is matched against the global categories in the library. For example, if the search keyword is "apple", the global categories matched from the library may include mobile phones, tablet computers, notebook computers, and fruit and dried fruit; these matched global categories are taken as the first matching categories.
The first similarity between the search keyword and each first matching category can then be calculated. The similarity of X and Y refers to the degree to which X and Y are likely to be related, so the first similarity can be the probability that the search keyword matches the first matching category.
One way to calculate it is to compute the probability value P(C|Q) of global category C given search keyword Q, and to use P(C|Q) as the first similarity between the search keyword and the first matching category.
Here, probability is a basic concept of probability theory: a real number between 0 and 1 that measures how likely a random event is to occur. The number expressing how likely an event is to occur is called the probability of that event, and the probability value is that number.
P(C|Q) can thus be the probability that search keyword Q belongs to global category C, that is, the probability that Q matches C, so P(C|Q) can be used as the first similarity of Q and C. For example, P(C|Q) = 30% means that Q belongs to C with probability 30%; the probability that the search keyword matches the first matching category is then 0.3, i.e. the first similarity is 0.3.
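To make P(C|Q) concrete, the sketch below estimates it as a relative frequency: the share of a keyword's occurrences that fall in each category. Estimating the probability from raw counts, and the counts themselves, are assumptions for illustration rather than the application's stated formula.

```python
def category_probabilities(keyword, keyword_category_counts):
    """Estimate P(C | Q) for every category C that keyword Q occurs in.

    keyword_category_counts: dict keyword -> {category: occurrence count}.
    The relative-frequency estimate is an assumption of this sketch.
    """
    counts = keyword_category_counts.get(keyword, {})
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: cnt / total for cat, cnt in counts.items()}


# Hypothetical counts for the keyword "apple".
counts = {
    "apple": {
        "mobile phones": 30,
        "tablet computers": 25,
        "notebook computers": 20,
        "fruit and dried fruit": 25,
    }
}
first_similarity = category_probabilities("apple", counts)
```

With these counts, "mobile phones" receives 30 of 100 occurrences, which corresponds to the first similarity of 0.3 in the example above.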
Step 103, obtain the personalized information of the user, perform secondary matching on the first matching categories based on the personalized information to obtain second matching categories, and calculate the second similarity between each second matching category and the search keyword;
Each user on the platform leaves activity traces there, for example by browsing pages or sending messages, and these traces can make up the user's personalized information.
To make the search results better meet the user's needs, secondary matching based on the user's personalized information can be performed on the first matching categories above, and the second similarity between each second matching category and the search keyword can be calculated.
In the example above with the search keyword "apple", the first matching categories are mobile phones, tablet computers, notebook computers, and fruit and dried fruit. If the user's personalized information covers only electronic products, then the second matching categories obtained by secondary matching based on the personalized information are mobile phones, tablet computers, and notebook computers.
The second similarity between the search keyword and each second matching category is then calculated; it can be the degree to which the search keyword is likely to match the second matching category. Likewise, the probability value P(C'|Q) of second matching category C' given search keyword Q can be computed. This determines the probability that Q belongs to C', and hence the likelihood that Q matches C', so P(C'|Q) can be used as the second similarity between the search keyword and the second matching category.
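The secondary matching can be pictured as a lookup in a per-user index. In the sketch below the user's personalized information is assumed to be stored as a mapping from a term to that user's category probabilities; this layout, the function name, and the numbers are illustrative assumptions.

```python
def secondary_match(keyword, first_matches, personal_index):
    """Return {category: P(C' | Q)} for first matches found in the user's data.

    personal_index: dict term -> {category: probability} built from the
    user's own history (hypothetical layout for this sketch).
    """
    user_probs = personal_index.get(keyword, {})
    return {cat: user_probs[cat] for cat in first_matches if cat in user_probs}


first_matches = [
    "mobile phones",
    "tablet computers",
    "notebook computers",
    "fruit and dried fruit",
]
# A user whose history covers only electronic products.
personal_index = {
    "apple": {
        "mobile phones": 0.5,
        "tablet computers": 0.3,
        "notebook computers": 0.2,
    }
}
second_similarity = secondary_match("apple", first_matches, personal_index)
```

Here the fruit category has no entry in the user's index, so it is not a second matching category and carries no second similarity.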
Step 104, sort the first matching categories according to the first similarity and the second similarity, and feed back the result.
The first matching categories and first similarities have been obtained above, and secondary matching on the first matching categories using the personalized information has yielded the second matching categories and second similarities. The first matching categories can now be sorted by similarity.
Some first matching categories have both a first similarity and a second similarity, while others have only a first similarity. Therefore, before the matched global categories are sorted, the total similarity of each first matching category can be calculated. For some first matching categories the total similarity is the sum of the first and second similarities, as with mobile phones, tablet computers, and notebook computers in the example above; for others the total similarity is just the first similarity, as with fruit and dried fruit. All first matching categories are then sorted by their total similarity, and the sorted result is fed back to the user.
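The total-similarity rule above can be sketched in a few lines. The function name and the sample scores are hypothetical; the rule itself (sum where a second similarity exists, otherwise the first similarity alone) follows the description.

```python
def rank_categories(first_similarity, second_similarity):
    """Sort first matching categories by total similarity, highest first.

    A category without a second similarity keeps its first similarity
    as its total.
    """
    totals = {
        cat: s1 + second_similarity.get(cat, 0.0)
        for cat, s1 in first_similarity.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for the "apple" example.
first = {
    "mobile phones": 0.30,
    "tablet computers": 0.25,
    "notebook computers": 0.20,
    "fruit and dried fruit": 0.25,
}
second = {"mobile phones": 0.5, "tablet computers": 0.3, "notebook computers": 0.2}
ranking = rank_categories(first, second)
```

In this example the fruit category ties with tablet computers on first similarity but drops to the end once the user's second similarities are added.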
In summary, in the prior art, category search proceeds level by level from broad to narrow, which is cumbersome, time-consuming, and inefficient, and places heavy access pressure on the server. The present application obtains the search keyword from the user's search request and then performs matching against the global category library followed by secondary matching based on personalized information. The user only needs to enter a search keyword instead of mechanically checking categories one by one, which saves time, improves efficiency, and reduces the access pressure on the server. Matching against the global category library covers all global categories on the platform and yields the first similarity, so the first matching categories are comprehensive. On that basis, secondary matching according to the personalized information yields the second similarity, and the second matching categories obtained agree with the user's historical behavior and so fit the user's needs more closely. The first matching categories are then sorted by their respective similarities and fed back to the user. The categories found are thus comprehensive and meet the user's needs, giving the user category search results that are accurate, comprehensive, and relevant.
Preferably, after receiving the user's search request, the method further includes:
processing the search keyword in the search request to obtain at least one of the following search terms: center words, words, center-word phrases, and ordinary phrases.
For example, on an e-commerce website, the search keyword entered by a user may contain phrases or words. Semantically, words fall into two classes. One class is product words that identify the basic product type, such as car or bus; these are called center words (CenterWord). The other class is words that modify a product word, such as blue in "blue car"; these are called modifier words (NormalWord), or simply words.
Phrases (Noun Phrase, NP) can likewise be divided into two classes according to the center words and words (modifier words) above: a phrase containing a center word is called a center-word phrase (CenterNP), and a phrase that contains only words is called an ordinary phrase (NormalNP).
Suppose that after center-word extraction the search keyword Q yields:
Q = {NWs, CWs, NPs, CNPs}    (1)
where NWs is the set of all words, CWs is the set of all center words, NPs is the set of all ordinary phrases, and CNPs is the set of all center-word phrases.
The text information of all products on the website, such as product titles, can be understood in the same way. In the present application, a center word extraction tool (Center Word Extractor, CWE) can be used to process product titles or search keywords. When predicting the relevance between a query keyword and a category, center words, words, center-word phrases, and ordinary phrases influence the matching of global categories differently; in a specific implementation this can be reflected through weights.
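The four sets in formula (1) can be held in a small container. The sketch below is a toy stand-in for the center word extraction tool: it assumes the keyword has already been split into chunks and that a lexicon of center words is available. The class, the function, and the lexicon are all hypothetical and do not describe how the CWE tool actually works.

```python
from dataclasses import dataclass, field


@dataclass
class SearchTerms:
    """Container mirroring Q = {NWs, CWs, NPs, CNPs}."""
    normal_words: set = field(default_factory=set)    # NWs
    center_words: set = field(default_factory=set)    # CWs
    normal_phrases: set = field(default_factory=set)  # NPs
    center_phrases: set = field(default_factory=set)  # CNPs


def split_keyword(chunks, center_lexicon):
    """Toy classifier: a word is a center word if the lexicon lists it,
    and a multi-word chunk is a center-word phrase if it contains one."""
    terms = SearchTerms()
    for chunk in chunks:
        words = chunk.lower().split()
        has_center = any(w in center_lexicon for w in words)
        if len(words) > 1:
            target = terms.center_phrases if has_center else terms.normal_phrases
            target.add(" ".join(words))
        for w in words:
            (terms.center_words if w in center_lexicon else terms.normal_words).add(w)
    return terms


terms = split_keyword(["blue car", "brand new"], center_lexicon={"car", "bus"})
```

Keeping the four kinds of terms separate is what later allows each kind to carry its own weight.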
Preferably, in the e-commerce field, the global category library is built by processing product titles to obtain product processing items and then calculating the probability of each product processing item within the category corresponding to the product title.
The text processing method discussed above also applies to product titles, and specifically includes:
2.1 Text processing;
The center word extraction tool is used to extract the product processing items in a product title, including center words, words, center-word phrases, and ordinary phrases, and the product words are stemmed; for example, lights and lighting both become light after stemming.
2.2 Probability calculation;
The probability distribution over global categories given a product processing item is calculated, that is, the probability that the product processing item belongs to each global category is determined. An index of the global category library is then built with the product processing items as keys (KEY). Through a KEY, each global category can be found, together with the probability that the KEY belongs to it.
For example, after text processing, dictionaries can be built for looking up the candidate global categories (GetInitialCandidate), including a co-occurrence dictionary (coccurIndex) of center words and the words surrounding them. Suppose "a b c" is a product title and c is the center word. Each surrounding word is paired with the center word, so "a c" and "b c" both serve as KEYs, and the number of times "a c" and "b c" each occur under each global category is added as the key value (Value) to the index of the global category library:
{"a c", (cat1, cnt1) (cat2, cnt2) ...}
{"b c", (cat1, cnt1) (cat2, cnt2) ...}
The prior probability index of words or phrases under the global categories is built in a similar way, yielding the prior probability index (catTokenIndex).
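A co-occurrence index of this shape can be sketched with nested dictionaries. The input format (title words, center word, category) and the function name are assumptions made for illustration; only the "surrounding word + center word" KEY and the per-category count Value come from the description.

```python
from collections import defaultdict


def build_cooccur_index(products):
    """Build {"<word> <center>": {category: count}} from labeled titles.

    products: iterable of (title_words, center_word, category); this input
    layout is hypothetical.
    """
    index = defaultdict(lambda: defaultdict(int))
    for words, center, category in products:
        for word in words:
            if word != center:
                index[f"{word} {center}"][category] += 1
    return index


products = [
    (["a", "b", "c"], "c", "cat1"),
    (["a", "c"], "c", "cat1"),
    (["b", "c"], "c", "cat2"),
]
cooccur_index = build_cooccur_index(products)
```

Looking up a KEY such as "b c" then returns every category in which that pair occurred, with the counts that a later step could turn into probabilities.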
With this method, once a search keyword is obtained it can be matched against the co-occurrence dictionary to determine the matching KEYs and their Values; the first matching categories are determined from the Values, and the first similarity is then calculated.
Further, model training can be carried out on the basis of the above processing to obtain the global weight of each kind of product processing item, as follows:
2.3 Model training;
A training data set is built from the product processing items in preset search keywords and the global categories, and the relevance value of each product processing item to the global categories is labeled. A RANK-SVM model can then be trained on this data to obtain the global weight of each kind of product processing item, i.e. the global weights of center words, words, center-word phrases, and ordinary phrases.
The RANK-SVM model is chosen to train the global weights. Its basic principle is to estimate the parameters of the probability P by solving an SVM model; these parameters are the global weights, and a global weight expresses the importance of a product processing item relative to the global categories. Product processing items are divided into center words, words, center-word phrases, and ordinary phrases. For example, if the center word is iphone 4s and the word is apple, a title is determined to belong to electronic products through iphone 4s more often than through apple. In other words, for a given global category, the number of times a product title is determined to belong to that category through a center word exceeds the number of times through a word. Center words are therefore relatively more important for global categories and words relatively less so. For example, the calculated global weights of center words, words, center-word phrases, and ordinary phrases may be 0.4, 0.2, 0.3, and 0.1 respectively.
SVM (Support Vector Machine) is a trainable machine learning method. In the examples of the present application, besides the RANK-SVM model, other machine learning algorithms can also be used, such as PRanking, RankBoost, and other learning-to-rank models.
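Once the global weights are learned, the summary describes weighting each search term's probability by its global weight to obtain the first similarity. The sketch below applies the example weights 0.4, 0.2, 0.3, and 0.1 as a plain weighted sum for one category; the weighted-sum form, the dictionary keys, and the per-term probabilities are assumptions for illustration, and no training is performed here.

```python
# Example global weights from the text, keyed by hypothetical term-type names.
GLOBAL_WEIGHTS = {
    "center_word": 0.4,
    "word": 0.2,
    "center_phrase": 0.3,
    "phrase": 0.1,
}


def weighted_similarity(term_probs, weights=GLOBAL_WEIGHTS):
    """Combine per-term-type probabilities for one category.

    term_probs: dict term type -> P(category | term of that type).
    A simple weighted sum is assumed in this sketch.
    """
    return sum(weights[kind] * prob for kind, prob in term_probs.items())


# Hypothetical probabilities of one category given each kind of term.
score = weighted_similarity(
    {"center_word": 0.6, "word": 0.1, "center_phrase": 0.5, "phrase": 0.2}
)
```

Because the center-word weight is the largest, a strong center-word match moves the score more than an equally strong match on an ordinary word.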
Preferably, in the e-commerce field, if the user is a seller, the personalized information involves the product titles published by the seller: the user's personalized information is built by processing those titles to obtain seller processing items and then calculating the probability of each seller processing item within the categories corresponding to the seller. If the user is a buyer, the personalized information involves the product titles browsed by the buyer: the user's personalized information is built by processing those titles to obtain buyer processing items and then calculating the probability of each buyer processing item within the categories corresponding to those product titles.
When a user's data is processed to obtain personalized information, the text can be processed. If the user is a seller, mainly the product titles published by the seller are processed; if the user is a buyer, mainly the product titles browsed by the buyer are processed. Seller processing items and buyer processing items may include center words and words, and center-word phrases and ordinary phrases need not be extracted. The specific processing is essentially the same as the 2.1 text processing in the method for building the global category library above and is not repeated here.
In the probability calculation, if the user is a seller, the probability distribution of the seller processing items over the seller's categories is calculated. For example, if the seller mainly sells electronic products and the seller's categories are mobile phones, mp3/mp4, and computer accessories, then after the seller's published product titles are processed to obtain seller processing items, the probability distribution of the seller processing items over mobile phones, mp3/mp4, and computer accessories is calculated. The seller's personalized information is thereby obtained.
If the user is a buyer, the probability of each buyer processing item within the categories corresponding to the product titles is calculated. For example, from the product titles browsed by the buyer, the corresponding categories are obtained; suppose the categories browsed by the buyer are dresses, women's shoes, suitcases, and shirts. After the browsed product titles are processed to obtain buyer processing items, the probability distribution of the buyer processing items over dresses, women's shoes, suitcases, and shirts can be calculated, which yields the buyer's personalized information.
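A buyer's personalized information of this kind can be sketched as a per-term distribution over the categories of the titles the buyer browsed. The input layout, the function name, the sample browsing history, and the relative-frequency estimate are assumptions for illustration.

```python
from collections import defaultdict


def build_personal_index(browsed):
    """Build {term: {category: probability}} from a buyer's browsing history.

    browsed: iterable of (processing_terms, category), one entry per viewed
    title (hypothetical layout). Probabilities are relative frequencies,
    which is an assumption of this sketch.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for terms, category in browsed:
        for term in terms:
            counts[term][category] += 1
    index = {}
    for term, per_category in counts.items():
        total = sum(per_category.values())
        index[term] = {cat: n / total for cat, n in per_category.items()}
    return index


# Hypothetical browsing history of one buyer.
browsed = [
    (["red", "dress"], "dresses"),
    (["red", "shoes"], "women's shoes"),
    (["silk", "dress"], "dresses"),
    (["red", "shirt"], "shirts"),
]
personal_index = build_personal_index(browsed)
```

The same construction would apply to a seller by feeding in the seller's published titles and the seller's own categories instead.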
The probabilities can then be calculated with the method of "2.2 probability calculation" above, and the individual weight of each buyer processing item and seller processing item can be determined with the method of "2.3 model training" above; details are not repeated here.
With reference to Fig. 2, a flow chart of global classification library matching in a classification-based searching method according to an embodiment of the present application is shown.
Global search matches against the global classifications in the platform. The specific method includes:
Matching the global classifications in the global classification library using the search keyword to obtain first matching classifications, and calculating the first similarity between the search keyword and each first matching classification, including:
Step 201: each search term is used to match the global classifications in the global classification library; the global classifications matched by each search term are taken as first matching classifications, and the probability value of each match is calculated;
The search terms obtained by processing the search keyword with the method described above include centre words, words, centre-word phrases and word phrases. The global classification library also contains product processing items, i.e. the centre words, words, centre-word phrases and word phrases corresponding to the product titles, together with their probabilities in the global classifications corresponding to the product titles.
Therefore, each search term can be used to match the global classifications in the global classification library to obtain the first matching classifications. Then the probability value of each search term with respect to its first matching classification is obtained from the global classification library, i.e. the probability value of the first matching classification given the search term. For example, if a centre word among the search terms matches the centre word corresponding to some global classification in the global classification library, the probability value of that centre word in the global classification library is the probability value of the centre word among the search terms.
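The lookup in step 201 can be sketched as follows. This is a minimal illustration only: the dictionary layout, the name `GLOBAL_LIBRARY`, the function `match_global` and all probability values are assumptions made for the example and are not taken from the application.

```python
from collections import defaultdict

# Hypothetical global classification library (toy values):
# (term kind, term) -> {global classification: P(classification | term)}
GLOBAL_LIBRARY = {
    ("centre_word", "shirt"): {"women's clothing": 0.6, "men's clothing": 0.4},
    ("word", "chiffon"): {"women's clothing": 0.9, "fabric": 0.1},
}


def match_global(search_terms):
    """Step 201 sketch: match each search term against the library.

    Returns {first matching classification: {(kind, term): probability value}}.
    Search terms that are absent from the library simply match nothing.
    """
    matches = defaultdict(dict)
    for key in search_terms:
        for classification, prob in GLOBAL_LIBRARY.get(key, {}).items():
            matches[classification][key] = prob
    return dict(matches)
```

Each key of the returned dictionary is a candidate first matching classification, and the inner dictionary holds the probability values that step 202 then weights.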
Step 202: the global weight of each search term is obtained, the probability value of each search term is weighted by its global weight, and the first similarity between the search keyword and the first matching classification is calculated.
The first similarity between the search keyword and the first matching classification is composed of the probability values of the search terms. Because the search terms differ in importance relative to the first matching classification, a global weight is used to mark the importance of each search term; that is, the global weight expresses how important the probability value of a search term is to the first similarity. In the embodiments of the present application, if a search term matches a product processing item in the global classification library, the global weight of that product processing item can be used as the global weight of the search term.
The global weight of each search term can be obtained with the processing method described above; the probability value of each search term is then weighted by its global weight, and the first similarity between the search keyword and the first matching classification is calculated. One way of doing so is to calculate, on the basis of formula (1) above, the probability value of the first matching classification given the search keyword. The specific formula is as follows:
$$P(C \mid Q) = P(C \mid (NWs, CWs, NPs, CNPs)) \qquad (2)$$
Centre words, words, centre-word phrases and word phrases influence the prediction of the global classification differently, i.e. they carry different global weights in the global classification. Since centre words, words, centre-word phrases and word phrases are also mutually independent, the conditional probability formula (2) above can be converted into:
$$P(C \mid Q) = P(C \mid (NWs, CWs, NPs, CNPs)) = P(C \mid NWs)^{w_{NWs}} \cdot P(C \mid CWs)^{w_{CWs}} \cdot P(C \mid NPs)^{w_{NPs}} \cdot P(C \mid CNPs)^{w_{CNPs}} \qquad (3)$$
Here, wNWs is the global weight corresponding to the words NWs, wCWs is the global weight corresponding to the centre words CWs, wNPs is the global weight corresponding to the word phrases NPs, and wCNPs is the global weight corresponding to the centre-word phrases CNPs.
Here, P(C | NWs) is the probability distribution of the words in the first matching classification C, so its probability estimate can be expanded in the following form:
$$P(C \mid NWs) = P(C \mid (nw_1, nw_2, \ldots, nw_k)) = P(C \mid nw_1) \cdot P(C \mid nw_2) \cdots P(C \mid nw_k) \qquad (4)$$
Each factor in formula (3) can likewise be expanded in the form of formula (4).
The similarity between search keyword Q and first matching classification C can then be calculated according to formula (3). To simplify the practical calculation, the logarithm of both sides of formula (3) is taken, giving:
$$\log P(C \mid Q) = w_{NWs} \log P(C \mid NWs) + w_{CWs} \log P(C \mid CWs) + w_{NPs} \log P(C \mid NPs) + w_{CNPs} \log P(C \mid CNPs) \qquad (5)$$
Through this conversion, the probability calculation for search keyword Q and first matching classification C becomes the problem of solving for the parameters of a linear model, and formula (5) can also serve as the formula for the quantitative measure of similarity between the search keyword and the first matching classification.
In formula (3), the probability of search keyword Q and first matching classification C depends on two kinds of data: one is the probability distribution of NWs, CWs, NPs and CNPs under the first matching classification C; the other is the weights corresponding to NWs, CWs, NPs and CNPs, i.e. wNWs, wCWs, wNPs and wCNPs.
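The log-linear score of formula (5), with each factor expanded per formula (4), can be sketched as below. The function name `first_similarity` and the input layout (one list of per-term probabilities for each of NWs, CWs, NPs and CNPs) are assumptions chosen for illustration; the probabilities are assumed to be strictly positive.

```python
import math


def first_similarity(term_probs, weights):
    """Sketch of formula (5).

    term_probs: {"NWs": [P(C|nw1), ...], "CWs": [...], "NPs": [...], "CNPs": [...]}
                (any subset of kinds; every probability must be > 0)
    weights:    {"NWs": wNWs, "CWs": wCWs, ...} global weight per kind
    Returns log P(C|Q) = sum over kinds of w_kind * log P(C|kind).
    """
    score = 0.0
    for kind, probs in term_probs.items():
        # Formula (4): P(C|kind) is a product, so its log is a sum of logs.
        log_p_kind = sum(math.log(p) for p in probs)
        score += weights[kind] * log_p_kind
    return score
```

Working in log space turns the weighted product of formula (3) into a weighted sum, which also avoids numerical underflow when many small probabilities are multiplied.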
In a specific implementation, for example on an e-commerce platform, the probability distribution P(C | W) of a word can be obtained by counting the words corresponding to the product titles on the platform to get the prior distribution over all global classifications; that is, the frequency of word W in global classification C divided by the total frequency with which word W occurs is used as the estimate of P(C | W). The probability distributions of centre words, word phrases and centre-word phrases are calculated in the same way and are not described again here. In addition, when word W does not occur in C, it can be given a default value.
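The frequency estimate of P(C | W) described above can be sketched as follows. The names `estimate_priors`, `prior` and `DEFAULT_PROB`, as well as the default value itself, are hypothetical; the application does not specify what default is used.

```python
from collections import Counter, defaultdict

DEFAULT_PROB = 1e-6  # hypothetical default for a word never seen in a classification


def estimate_priors(titled_products):
    """titled_products: iterable of (global classification, list of title words).

    Returns {word: {classification: count(W in C) / count(W overall)}}.
    """
    word_class_counts = defaultdict(Counter)
    for classification, words in titled_products:
        for word in words:
            word_class_counts[word][classification] += 1
    return {
        word: {c: n / sum(counts.values()) for c, n in counts.items()}
        for word, counts in word_class_counts.items()
    }


def prior(priors, classification, word, default=DEFAULT_PROB):
    """Look up P(C | W), falling back to the default when W does not occur in C."""
    return priors.get(word, {}).get(classification, default)
```

A non-zero default keeps the logarithm in formula (5) defined for words that never occur in a given classification.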
With reference to Fig. 3, a flow chart of individual information matching in a classification-based searching method according to an embodiment of the present application is shown.
In addition to the global search, the user's own preference can also be taken into account: on the basis of the matching result obtained from the global classification library, the classifications found by the global search can be re-ranked according to the user's preference, i.e. a secondary match is performed.
On an e-commerce website, the field in which a seller does business is usually limited and concentrated; that is, the classifications of the products a seller deals in are limited in number and concentrated in a few classifications. For example, "Plastic Mat" corresponds to a tea cup mat for seller A and to a floor mat for seller B, yet in the global search described above seller A and seller B would receive recommendation results in the same order. The present application further improves the recommendation on top of the global search and provides individualized recommendations for different sellers: according to the difference between the classifications of the products dealt in by seller A and seller B, different sellers are given recommendation results in different orders. Taking the user's own preference into account, the method for matching against individual information specifically includes:
Preferably, obtaining the individual information of the user, performing a secondary match on the first matching classifications based on the individual information to obtain second matching classifications, and calculating the second similarity between each second matching classification and the search keyword, includes:
Step 301: the individual information of the user is obtained, and a secondary match is performed on the first matching classifications using the individual information to obtain the corresponding second matching classifications;
When matching against individual information, the individual information of the user is first obtained and then used to perform the secondary match on the first matching classifications, giving the corresponding second matching classifications.
Step 302: the centre words and/or words among the search terms are obtained, and the probability value of each obtained search term with respect to the second matching classification is calculated;
When the individual information is processed, the centre words or the words among the search terms can be obtained, or both at the same time. The probability value of the second matching classification given the search term can then be calculated.
Step 303: the individual weight of each search term is obtained, the probability value of each search term is weighted by its individual weight, and the second similarity between the search keyword and the second matching classification is calculated.
In the embodiments of the present application, different search terms influence the second matching classification differently, so an individual weight can be used to mark the importance of a search term to the second matching classification. The individual weight is calculated in basically the same way as the global weight, and details are not repeated here.
After the individual weight of each search term is obtained, the probability value of each search term can be weighted by its individual weight and the weighted values summed to give the second similarity between the search keyword and the second matching classification. The specific calculation is similar to that in the global search above and is not repeated here.
The processing based on individual information described above can be regarded as personalized recommendation. Within the scope of the seller's company, it is relatively difficult for word phrases and centre-word phrases to reach statistical significance as features for personalized recommendation, so mainly the words NWs and centre words CWs of the global search are considered, through their data within the scope of the user. Taking a seller as an example, the distribution of a word or centre word is restricted to the scope of the seller's company, i.e. to the classifications corresponding to the product titles issued by the seller. For example, the probability distribution P(C | (W, Comp)) of word W over classification C under company Comp is represented by the prior probability obtained by dividing the number of times W occurs under classification C in company Comp by the total number of times W occurs in Comp.
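The company-scoped estimate P(C | (W, Comp)) can be illustrated with the "Plastic Mat" case of sellers A and B. The function name `company_distribution` and the toy title data are assumptions made only for this sketch.

```python
from collections import Counter, defaultdict


def company_distribution(company_titles):
    """Estimate P(C | (W, Comp)) from the titles issued by one company Comp.

    company_titles: iterable of (classification, list of title words).
    Returns {word: {classification: count(W under C in Comp) / count(W in Comp)}}.
    """
    counts = defaultdict(Counter)
    for classification, words in company_titles:
        for word in words:
            counts[word][classification] += 1
    return {
        word: {c: n / sum(per_class.values()) for c, n in per_class.items()}
        for word, per_class in counts.items()
    }


# Toy data: seller A deals in tea cup mats, seller B in floor mats.
seller_a = company_distribution([
    ("tea cup mat", ["plastic", "mat"]),
    ("tea cup mat", ["plastic", "coaster"]),
    ("tableware", ["plastic", "cup"]),
])
seller_b = company_distribution([
    ("floor mat", ["plastic", "mat"]),
    ("floor mat", ["rubber", "mat"]),
])
```

With these toy titles the word "mat" points entirely to "tea cup mat" for seller A and entirely to "floor mat" for seller B, which is the kind of difference the secondary match relies on to re-rank the same global result differently for each seller.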
Preferably, after the first similarity between the search keyword and the first matching classification is calculated, the method further includes:
Screening the first matching classifications by their respective first similarities according to a preset screening threshold, and obtaining the first matching classifications that remain after screening.
Assume that after the global search of search keyword Q, the set of recommended first matching classifications is {C1, C2, C3, C4, ...}, with the first matching classifications arranged in descending order of their formula (5) scores.
To make the search results provided to the user more accurate, and because the classifications to which a product belongs are limited in number, and to reduce wasted resources, the present application presets a screening threshold. Some first matching classifications in the recommended set have relatively low similarity; those that fail to meet the screening threshold are rejected directly, i.e. only first matching classifications whose similarity meets the screening threshold enter the personalized recommendation flow.
For example, the ratio between the score of a first matching classification and the score of the optimal first matching classification can be used to measure whether the screening threshold is met. Here, the optimal first matching classification is the first matching classification with the highest similarity.
Since the results obtained from formula (5) are negative, when the multiple between a lower-ranked first matching classification and the optimal first matching classification C1 exceeds the screening threshold T, this first matching classification and all first matching classifications ranked after it are considered to have little relevance to search keyword Q; that is, their first similarity cannot meet the screening threshold, and they do not enter the subsequent personalized search flow.
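The ratio-based screening can be sketched as below, assuming negative formula (5) scores. The function name `screen_by_ratio` and the handling of a best score of exactly zero are assumptions added for the example; the application does not discuss that edge case.

```python
def screen_by_ratio(scored, threshold):
    """Keep first matching classifications whose score multiple stays within the threshold.

    scored:    list of (classification, score) with scores <= 0 (log probabilities)
    threshold: screening threshold T applied to score / best_score
    """
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    best = ranked[0][1]
    if best == 0:
        # Assumption: with a perfect best score the ratio is undefined,
        # so only classifications that tie with it are kept.
        return [item for item in ranked if item[1] == 0]
    kept = []
    for classification, score in ranked:
        if score / best > threshold:
            break  # this classification and all later ones are rejected
        kept.append((classification, score))
    return kept
```

Because both scores are negative, the ratio is at least 1 and grows as a classification becomes less relevant, so a single pass over the sorted list is enough.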
Of course, the screening threshold can also be used to limit the number of first matching classifications. For example, if the screening threshold is set to 10, only the top 10 first matching classifications in the recommended set are taken. The present application does not limit the method used to screen the first matching classifications.
With reference to Fig. 4, a flow chart of sorting and feedback in a classification-based searching method according to a preferred embodiment of the present application is shown.
Preferably, sorting the first matching classifications according to the first similarity and the second similarity and feeding them back to the user includes:
Step 401: the classification weight of the first matching classifications is obtained, and the first similarity of each first matching classification is weighted by the classification weight.
Because the first matching classifications and the second matching classifications influence the final search results differently, the present application also sets a classification weight for the first matching classifications. The classification weight is calculated in basically the same way as in the embodiments above, and details are not repeated here.
The first similarity of each first matching classification can therefore be weighted by the classification weight, which determines the similarity of each first matching classification.
Step 402: it is detected whether the first matching classification has undergone the secondary match based on individual information.
In the embodiments of the present application, some first matching classifications have only a first similarity, while others have also undergone the secondary match based on individual information, which determines a second matching classification; such a first matching classification has both the first similarity and the second similarity of the second matching classification obtained by the secondary match. Therefore, before the first matching classifications are sorted by similarity, the total similarity of each first matching classification is determined first.
The total similarity of a first matching classification that has only a first similarity is its weighted first similarity. In the example above, the total similarity of the fruit and dried fruit classifications is their weighted first similarity.
If so, i.e. the first matching classification has undergone the secondary match based on individual information, step 403 is executed next;
If not, i.e. the first matching classification has not undergone the secondary match, step 404 is executed next.
Step 403: the weighted first similarity is summed with the second similarity of the second matching classification obtained by the secondary match.
For a first matching classification that has undergone the secondary match based on individual information, the weighted first similarity is summed with the second similarity of the second matching classification obtained by the secondary match; that is, its total similarity is the sum of the weighted first similarity and the second similarity. In the example above, the total similarity of the mobile phone, tablet computer and notebook computer classifications is the sum of the weighted first similarity and the second similarity.
Step 404: the first matching classifications are sorted and fed back.
Finally, the first matching classifications are sorted by their respective total similarities, and the sorted search results are fed back to the user.
For example, the first matching classifications obtained in the global search are A1, B1, C1 and D1, with first similarity values of 15, 9, 8 and 2 respectively, and the classification weight of the first matching classifications is 1.5. The second matching classifications obtained in the personalized search are B1 and D1, with second similarity values of 10 and 5. The total similarity of each first matching classification is then:
A1: 15 * 1.5 = 22.5;
B1: 9 * 1.5 + 10 = 23.5;
C1: 8 * 1.5 = 12;
D1: 2 * 1.5 + 5 = 8.
Sorting the first matching classifications, the search results fed back to the user are B1, A1, C1 and D1.
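Steps 401 to 404 applied to this worked example can be sketched as follows. The function name `rank_by_total_similarity` is hypothetical; the input values are the ones given in the example.

```python
def rank_by_total_similarity(first_sim, second_sim, classification_weight):
    """Weight the first similarity, add the second similarity where a
    secondary match exists, and sort by total similarity (descending)."""
    totals = {
        classification: sim * classification_weight + second_sim.get(classification, 0.0)
        for classification, sim in first_sim.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = rank_by_total_similarity(
    first_sim={"A1": 15, "B1": 9, "C1": 8, "D1": 2},
    second_sim={"B1": 10, "D1": 5},
    classification_weight=1.5,
)
# ranked == [("B1", 23.5), ("A1", 22.5), ("C1", 12.0), ("D1", 8.0)]
```

Classifications without a secondary match contribute only their weighted first similarity, which is why A1 falls behind B1 despite having the highest first similarity.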
With reference to Fig. 5, a schematic diagram of a classification-based searching method according to a preferred embodiment of the present application is shown.
After preprocessing (preprocess), i.e. CWE processing, the search keyword Query enters the lookup of candidate global classifications, i.e. the global classification search (catGlobalRec).
In the global search flow, the weight of each feature, i.e. the global weight corresponding to each search term, is loaded; at the same time the prior probability index (catTokenIndex) is looked up to determine the first matching classifications, and the recommendation score, i.e. the value of the first similarity, is calculated according to formula (5).
After the global search, the first matching classifications are screened by the score multiple between each sorted first matching classification and the optimal first matching classification, and the first matching classifications that remain enter the personalized search stage.
The personalized search looks up information in the user dimension, i.e. re-ranks on the basis of the global recommendation (catCompanyRerank).
The flow then enters post-processing (postprocess), which mainly applies some simple filtering to the recommendation results according to engineering needs, such as limiting the number N of recommended first matching classifications; finally N first matching classifications are obtained and recommended to the user.
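The order of the stages in Fig. 5 can be sketched as a simple composition. Only the stage names (preprocess, catGlobalRec, catCompanyRerank, postprocess) come from the description; the function `search_pipeline`, its signature, and the stub stages with their toy scores are assumptions for illustration and do not implement the actual scoring.

```python
def search_pipeline(query, user_id, preprocess, cat_global_rec, screen,
                    cat_company_rerank, postprocess):
    """Chain the stages of Fig. 5; each stage is supplied by the caller."""
    terms = preprocess(query)                      # CWE processing -> search terms
    first = cat_global_rec(terms)                  # first matching classifications with scores
    kept = screen(first)                           # threshold screening
    reranked = cat_company_rerank(kept, user_id)   # personalized re-ranking
    return postprocess(reranked)                   # e.g. limit to N results


# Usage with stub stages (toy scores, not taken from the application):
result = search_pipeline(
    "Plastic Mat", "seller_a",
    preprocess=lambda q: q.lower().split(),
    cat_global_rec=lambda terms: [("floor mat", 0.6), ("tea cup mat", 0.4), ("toys", 0.01)],
    screen=lambda scored: [item for item in scored if item[1] >= 0.1],
    cat_company_rerank=lambda scored, uid: sorted(
        scored,
        key=lambda item: item[1] + (0.5 if item[0] == "tea cup mat" else 0.0),
        reverse=True,
    ),
    postprocess=lambda ranked: [c for c, _ in ranked][:2],
)
```

Passing the stages in as callables keeps the sketch independent of how each stage is actually implemented, so any of the earlier scoring or screening functions could be substituted.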
With the method above, for the top three recommended first matching classifications, a simulated user can mark each recommended first matching classification as correct (i.e. related to the classification the user wants to search for) or wrong (i.e. unrelated to the classification the user wants to search for). On a search set covering 35 major industries (containing 1000 search keywords), the accuracy obtained is shown in Table 1 below:
First matching classification rank (Position) | Accuracy (Precision)
1 | 95%
2 | 92%
3 | 86%
Table 1
As Table 1 shows, the accuracy of the first recommended first matching classification is 95%, that of the second is 92%, and that of the third is 86%.
In the e-commerce field, the present application reduces the proportion of products issued by sellers on the platform that are placed in the wrong classification because of inaccurate recommendation. In a random spot check of the global classifications of products newly issued on the website, this proportion dropped by 2%. The present application therefore brings a considerable improvement both in recommendation accuracy and in its direct and indirect effects, and can provide accurate classification recommendation for all search keywords.
In summary, the present application can process the search keyword in the search request to obtain refined search terms, match the global classifications in the global classification library with each search term to obtain the first matching classifications, and then perform a secondary match based on individual information to obtain the second matching classifications. A more accurate matching result can thus be obtained; the first similarity between the search keyword and the first matching classification and the second similarity between the search keyword and the second matching classification are both relatively accurate, so accurate and comprehensive results are fed back to the user.
Second, the present application can preset a screening threshold and screen the first matching classifications, which avoids feeding back first matching classifications with relatively low similarity and reduces wasted resources.
Third, the present application can be applied in the e-commerce field: it can recommend suitable classifications both when a buyer searches for products and when a seller issues products, so it is widely applicable and fits users' needs.
With reference to Fig. 6, a structure diagram of a classification-based searching device according to an embodiment of the present application is shown.
Accordingly, the present application also provides a classification-based searching device, including a receiving module 11, a global search module 12, a personalized search module 13 and a sorting and feedback module 14, wherein:
The receiving module 11 is configured to receive a search request sent by a user in the platform, the search request including a search keyword;
The global search module 12 is configured to, in the global search, match the global classifications in the global classification library using the search keyword to obtain first matching classifications, and to calculate the first similarity between the search keyword and each first matching classification, wherein the classifications defined in the platform are stored in the global classification library as global classifications;
The personalized search module 13 is configured to obtain the individual information of the user, perform a secondary match on the first matching classifications based on the individual information to obtain second matching classifications, and calculate the second similarity between each second matching classification and the search keyword;
The sorting and feedback module 14 is configured to sort the first matching classifications according to the first similarity and the second similarity and feed them back to the user.
With reference to Fig. 7, a structure diagram of a classification-based searching device according to a preferred embodiment of the present application is shown.
Preferably, the present application also provides a preferred classification-based searching device, including:
A receiving module 21, configured to receive a search request sent by a user in the platform, the search request including a search keyword;
Preferably, the device further includes:
A keyword processing module 22, configured to process the search keyword in the search request to obtain at least one of the following search terms: centre words, words, centre-word phrases and word phrases.
A global search module 23, configured to, in the global search, match the global classifications in the global classification library using the search keyword to obtain first matching classifications, and to calculate the first similarity between the search keyword and each first matching classification, wherein the classifications defined in the platform are stored in the global classification library as global classifications;
Preferably, the global search module 23 includes:
A matching submodule 231, configured to match the global classifications in the global classification library using each search term, take the global classifications matched by each search term as first matching classifications, and calculate the probability value of each match;
A calculation submodule 232, configured to obtain the global weight of each search term, weight the probability value of each search term by its global weight, and calculate the first similarity between the search keyword and the first matching classification.
Preferably, the device further includes:
A screening module 24, configured to screen the first matching classifications by their respective first similarities according to a preset screening threshold and obtain the first matching classifications that remain after screening.
A personalized search module 25, configured to obtain the individual information of the user, perform a secondary match on the first matching classifications based on the individual information to obtain second matching classifications, and calculate the second similarity between each second matching classification and the search keyword;
Preferably, the personalized search module 25 includes:
An acquisition submodule 251, configured to obtain the individual information of the user and perform a secondary match on the first matching classifications using the individual information to obtain the corresponding second matching classifications;
A first calculation submodule 252, configured to obtain the centre words and/or words among the search terms and calculate the probability value of each obtained search term with respect to the second matching classification;
A second calculation submodule 253, configured to obtain the individual weight of each search term, weight the probability value of each search term by its individual weight, and calculate the second similarity between the search keyword and the second matching classification.
A sorting and feedback module 26, configured to sort the first matching classifications according to the first similarity and the second similarity and feed them back to the user.
Preferably, the sorting and feedback module 26 includes:
A weighting submodule 261, configured to obtain the classification weight of the first matching classifications and weight the first similarity of each first matching classification by the classification weight;
A summation submodule 262, configured to, for a first matching classification that has undergone the secondary match based on individual information, sum the weighted first similarity with the second similarity of the second matching classification obtained by the secondary match;
A sorting and feedback submodule 263, configured to sort the first matching classifications and feed them back.
Preferably, in the e-commerce field, the global classification library is constituted by processing product titles to obtain product processing items and then calculating the probability of each product processing item in the classification corresponding to the product title.
Preferably, in the e-commerce field, if the user is a seller, the individual information of the user is constituted by processing the product titles issued by the seller to obtain seller processing items and then calculating the probability of each seller processing item in the classifications corresponding to the seller.
Preferably, in the e-commerce field, if the user is a buyer, the individual information of the user is constituted by processing the product titles browsed by the buyer to obtain buyer processing items and then calculating the probability of each buyer processing item in the classifications corresponding to the product titles.
Because the device embodiments are basically similar to the method embodiments, they are described relatively briefly; for related details, refer to the corresponding parts of the description of the method embodiments.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another.
Those skilled in the art should understand that the embodiments of the present application can be provided as a method, a system or a computer program product. The present application can therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) that contain computer-usable program code.
Although preferred embodiments of the present application have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present application.
The present application is described with reference to flow charts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flow charts and/or block diagrams, and combinations of flows and/or blocks in the flow charts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, commodity or device that includes the element.
The classification-based searching method and device provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.