CN104424296B - Query word classification method and device - Google Patents


Info

Publication number
CN104424296B
CN104424296B CN201310392966.XA CN201310392966A CN104424296B CN 104424296 B CN104424296 B CN 104424296B CN 201310392966 A CN201310392966 A CN 201310392966A CN 104424296 B CN104424296 B CN 104424296B
Authority
CN
China
Prior art keywords
classification
industry
query word
industry classification
leaf node
Legal status
Active
Application number
CN201310392966.XA
Other languages
Chinese (zh)
Other versions
CN104424296A (en)
Inventor
丘鹏
史可可
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201310392966.XA
Publication of CN104424296A
Priority to HK15105059.0A (HK1204685A1)
Application granted
Publication of CN104424296B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a query word classification method and device. The query word classification method includes: obtaining an input query word and generating a leaf industry category distribution vector corresponding to the query word, where the leaf industry category distribution vector contains the industry categories of at least one leaf node corresponding to the query word, each industry category at each level of nodes corresponds to one category term vector, and each category term vector includes multiple category terms; determining, according to the category term vectors of the industry categories of the leaf nodes corresponding to the query word, the industry category of the leaf node to which the query word belongs; and determining, according to the determined leaf-node industry category and preset multi-level industry category mapping relations, the industry category in which the query word falls. The invention makes industry query results more accurate and search efficiency higher.

Description

Query word classification method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a query word classification method and device.
Background
E-commerce uses computer technology, network technology and telecommunication technology to make the entire business process electronic, digital and networked. As e-commerce has developed, more and more specialized e-commerce websites have emerged. On today's specialized e-commerce websites, vertical subdivision by industry is a focus of the B2B (Business to Business) market.
However, on the one hand, users out of habit tend not to navigate to industry market pages on their own, so these pages receive little traffic, their business fails to grow, and effective industry search cannot take place. On the other hand, when a user enters a query word, existing search engine technology only returns results that correspond directly to that query word, without any division by industry. For example, when a user enters "alcohol", the engine directly returns content related to "alcohol" without performing any industry division based on the query word, so it cannot tell whether the user wants medical alcohol, edible alcohol or industrial alcohol. The user then has to refine the query further, which makes queries inaccurate and search inefficient.
Thus, the prior art cannot determine from a user's query word which industry the user intends to query. As a result, effective industry queries are not possible, queries are insufficiently accurate, and search efficiency is low.
Summary of the invention
The present invention provides a query word classification method and device to solve the problems of the prior art that the industry a user intends to query cannot be determined from the user's query word, so that effective industry queries cannot be performed, queries are insufficiently accurate, and search efficiency is low.
To solve the above problems, the invention discloses a query word classification method, including: obtaining an input query word and generating a leaf industry category distribution vector corresponding to the query word, where the leaf industry category distribution vector contains the industry categories of at least one leaf node corresponding to the query word, each industry category at each level of nodes corresponds to one category term vector, and each category term vector includes multiple category terms; determining, according to the category term vectors of the industry categories of the leaf nodes corresponding to the query word, the industry category of the leaf node to which the query word belongs; and determining, according to the determined leaf-node industry category and preset multi-level industry category mapping relations, the industry category in which the query word falls.
To solve the above problems, the invention also discloses a query word classification device, including: a first generation module, configured to obtain an input query word and generate a leaf industry category distribution vector corresponding to the query word, where the leaf industry category distribution vector contains the industry categories of at least one leaf node corresponding to the query word, each industry category at each level of nodes corresponds to one category term vector, and each category term vector includes multiple category terms; a determining module, configured to determine, according to the category term vectors of the industry categories of the leaf nodes corresponding to the query word, the industry category of the leaf node to which the query word belongs; and a first classification module, configured to determine, according to the determined leaf-node industry category and preset multi-level industry category mapping relations, the industry category in which the query word falls.
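The three modules of the device form a simple pipeline: generate candidates, pick a leaf, map it upward. The following Python sketch is a hypothetical illustration only; the class name `QueryWordClassifier`, the callable-based wiring and the toy lambdas in the usage lines are assumptions, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QueryWordClassifier:
    """Hypothetical wiring of the three modules described for the device.

    Each module is injected as a callable, so any concrete strategy
    (click-log lookup, category term matching, hierarchy walk) can be plugged in.
    """

    # query -> leaf industry category distribution vector
    first_generation_module: Callable[[str], List[str]]
    # (query, distribution vector) -> leaf category the query belongs to
    determining_module: Callable[[str, List[str]], str]
    # (query, leaf category) -> final industry category
    first_classification_module: Callable[[str, str], str]

    def classify(self, query: str) -> str:
        leaves = self.first_generation_module(query)
        leaf = self.determining_module(query, leaves)
        return self.first_classification_module(query, leaf)


# Usage with toy stand-ins for each module (all data hypothetical).
clf = QueryWordClassifier(
    first_generation_module=lambda q: ["one-piece dress", "princess dress"],
    determining_module=lambda q, leaves: leaves[0],
    first_classification_module=lambda q, leaf: {"one-piece dress": "apparel"}.get(leaf, leaf),
)
print(clf.classify("white dress"))  # apparel
```

Injecting the modules keeps each step replaceable, which matches the text's repeated note that each step admits several strategies.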
Compared with the prior art, the present invention has the following advantages:
The present invention generates a leaf industry category distribution vector corresponding to the query word. This vector contains the industry categories of one or more leaf nodes corresponding to the query word, and every industry category at every level has a category term vector containing multiple category terms. Based on the relationship between the query word and the category term vectors of the leaf-node industry categories, the invention first determines precisely the leaf-node industry category to which the query word belongs, and then determines the industry category in which the query word falls according to preset multi-level industry category mapping relations. These mapping relations are divided into several levels from high to low: the lowest level holds the leaf-node industry categories and the highest level holds the root-node industry categories. Between a leaf-node category and a root-node category there may be one or more intermediate-level categories, or there may be none. Once the leaf-node industry category of the query word has been determined precisely, any suitable method can be applied to the mapping relations to determine the industry category of the query word, and the query word is then classified into that category. The invention can therefore determine from a query word which industry the user intends to query and present information from that industry, so that the user queries within the correct industry. This enables effective industry queries and improves both search efficiency and the user's search experience. Moreover, because the industry category of the query word is first located precisely at the leaf-node level, industry query results are more accurate and search efficiency is higher.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of a query word classification method according to Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the steps of a query word classification method according to Embodiment 2 of the present invention;
Fig. 3 is a flowchart of the steps of a query word classification method according to Embodiment 3 of the present invention;
Fig. 4 is a schematic flowchart of generating the correspondence between query words and industry leaf categories in the embodiment shown in Fig. 3;
Fig. 5 is a schematic diagram of multi-level industry category mapping relations in the embodiment shown in Fig. 3;
Fig. 6 is a flowchart of a specific implementation of the embodiment shown in Fig. 3;
Fig. 7 is a structural block diagram of a query word classification device according to Embodiment 4 of the present invention;
Fig. 8 is a schematic diagram of the architecture of a system that performs industry-specific recognition using a query word classification device according to Embodiment 5 of the present invention;
Fig. 9 is a schematic diagram of an industry-specific search header in the embodiment shown in Fig. 8.
Detailed description of embodiments
To make the above objectives, features and advantages of the present invention clearer and easier to understand, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiment one
Referring to Fig. 1, a flowchart of the steps of a query word classification method according to Embodiment 1 of the present invention is shown.
The query word classification method of this embodiment includes the following steps:
Step S102: Obtain the input query word and generate the leaf industry category distribution vector corresponding to the query word.
The leaf industry category distribution vector of the query word contains the industry categories of at least one leaf node corresponding to the query word. Industry categories may have multiple levels of nodes, and each level may have multiple industry categories; each industry category at each level corresponds to one category term vector, and each category term vector includes multiple category terms.
When the multi-level industry categories are represented as a tree, the leaf nodes of the tree are the lowest-level nodes, and the industry categories at that level are the leaf-node industry categories. A query word may correspond to the industry category of a single leaf node or to the industry categories of several leaf nodes; these one or more leaf-node industry categories form the leaf industry category distribution vector of the query word.
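When the tree is stored as a child-to-parent map, the leaf nodes are simply the categories that never appear as a parent. The sketch below assumes that representation plus a precomputed query-to-category lookup; the dictionary `PARENT`, the function names and the sample categories are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical hierarchy: child category -> parent category (None marks a root).
PARENT: Dict[str, Optional[str]] = {
    "apparel": None,
    "women's clothing": "apparel",
    "one-piece dress": "women's clothing",
    "princess dress": "women's clothing",
}


def leaf_categories(parent: Dict[str, Optional[str]]) -> List[str]:
    """Leaf nodes are the categories that are nobody's parent."""
    parents = {p for p in parent.values() if p is not None}
    return sorted(c for c in parent if c not in parents)


def distribution_vector(
    query: str,
    query_to_categories: Dict[str, List[str]],
    parent: Dict[str, Optional[str]],
) -> List[str]:
    """Keep only the leaf-node categories the query maps to, in lookup order."""
    leaves = set(leaf_categories(parent))
    return [c for c in query_to_categories.get(query, []) if c in leaves]


lookup = {"white dress": ["one-piece dress", "women's clothing"]}
print(distribution_vector("white dress", lookup, PARENT))  # ['one-piece dress']
```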
To determine the leaf-node industry categories of a query word, a person skilled in the art can process the user's query word as the situation requires, using existing query word processing methods, and map it to the corresponding leaf-node industry categories. Of course, if the user's query word is sufficiently well-formed, it can be mapped directly to the corresponding leaf-node industry categories without any processing.
Step S104: Determine, according to the category term vectors of the industry categories of the leaf nodes corresponding to the query word, the industry category of the leaf node to which the query word belongs.
As mentioned above, a query word may correspond to the industry categories of several leaf nodes. Mapping the query word precisely to one leaf-node industry category, or to a set small number of them, increases the accuracy of industry querying and classification; it also improves their efficiency, because less data has to be queried and classified afterwards.
In this embodiment, the query word is matched and processed against the category terms in the category term vector of each leaf-node industry category, which makes it possible to determine precisely the leaf-node industry category to which the query word should belong. For example, that category can be determined from the degree of match or similarity between the query word and the category term vectors; alternatively, weights can be assigned to the category terms, and the category can be determined from the weights of the category terms that match the query word.
Step S106: Determine the industry category in which the query word falls, according to the determined leaf-node industry category and the preset multi-level industry category mapping relations.
Industry categories may have multiple levels, with correspondences between levels; these inter-level correspondences form the multi-level industry category mapping relations.
Each leaf-node industry category in the leaf industry category distribution vector has one or more levels of superior industry categories. Once the leaf-node industry category of the query word has been determined, the industry category of the query word can be determined from the superior categories of that leaf node. For example, suppose the leaf-node industry category of the query word is "one-piece dress"; the immediate superior of "one-piece dress" is "women's clothing", and the superior of "women's clothing" is "apparel". When the user enters "white dress" and its leaf-node industry category is determined to be "one-piece dress", one possible approach is to default to the root-node industry category, in which case the industry category of "white dress" is "apparel". The invention is not limited to this: in practice, a person skilled in the art can formulate suitable rules for the actual situation, for example a gender-related rule under which the industry category of "white dress" is determined to be "women's clothing".
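Walking the mapping relations upward from a leaf and defaulting to the root, as in the "white dress" example, can be sketched as follows. The child-to-parent dictionary and the function names are assumptions for illustration, and the hierarchy is assumed to be acyclic.

```python
from typing import Dict, List, Optional

# Hypothetical mapping relations from the example: child -> parent.
PARENT: Dict[str, Optional[str]] = {
    "apparel": None,
    "women's clothing": "apparel",
    "one-piece dress": "women's clothing",
}


def ancestor_chain(leaf: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return [leaf, immediate superior, ..., root]."""
    chain = [leaf]
    while parent.get(chain[-1]) is not None:
        chain.append(parent[chain[-1]])
    return chain


def root_category(leaf: str, parent: Dict[str, Optional[str]]) -> str:
    """Default rule from the example: classify into the root-node category."""
    return ancestor_chain(leaf, parent)[-1]


print(ancestor_chain("one-piece dress", PARENT))
# ['one-piece dress', "women's clothing", 'apparel']
print(root_category("one-piece dress", PARENT))  # apparel
```

A different rule, such as the gender-related one mentioned above, would simply pick another element of the same chain (here `chain[1]`, "women's clothing").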
In this embodiment, a leaf industry category distribution vector corresponding to the query word is generated; it contains the industry categories of one or more leaf nodes corresponding to the query word, and every industry category at every level has a category term vector containing multiple category terms. Based on the relationship between the query word and the category term vectors of the leaf-node industry categories, the leaf-node industry category to which the query word belongs is first determined precisely; the industry category of the query word is then determined according to the preset multi-level industry category mapping relations. These mapping relations are divided into several levels from high to low: the lowest level holds the leaf-node industry categories and the highest level holds the root-node industry categories, with zero or more intermediate levels between them. Once the leaf-node industry category has been determined precisely, any suitable method can be applied to the mapping relations to determine the industry category of the query word, and the query word is then classified into that category. This embodiment therefore determines from a query word which industry the user intends to query and presents information from that industry, so that the user queries within the correct industry; this enables effective industry queries and improves search efficiency and the user's search experience. Moreover, because the industry category of the query word is first located precisely at the leaf-node level, industry query results are more accurate and search efficiency is higher.
Embodiment two
Referring to Fig. 2, a flowchart of the steps of a query word classification method according to Embodiment 2 of the present invention is shown.
The query word classification method of this embodiment includes the following steps:
Step S202: Generate the correspondence between query words and leaf-node industry categories.
The correspondence between query words and leaf-node industry categories can be generated as follows: obtain users' search click information, which includes the query words entered by users and the merchandise information users clicked in the query results; then derive the correspondence between query words and industry leaf categories from the search click information and the correspondence between goods and the categories they belong to.
In a specific implementation, at least one of the following methods can be used to generate this correspondence.
Method one: obtain users' search click logs, which contain search click information from many users, including the query words entered by users, the merchandise information users clicked in the query results, and the category information of the goods. Perform statistical analysis on the search click information in the logs to obtain the correspondence between query words and merchandise information, and between merchandise information and category information. From these two correspondences, generate the correspondence between query words and industry leaf categories.
Method two: obtain users' search click information, which includes the query words entered by users and the merchandise information users clicked in the query results. Extract the merchandise information and the category information of the goods from the merchandise titles to determine the correspondence between merchandise information and category information. From the search click information and this correspondence, obtain the correspondence between query words and industry leaf categories.
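Both methods reduce to joining clicks with item categories and counting. A minimal sketch, assuming clicks arrive as (query, item ID) pairs and that an item-to-leaf-category map already exists (taken from the log itself in method one, or extracted from item titles in method two); all identifiers and data here are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def query_leaf_counts(
    clicks: Iterable[Tuple[str, str]],
    item_category: Dict[str, str],
) -> Dict[str, Counter]:
    """Count, for each query word, how often clicks landed in each leaf category."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for query, item_id in clicks:
        leaf = item_category.get(item_id)
        if leaf is not None:  # skip items with no known category
            counts[query][leaf] += 1
    return dict(counts)


clicks = [("alcohol", "i1"), ("alcohol", "i2"), ("alcohol", "i3"), ("white dress", "i4")]
item_category = {
    "i1": "industrial alcohol",
    "i2": "medical alcohol",
    "i3": "industrial alcohol",
    "i4": "one-piece dress",
}
print(query_leaf_counts(clicks, item_category)["alcohol"])
# Counter({'industrial alcohol': 2, 'medical alcohol': 1})
```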
Note that the correspondence between query words and leaf-node industry categories may be generated by whoever implements the invention, or it may be provided by a third party, for example generated by a third-party tool or obtained by accessing data stored by a third party.
Step S204: Obtain the input query word; according to the correspondence between query words and leaf-node industry categories, determine the industry categories of at least one leaf node that corresponds to the query word and satisfies a set rule; and use the determined leaf-node industry categories to generate the leaf industry category distribution vector of the query word.
The set rule can be configured by a person skilled in the art according to the actual situation, to select one, several or all of the industry categories of the multiple leaf nodes corresponding to the query word.
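The source leaves the set rule open. One hypothetical rule, shown only as an example, keeps leaf categories whose share of the query's clicks reaches a minimum and caps the result at the top k; `min_share` and `top_k` are illustrative parameters, not values from the source.

```python
from collections import Counter
from typing import List


def select_leaves(counts: Counter, min_share: float = 0.1, top_k: int = 5) -> List[str]:
    """Apply a hypothetical set rule to a query's leaf-category click counts."""
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() orders categories by descending click count.
    kept = [leaf for leaf, n in counts.most_common() if n / total >= min_share]
    return kept[:top_k]


counts = Counter({"one-piece dress": 6, "princess dress": 3, "slacks": 1})
print(select_leaves(counts, min_share=0.2))  # ['one-piece dress', 'princess dress']
```

Setting `min_share=0.0` and a large `top_k` reproduces the "select all" case the text allows.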
Thus, the leaf industry category distribution vector contains the industry categories of at least one leaf node corresponding to the query word, and across all levels of nodes and all industry categories, each industry category corresponds to one category term vector containing multiple category terms. Preferably, each category term in each category term vector can be assigned a weight.
For example, when the user enters "Korean-style chiffon skirt", the system uses the established correspondence between query words and leaf-node industry categories to map it to four leaf-node industry categories: "one-piece dress", "inventory women's clothing", "princess dress" and "Japanese and Korean women's clothing". Each industry category has a category term vector. The "one-piece dress" category has category terms such as "one-piece dress", "skirt", "chiffon skirt" and "Korean-style", which form its category term vector; the "inventory women's clothing" category has category terms such as "inventory women's clothing", "women's clothing", "skirt", "one-piece dress", "chiffon skirt", "crock", "slacks" and "women's suits"; the "princess dress" category has category terms such as "princess dress", "one-piece dress", "skirt" and "chiffon skirt"; and the "Japanese and Korean women's clothing" category has category terms such as "Japanese and Korean women's clothing", "skirt", "one-piece dress" and "chiffon skirt". The category term vectors of these industry categories are built using statistical methods over large-scale data. Accordingly, the leaf industry category distribution vector for "Korean-style chiffon skirt" can be {one-piece dress, inventory women's clothing, princess dress, Japanese and Korean women's clothing}.
Preferably, each category term in a category term vector can also be assigned a weight, for example {one-piece dress, 0.5; skirt, 0.3; chiffon skirt, 0.6; Korean-style, 0.2}. Weights make the industry categories more targeted and better aligned with users' common habits, so that the industry category of a query word can be determined more quickly.
In addition, the query word may be the entry a user types into the search box of a search engine to express his or her intent. However, input is not limited to the search box; other input methods, such as clicking a corresponding entry or link, also fall within the scope of "input" in the present invention.
Step S206: Determine, according to the category term vectors of the industry categories of the leaf nodes corresponding to the query word, the industry category of the leaf node to which the query word belongs.
In one approach, the query word is preprocessed into at least one search keyword; the search keywords are then matched against the category terms in the category term vectors of the industry categories, and the leaf-node industry category to which the query word belongs is determined by the degree of match. For example, "Korean-style chiffon skirt" is preprocessed into the two search keywords "Korean-style" and "chiffon skirt". Matching shows that these keywords match the category terms in the category term vector of "one-piece dress" best, so "one-piece dress" is determined as the leaf-node industry category of "Korean-style chiffon skirt".
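Counting matched keywords is the simplest reading of "degree of match". This sketch assumes the query has already been segmented into keywords (real Chinese segmentation needs a dedicated tokenizer) and uses the unweighted category terms from the "Korean-style chiffon skirt" example; the function name is hypothetical.

```python
from typing import Dict, List, Set

# Category term vectors from the "Korean-style chiffon skirt" example (unweighted).
TERMS: Dict[str, Set[str]] = {
    "one-piece dress": {"one-piece dress", "skirt", "chiffon skirt", "Korean-style"},
    "inventory women's clothing": {
        "inventory women's clothing", "women's clothing", "skirt", "one-piece dress",
        "chiffon skirt", "crock", "slacks", "women's suits",
    },
    "princess dress": {"princess dress", "one-piece dress", "skirt", "chiffon skirt"},
    "Japanese and Korean women's clothing": {
        "Japanese and Korean women's clothing", "skirt", "one-piece dress", "chiffon skirt",
    },
}


def best_leaf_by_match(keywords: List[str], terms: Dict[str, Set[str]]) -> str:
    """Return the leaf category whose terms contain the most query keywords.

    Ties go to the category listed first, since max() keeps the first maximum.
    """
    def match_degree(leaf: str) -> int:
        return sum(1 for k in keywords if k in terms[leaf])
    return max(terms, key=match_degree)


print(best_leaf_by_match(["Korean-style", "chiffon skirt"], TERMS))  # one-piece dress
```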
Preferably, when category terms have weights, one feasible approach is: preprocess the query word to generate at least one search keyword; match the search keywords against the category term vector of each leaf-node industry category; from the weights of the matched category terms, determine the weight of each leaf-node industry category with respect to the query word; and take the leaf-node industry category with the highest weight as the leaf-node industry category to which the query word belongs. Again taking "Korean-style chiffon skirt" as an example, assume the following category term weights:
"one-piece dress": {one-piece dress, 0.8; skirt, 0.3; chiffon skirt, 0.6; Korean-style, 0.2};
"inventory women's clothing": {inventory women's clothing, 0.9; women's clothing, 0.8; skirt, 0.6; one-piece dress, 0.6; chiffon skirt, 0.3; crock, 0.4; slacks, 0.2; women's suits, 0.3};
"princess dress": {princess dress, 0.9; one-piece dress, 0.7; skirt, 0.5; chiffon skirt, 0.4};
"Japanese and Korean women's clothing": {Japanese and Korean women's clothing, 0.9; skirt, 0.8; one-piece dress, 0.8; chiffon skirt, 0.6}.
Using the simplest weighting, summation: after "Korean-style chiffon skirt" is preprocessed into "Korean-style" and "chiffon skirt" and matched against the category term vector of each leaf-node industry category, the matches are "one-piece dress" {chiffon skirt, 0.6; Korean-style, 0.2}, "inventory women's clothing" {chiffon skirt, 0.3}, "princess dress" {chiffon skirt, 0.4} and "Japanese and Korean women's clothing" {chiffon skirt, 0.6}. After summing, the weights with respect to "Korean-style chiffon skirt" are 0.8 for "one-piece dress", 0.3 for "inventory women's clothing", 0.4 for "princess dress" and 0.6 for "Japanese and Korean women's clothing". Comparing the sums, "one-piece dress", which has the highest weight, is determined as the leaf-node industry category of "Korean-style chiffon skirt". Note that this weight processing is only a simple illustration; in practice, a person skilled in the art can process the weights of the matched category terms in various ways as needed, and the invention is not limited in this respect.
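The weighted variant is easy to check numerically. This sketch reproduces the worked example's weights and uses plain summation, the simplest processing the text mentions; function names are hypothetical and keyword segmentation is assumed to have been done already.

```python
from typing import Dict, List, Tuple

# Category term weights from the worked "Korean-style chiffon skirt" example.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "one-piece dress": {"one-piece dress": 0.8, "skirt": 0.3, "chiffon skirt": 0.6, "Korean-style": 0.2},
    "inventory women's clothing": {
        "inventory women's clothing": 0.9, "women's clothing": 0.8, "skirt": 0.6,
        "one-piece dress": 0.6, "chiffon skirt": 0.3, "crock": 0.4, "slacks": 0.2,
        "women's suits": 0.3,
    },
    "princess dress": {"princess dress": 0.9, "one-piece dress": 0.7, "skirt": 0.5, "chiffon skirt": 0.4},
    "Japanese and Korean women's clothing": {
        "Japanese and Korean women's clothing": 0.9, "skirt": 0.8,
        "one-piece dress": 0.8, "chiffon skirt": 0.6,
    },
}


def leaf_weights(keywords: List[str], weights: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Per leaf, sum the weights of the category terms that the keywords match."""
    return {leaf: sum(terms.get(k, 0.0) for k in keywords) for leaf, terms in weights.items()}


def best_leaf(keywords: List[str], weights: Dict[str, Dict[str, float]]) -> Tuple[str, float]:
    """Return the leaf category with the highest summed weight, and that weight."""
    scores = leaf_weights(keywords, weights)
    leaf = max(scores, key=scores.get)
    return leaf, scores[leaf]


leaf, score = best_leaf(["Korean-style", "chiffon skirt"], WEIGHTS)
print(leaf, round(score, 2))  # one-piece dress 0.8
```

Summation is only one choice; the text explicitly leaves room for other weight processing, which would replace the body of `leaf_weights`.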
Step S208: Determine the industry category in which the query word falls, according to the determined leaf-node industry category and the preset multi-level industry category mapping relations.
The simplest approach is to move upward level by level from the leaf-node industry category until reaching an industry category that satisfies a set condition.
A more preferable approach is as follows: according to the preset multi-level industry category mapping relations, determine the upper-level industry category of the leaf-node industry category; obtain the weight of that upper-level category with respect to the query word; and judge whether this weight exceeds a second set threshold. If it does, take that upper-level category as the industry category of the query word; if not, continue searching upward until an industry category whose weight exceeds the second set threshold is found, or until the root-node industry category is reached. The weight of an upper-level industry category with respect to the query word can be obtained in the same way as the weight of a leaf-node industry category in step S206. The second set threshold can be set appropriately by a person skilled in the art as needed; the invention is not limited in this respect.
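The threshold-driven climb can be sketched as a short loop. This assumes the child-to-parent representation used for the hierarchy earlier, and a caller-supplied `weight_of` function standing in for "the same way as step S206"; the weights and threshold values in the usage lines are made up.

```python
from typing import Callable, Dict, Optional


def classify_upward(
    leaf: str,
    parent: Dict[str, Optional[str]],
    weight_of: Callable[[str], float],
    threshold: float,
) -> str:
    """Climb from the leaf's parent until a category's weight exceeds the threshold.

    Returns the root category if no ancestor exceeds the threshold, and the leaf
    itself if it has no parent. Assumes the hierarchy is acyclic.
    """
    node = parent.get(leaf)
    if node is None:
        return leaf
    while weight_of(node) <= threshold:
        above = parent.get(node)
        if above is None:  # reached the root without exceeding the threshold
            return node
        node = above
    return node


# Hypothetical hierarchy and query weights.
PARENT = {"apparel": None, "women's clothing": "apparel", "one-piece dress": "women's clothing"}
weights = {"women's clothing": 0.4, "apparel": 0.9}
print(classify_upward("one-piece dress", PARENT, lambda c: weights.get(c, 0.0), 0.5))  # apparel
```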
Yet another approach is, according to the matching degree of query word and the category name of industry-by-industry classification, to carry out absolute value and add Power;According to absolute value weighting as a result, with pre-set industry classification mapping relations at different levels, the industry where query word is determined Classification.
For example, if the user enters the query word "industrial alcohol", the related categories "industrial alcohol", "medical alcohol" and "edible alcohol" may be returned. The original classification features are mainly the statistical mapping between query words and categories and the text relevance between query words and categories. (Text relevance here means the following: Chinese word segmentation and part-of-speech tagging are first applied to the titles and attributes that users entered for all products under a category, and statistics then yield a mapping from the category to a vector of segmented Terms. One way to compute text relevance is the cosine similarity between the query word's segmentation vector and the category's segmentation vector.) These features do not take into account how well the query word matches the category name. For example, the query word "industrial alcohol" matches the category "industrial alcohol" exactly, whereas "medical alcohol" and "edible alcohol" match only partially. Therefore, more weight can be given to the "industrial alcohol" category, and somewhat less to the partially matching "medical alcohol" and "edible alcohol" categories.
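The text-relevance feature described in the parenthesis above can be sketched as follows. This assumes plain term-frequency vectors built from already-segmented product titles; the function names are hypothetical, and a production system might use TermWeight weights or TF-IDF instead of raw counts.

```python
import math
from collections import Counter


def category_term_vector(segmented_titles):
    """Term-frequency vector for one category, built from the segmented
    titles (and attributes) of all products listed under it."""
    vec = Counter()
    for terms in segmented_titles:
        vec.update(terms)
    return vec


def text_relevance(query_terms, cat_vec):
    """Cosine similarity between the query's term vector and a category's term vector."""
    q = Counter(query_terms)
    dot = sum(n * cat_vec.get(t, 0) for t, n in q.items())
    nq = math.sqrt(sum(n * n for n in q.values()))
    nc = math.sqrt(sum(n * n for n in cat_vec.values()))
    return dot / (nq * nc) if nq and nc else 0.0
```

As the text notes, this score says nothing about whether the query matches the category *name*, which is what the weighting formula that follows adds.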
One formula for increasing the weight is as follows:
$$\text{ContextWeight} = \text{FactorContext} \times \frac{2 \times \text{SameTW}}{\text{SumQW} + \text{SumCW}}$$
Here, SameTW is the sum of the TermWeight weights of the segments shared by the query word and the category name; SumQW is the total TermWeight weight of the query word's segments; SumCW is the total TermWeight weight of the category name's segments; and FactorContext is the weight given to the absolute match between the query word and the category name. FactorContext can be tuned by the parameter adjustment system according to how well query words match categories, so as to optimize the algorithm's effect.
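A direct transcription of the ContextWeight formula is sketched below, assuming the segments and their TermWeight weights are already available as a list of terms and a dictionary. The weight values in the usage check are invented for illustration. With this form an exact name match scores FactorContext and a partial match scores proportionally less.

```python
def context_weight(query_terms, name_terms, term_weight, factor_context=1.0):
    """ContextWeight = FactorContext * 2 * SameTW / (SumQW + SumCW).

    query_terms / name_terms: segmented query word and category name
    term_weight: dict term -> TermWeight weight (terms not listed count as 0)
    """
    same = set(query_terms) & set(name_terms)
    same_tw = sum(term_weight.get(t, 0.0) for t in same)        # SameTW
    sum_qw = sum(term_weight.get(t, 0.0) for t in query_terms)  # SumQW
    sum_cw = sum(term_weight.get(t, 0.0) for t in name_terms)   # SumCW
    if sum_qw + sum_cw == 0:
        return 0.0
    return factor_context * 2 * same_tw / (sum_qw + sum_cw)
```

For the "industrial alcohol" query, the "industrial alcohol" category then gets the full factor while "medical alcohol" and "edible alcohol" get a fraction of it.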
Another way is to obtain the Chinese word segmentation and tagging information of the query word, and then determine the industry category of the query word by set rules, according to the preset multi-level industry category mapping relationships and this tagging information.
For example, the system in this embodiment uses the word segmentation tool AliWS, built on an e-commerce Chinese lexicon, and the part-of-speech tagging tool TermWeight. The AliWS Chinese word segmentation tool can segment query words and product titles; for example, the query word "chiffon one-piece dress" is segmented into "chiffon/one-piece dress". TermWeight tags the part of speech of each segment; continuing the example, "chiffon (modifier)/one-piece dress (product word)". Through Chinese word segmentation and part-of-speech tagging, the user's main intent can be analyzed, for example by identifying the product word, the main modifiers, region words and the product model carried by the query word. Segmenting product information such as titles and attributes improves the matching and precision between the user's intent and the desired products, and optimizes the algorithm model. TermWeight also assigns an importance weight to each segment. For example, in "chiffon one-piece dress", "one-piece dress" is the core product word, shows the main purpose of the user's product search and is therefore more important, so the weight of "one-piece dress" is set to 150 (this is an assumed value; the actual weight varies with program requirements) and the weight of "chiffon" to 60.
Yet another approach is to calculate the entropy of the industry categories and determine the industry category of the query word according to the entropy and the preset multi-level industry category mapping relationships.
The entropy here is the information entropy of the industry categories at each level; different query words cover different ranges of categories. For example: entropy of the industry category distribution for "clothing" > entropy for "men's clothing" > entropy for "men's jeans". Therefore, the entropy of each level of industry categories can be used as a feature for deciding at which level the query word should be classified.
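The entropy feature can be computed as the Shannon entropy of a query's (normalized) category weight distribution. The sketch below is a minimal illustration; the function name and the use of base-2 logarithms are choices made here, not taken from the text. A broad query spread over many categories gets a high value, and a narrow query concentrated on one category gets a value near zero.

```python
import math


def category_entropy(weights):
    """Shannon entropy (in bits) of a category weight distribution.

    `weights` maps category name -> non-negative weight (e.g. click counts);
    the weights are normalized to probabilities before computing entropy.
    """
    total = sum(weights.values())
    if total <= 0:
        return 0.0
    h = 0.0
    for w in weights.values():
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h
```

The counts in the check below are hypothetical; only the ordering (broad query > narrow query) reflects the text.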
It should also be noted that the multi-level industry category mapping relationships may be implemented by the practitioner of the invention or provided by a third party, for example generated by a third-party generation tool, or implemented by accessing data stored by a third party.
When the industry category determined for the query word through steps S202-S208 meets the requirement, the query word can be classified directly into the determined industry category. If it does not meet the requirement, for example if the weight of the industry category is below the second set threshold, the following supplementary schemes can be used to continue determining the industry category of the query word.
Step S210: Judge whether the determined industry category of the query word meets the set standard. If it does, classify the query word into the determined industry category and execute step S214; if not, execute step S212.
The set standard is configured appropriately by those skilled in the art according to actual conditions; the invention is not limited in this regard.
Preferably, whether the determined industry category of the query word meets the set standard can be judged by checking whether the recall count of that industry category meets a third set threshold, and/or whether its hit count meets a fourth set threshold. The third and fourth set thresholds can be set appropriately by those skilled in the art according to actual conditions; the invention is not limited in this regard.
Step S212: Determine the industry category of the query word using the first recognition rule and/or the second recognition rule, then execute step S214.
The first recognition rule determines the industry category of the query word from the industry categories of leaf nodes other than those already in the leaf industry category distribution vector; the second recognition rule determines the industry category of the query word from the query word's weight in each industry category.
Preferably, when the first recognition rule is used, the industry category of the query word can be determined from the query word and the category term vectors of those leaf-node industry categories that are not in the leaf industry category distribution vector.
Again taking "Korean-style chiffon skirt" as an example, suppose that besides {one-piece dress, stock women's clothing, princess skirt, Japanese and Korean women's clothing}, the category terms "Korean-style" and "chiffon skirt" also carry large weights in the "foreign trade skirt" category, but for some reason this category was not recalled by the system earlier and therefore does not appear in the leaf industry category vector above. When the finally determined industry category cannot meet the requirement, the first recognition rule can be applied, for example by setting a recognition threshold so that the "foreign trade skirt" category is recalled into the leaf industry category vector of the query word "Korean-style chiffon skirt". Then, with "foreign trade skirt" as the determined leaf-node industry category, the final industry category of the query word is determined according to the preset multi-level industry category mapping relationships.
Preferably, when the second recognition rule is used, the query word's weight in each industry category can be computed from that category's term vector, proceeding level by level from the root node's industry category down to the leaf nodes' industry categories, and checked against the first set threshold; the industry category in which the weight is greater than or equal to the threshold is determined as the industry category of the query word.
Again taking "Korean-style chiffon skirt" as an example, when the second recognition rule is used, the relationship between industries and category terms is first established; for example, under the second-level industry categories, the weights of "Korean-style" and "chiffon skirt" are larger in "women's clothing" than in "men's clothing" and "children's clothing". A Bayesian classifier is then used to compute the probabilities that the query word "Korean-style chiffon skirt" belongs to "women's clothing", "men's clothing" and "children's clothing". A recognition threshold is set so that, when the probability between the query word and an industry category exceeds the threshold, a "bridge" is built and the query word is considered to belong to that industry category. This avoids the misidentification caused by click bias in the bottom-up approach. For example, the root node's industry category is "clothing", and the second-level categories under the root are "women's clothing", "men's clothing" and "children's clothing". Using the Bayesian classifier, the probabilities that the query word "Korean-style chiffon skirt" belongs to "clothing", "women's clothing", "men's clothing" and "children's clothing" are 0.6, 0.9, 0.2 and 0.3 respectively. With the recognition threshold set to 0.8, when a user enters "Korean-style chiffon skirt", the search proceeds downward from the root node; on reaching "women's clothing" with probability 0.9, the condition is met, so the industry category of "Korean-style chiffon skirt" should be "women's clothing". As another example, because the website offers comparatively many women's clothing products, the site's transaction and click data are concentrated in women's clothing. With the process of steps S202-S208, the query word "short-sleeve T-shirt" would first be identified as the leaf-node industry category "women's short-sleeve T-shirt", and only then, through the multi-level industry category mappings, finally be identified as "clothing". With the "bridge" to industry categories built in advance, "short-sleeve T-shirt" can skip the leaf-node industry category and be assigned directly to the first-level industry category "clothing", which avoids the category bias problem.
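The top-down "bridge" search of the second recognition rule can be sketched as follows. The function name, the parent-to-children tree layout, and the use of ">=" are assumptions (the text uses both "greater than or equal to" and "more than"). The per-category probabilities would come from whatever classifier is in use, such as the Bayesian classifier mentioned above.

```python
def top_down_bridge(tree, probs, root, threshold):
    """Search downward from `root`, level by level, and return the first
    (shallowest) category whose probability reaches `threshold`.

    tree:  dict parent -> list of child categories (hypothetical layout)
    probs: dict category -> P(category | query), e.g. from a Bayes classifier
    Returns None if no category on any level reaches the threshold.
    """
    level = [root]
    while level:
        hits = [c for c in level if probs.get(c, 0.0) >= threshold]
        if hits:
            # If several categories on this level qualify, keep the most probable.
            return max(hits, key=lambda c: probs[c])
        # Descend one level.
        level = [child for c in level for child in tree.get(c, [])]
    return None
```

With the example values above (clothing 0.6, women's clothing 0.9, men's clothing 0.2, children's clothing 0.3) and a threshold of 0.8, the root fails the check and the search stops at "women's clothing".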
In addition, when category terms carry weights, the weight of each category term in each category term vector can be adjusted by comparing the matching results between query words and category terms against manually labeled training data. This makes the category term weights better reflect actual classification results, so that subsequent query words are classified more accurately.
Step S214: Display the query results for the industry category of the query word using the configured industry-specific search header or the configured industry-specific search result display template.
This step is optional; some applications do not need to display query results.
Displaying the query results with the configured industry-specific search header or search result display template, however, gives the user accurate industry information at query time and also promotes the development of industry marketplace pages.
With this embodiment, the industry the user wants to query is determined from the query word, which enables effective industry search and improves the user's search efficiency and search experience. Moreover, because the industry category of the query word is located precisely starting from the leaf nodes, industry query results are more accurate and search efficiency is higher.
Embodiment three
Referring to Fig. 3, a flowchart of the steps of a query word classification method according to Embodiment Three of the present invention is shown.
The query word classification method of this embodiment includes the following steps:
Step S302: Generate and save the correspondence between query words and leaf-node industry categories.
One way is to generate and save the correspondence between query words and leaf-node industry categories from the search-click information in the user search click logs. In this way, tracking points that record click behavior must be set up in the front end of the search page; for example, a parameter can be added directly to the URLs of clicked product information and category navigation to indicate that the click is a user search click. The front-end search page can obtain the page's user information, query word information, product information and the category to which each product belongs. Finally, this information is combined into data in a standard format and stored in a data warehouse as the user search click log. The format can be set by those skilled in the art to suit the database storage in use, so that the information can be stored normally in the data warehouse. For user search click logs containing massive amounts of data, a distributed computing system can be used to store and process the large-scale log data, ensuring that computing tasks finish in time.
Then, using the stored user search click logs, the OfferId (product identifier, one kind of product information) clicked after each user query is joined with the website's Offer (product supply) information table, which contains the display leaf category and the publication leaf category of each Offer. Through the Offer information table, the category of the product indicated by the product identifier can be obtained, and statistics over the joined table then yield the mapping from query words to leaf-node industry categories. Mappings generated in this way are also called mappings generated from click features.
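The join-and-count step can be sketched in memory as below. This is only an illustration under assumptions: the click log is modeled as (query, OfferId) pairs, the Offer information table as a dictionary from OfferId to a single leaf category, and the function name is hypothetical. At the data volumes described above, the same logic would run as a join and group-by on the distributed computing system.

```python
from collections import Counter, defaultdict


def query_to_leaf_distribution(click_log, offer_table):
    """Join search clicks with an Offer table and count leaf categories per query.

    click_log:   iterable of (query, offer_id) pairs, one per search click
    offer_table: dict offer_id -> leaf category (e.g. the display leaf category)
    Returns dict query -> {leaf_category: share of that query's clicks}.
    """
    counts = defaultdict(Counter)
    for query, offer_id in click_log:
        leaf = offer_table.get(offer_id)
        if leaf is None:  # offer missing from the table: skip it, like an inner join
            continue
        counts[query][leaf] += 1
    result = {}
    for query, c in counts.items():
        total = sum(c.values())
        result[query] = {leaf: n / total for leaf, n in c.items()}
    return result
```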
Another way is to generate and save the correspondence between query words and leaf-node industry categories from users' search-click information together with product titles. For example, on the Alibaba website, Alibaba's Chinese word segmentation and tagging system can be used to mine important words such as product words and attributes, together with their weights, from Offer titles, establishing a mapping from industry categories to Terms (products, attributes, etc.); a mapping from query words to industry categories is then established from the search-click information. Mappings generated in this way are also called mappings generated from text relevance features.
A flow for generating the correspondence between query words and leaf-node industry categories using the above two ways is shown in Fig. 4. After the website front-end page receives search or click input from multiple users, search-click data is generated. The search-click data is then saved to the data warehouse. After the distributed computing system obtains the users' search-click data from the data warehouse, it processes the data on multiple compute nodes (three in the example in the figure) and generates query word classification data, that is, data describing the correspondence between query words and leaf-node industry categories. Finally, an online dictionary-index service is built to support subsequent use and lookup.
Step S304: Generate and save the multi-level industry category mapping relationships.
For the industry category structure of e-commerce, a tree structure can be used to generate and save the multi-level industry category mapping relationships; this structure is also called the industry category tree. The bottom-level categories are the leaf-node industry categories. Levels are counted from the top, so the first-level category is the top-level category, i.e., the root node's industry category. An e-commerce industry category tree is shown in Fig. 5. As Fig. 5 shows, this tree has three levels: the top-level category "clothing"; the second-level categories "women's clothing", "men's clothing" and "children's clothing"; and the bottom-level industry categories, namely "one-piece dress, women's sweater, chiffon skirt" under "women's clothing", "men's jeans, shirts" under "men's clothing", and "children's suspender trousers, children's skirts" under "children's clothing". That is, "one-piece dress, women's sweater, chiffon skirt" map upward to "women's clothing", "men's jeans, shirts" map upward to "men's clothing", and "children's suspender trousers, children's skirts" map upward to "children's clothing"; and "women's clothing", "men's clothing" and "children's clothing" map upward to "clothing".
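The three-level tree of Fig. 5 can be encoded as a parent-to-children dictionary and inverted into the child-to-parent form used for upward mapping. The sketch below is illustrative; the dictionary and function names are hypothetical, and a real category tree would be loaded from the operations-maintained data rather than hard-coded.

```python
# Industry category tree from Fig. 5, stored as parent -> children.
CATEGORY_TREE = {
    "clothing": ["women's clothing", "men's clothing", "children's clothing"],
    "women's clothing": ["one-piece dress", "women's sweater", "chiffon skirt"],
    "men's clothing": ["men's jeans", "shirts"],
    "children's clothing": ["children's suspender trousers", "children's skirts"],
}


def build_parent_map(tree):
    """Invert parent -> children into child -> parent for upward mapping."""
    return {child: parent for parent, children in tree.items() for child in children}


def path_to_root(category, parent_of):
    """Return [category, parent, ..., root] by following parent links upward."""
    path = [category]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path
```

For example, `path_to_root("chiffon skirt", build_parent_map(CATEGORY_TREE))` yields the leaf, second-level and top-level categories in order.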
The industry categories to which products belong are multi-level, and this data can be maintained by professional website operations staff. The industry category tree is stable; adding, deleting or modifying category information requires unified operational management.
Through the industry category tree, the correspondence between a specific product's category and the industry it belongs to can be determined. For ambiguous or disputed cases of category attribution, the industry to which a category belongs can be determined statistically from the product information that users click while browsing products by category.
In addition, each category in the industry category tree has a category term vector, and the weights of the category terms in each vector can be configured.
Step S306: Classify the query word.
In this embodiment, query word classification is implemented as the process of obtaining a weighted category distribution through several features and a classification algorithm.
One implementation of query word classification is shown in Fig. 6 and includes:
Step S3062: Input the query word.
In this embodiment, the query word can be entered through the WEB page of the website front end, and the corresponding function module on the back-end server is called over HTTP to carry out the subsequent query word classification. For example, the industry recognizer module deployed on Alibaba's Query Process platform is called over HTTP to perform the subsequent classification.
Step S3064: Determine the category terms in the category term vectors that match the query word.
Step S3066: Generate the leaf industry category distribution vector corresponding to the query word.
In this embodiment, click statistics take the form "query word -> leaf industry category distribution", and the weight of each leaf-node industry category in the leaf industry category distribution vector is the sum of the weights of its matched category terms.
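Steps S3064 and S3066 can be sketched together. Category terms are matched against the query's segments, and each leaf category's weight is the sum of its matched term weights. The data layout, the function name and the example weights below are assumptions for illustration.

```python
def leaf_distribution(query_terms, category_term_vectors):
    """Build the leaf industry category distribution vector for a query.

    query_terms:           list of segmented terms of the query word
    category_term_vectors: dict leaf category -> {category term: weight}
    A leaf category's weight is the sum of the weights of its category terms
    that match a query term; categories with no match are omitted.
    Returns a list of (category, weight) pairs sorted by descending weight.
    """
    terms = set(query_terms)
    dist = {}
    for cat, vec in category_term_vectors.items():
        w = sum(weight for term, weight in vec.items() if term in terms)
        if w > 0:
            dist[cat] = w
    return sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
```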
Step S3068: Determine the leaf-node industry category to which the query word belongs, and map it upward according to the industry category tree.
Step S30610: Determine the industry category of the query word.
To decide at which level of industry category the query word should finally be placed, two basic ways can be used. Way one: based on the Chinese word segmentation and tagging results of the query word, determine the level by set rules. Way two: feed the industry category weight vectors of each level into the parameter adjustment module to determine whether the query word can be confidently placed at the current level.
For example, the weights of categories and industries can be adjusted according to the Chinese part-of-speech tagging results. For instance, "women's T-shirt" contains the gender word "women's", so the final category and industry weights can down-weight categories related to "men's clothing", which is incompatible with the query word.
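A tagging-based rule of this kind can be sketched as follows. The tag name "gender", the incompatibility table and the penalty factor are all hypothetical; the text only says that categories incompatible with a gender word in the query are down-weighted.

```python
# Hypothetical rule table: gender word -> categories it is incompatible with.
INCOMPATIBLE = {
    "women's": {"men's clothing"},
    "men's": {"women's clothing"},
}


def apply_gender_rule(tagged_query, category_weights, penalty=0.1):
    """Down-weight categories that conflict with a gender word in the query.

    tagged_query:     list of (term, pos_tag) pairs from the segmenter/tagger
    category_weights: dict category -> weight
    penalty:          multiplier applied to incompatible categories (assumed value)
    """
    blocked = set()
    for term, tag in tagged_query:
        if tag == "gender":
            blocked |= INCOMPATIBLE.get(term, set())
    return {c: (w * penalty if c in blocked else w) for c, w in category_weights.items()}
```

Multiplying by a penalty rather than deleting the category keeps it available as a fallback if the tagging turns out to be wrong.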
In addition, the following ways can also be used:
Way three, determining by the click entropy of the query word over industry categories: compute the entropy of the leaf industry category distribution vector of the query word, use it as a feature for identifying the industry category level, and determine the industry category of the query word according to the entropy.
Way four, determining by the text relevance weight between the query word and the industry category name: apply absolute-value weighting to the leaf industry category vector of the query word, raising the weight of categories whose names literally match the query word, and determine the industry category of the query word from the weighted result.
Way five, determining the industry category for low-frequency query words, as follows:
A low-frequency query word is one that was clicked fewer than a set number of times (e.g., 20) within a past period (e.g., 180 days).
When using the search engine, users often enter query words that have never appeared in the user search click logs, so there is no basis for the user-click classification features. Moreover, query words whose search click count is below the set number (e.g., 20) cannot be classified with confidence from clicks alone. For example, a user searches for "South Korea Dongdaemun surplus slim one-piece dress", and the search click log records that the user clicked 3 Offers in "Bohemian long skirt". If only this click data were used as the classification feature, without expanding the query word, many correct category recalls would be lost. The original query word therefore needs to be transformed.
Two example algorithms for transforming the query word are given below:
Algorithm one: KNN Query, i.e., the KNN (K-Nearest Neighbor) query word algorithm. In K-nearest-neighbor classification, the K nearest neighbors of the object to be classified within the training dataset guide its classification. An inverted index from query words to Terms can be built from users' search click logs on Offers, and the cosine distance between the segmentation vector of the original query word and those of other query words is used to form the neighboring classified dataset. The classification results of a set number of nearest query words (e.g., the top ten) are then combined as the classification result of the original query word. Finally, this result is mapped in the industry category tree to determine the industry category of the query word.
Here, Term denotes a Chinese word segment; segmenting the query word yields a weighted, part-of-speech-tagged vector of multiple Terms. The neighboring classified dataset is the correspondence from a Chinese segment Term to the associated query term vectors for which user search click behavior has occurred; the vectors in this dataset are also weighted.
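The KNN Query transformation can be sketched with sparse term vectors represented as dictionaries. Two points are assumptions: the function names, and the choice to weight each neighbor's category distribution by its cosine similarity before summing (the text only says that the neighbors' results are combined). A real system would look up candidate neighbors through the inverted index instead of scanning every known query.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as dict term -> weight."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(query_vec, neighbours, k=10):
    """Classify a low-frequency query from its k most similar known queries.

    query_vec:  dict term -> weight for the original query word
    neighbours: list of (term_vector, category_distribution) for queries that
                have click data
    Each neighbour's category weights are scaled by its similarity and summed.
    """
    scored = sorted(((cosine(query_vec, vec), dist) for vec, dist in neighbours),
                    key=lambda x: x[0], reverse=True)[:k]
    combined = defaultdict(float)
    for sim, dist in scored:
        if sim <= 0:
            continue  # unrelated neighbours contribute nothing
        for cat, w in dist.items():
            combined[cat] += sim * w
    return dict(combined)
```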
Algorithm two: query shrinking (term dropping). For example, using the second-party (in-house) natural language processing Chinese word segmentation library TermWeight, assign a weight to each segment, remove the segment with the lowest weight, and judge whether the transformed query word is a high-frequency word. If it is, use the classification result of the transformed query word as the classification result of the original query word; otherwise, remove the segment with the next-lowest weight, repeating this step until only one segment remains. Finally, this result is mapped in the industry category tree to determine the industry category of the query word.
Here, the Chinese word segmentation vector of the query word carries weights assigned according to the importance of each segment. When the user enters a query word that has had no search click behavior in the past, the lowest-weight segments can be removed one by one and the remaining segments recombined into a new query word. This removal continues until the recombined query word appears in the user search click data (that is, it has a corresponding category vector). The category vector of the new query word is then used as the category vector of the query word the user entered.
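Query shrinking can be sketched as a loop that repeatedly drops the lowest-weight segment. The function name, the space joiner (Chinese queries would normally be joined without separators) and the dictionary of high-frequency queries are assumptions for illustration; the segment weights would come from the tagging tool.

```python
def shrink_query(segments, known_queries, joiner=" "):
    """Drop the lowest-weight segments one at a time until a known query remains.

    segments:      list of (term, weight) pairs in the original query order
    known_queries: dict query string -> category vector (high-frequency queries)
    Returns (shrunk query, its category vector), or (None, None) if no
    recombined query is found before only one segment is left.
    """
    remaining = list(segments)
    while remaining:
        q = joiner.join(term for term, _ in remaining)
        if q in known_queries:
            return q, known_queries[q]
        if len(remaining) == 1:
            break
        # Remove the lowest-weight segment, keeping the others in their original order.
        weakest = min(range(len(remaining)), key=lambda i: remaining[i][1])
        del remaining[weakest]
    return None, None
```

Keeping the original segment order when recombining means the shrunk query can be looked up exactly as users would have typed it.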
It should be noted that the above two transformation algorithms are merely illustrative; in practical applications, those skilled in the art may use other appropriate transformation algorithms according to actual conditions.
Way six: use natural language processing, for example segmentation with Alibaba's Chinese word segmentation tool AliWS and attribute tagging and weighting with TermWeight. For example, after the query word is segmented, methods such as KNN and query shrinking can be used, while preserving the algorithm's effectiveness, to transform it into a data-dense query word, which increases the accuracy of query word classification. The industry category of this data-dense query word is then determined.
In this embodiment, Way one is used, in the following concrete form, to determine the industry category of the query word. The Chinese word segmentation and tagging result of the query word is realized as the one or more higher-level industry categories to which the query word's leaf-node industry category maps. The set rule is realized as judging whether, among the mapped industry categories at each level, there is an industry category whose weight exceeds a set threshold (also called the peak value). If there is, that industry category is determined as the industry category of the query word. Otherwise, other appropriate rules are used to determine the industry category of the query word, for example judging whether there is an industry category whose entropy is below a set entropy threshold, or whether the difference between the largest and the second-largest category weights exceeds a set weight-difference threshold, or mapping all the way up to the root industry category. Each of these thresholds can be set appropriately by those skilled in the art according to actual conditions.
The category vector of the query word is sorted by weight. Using the two categories with the largest and second-largest weights, the difference between the first category's weight and the second category's weight indicates whether the first category is a peak category (i.e., whether the difference is large enough).
With Way one in this concrete form, this step of the embodiment judges whether, among the industry categories at each level to which the query word maps, there is a category whose weight exceeds the peak weight. If there is, that category is determined as the industry category of the query word. If not, other set rules are used to determine the industry category of the query word, such as the above rules based on the entropy threshold or the weight-difference threshold, or the rule of mapping all the way up to the root industry category.
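The decision cascade described above (peak weight first, then entropy or top-two gap, then fall back to the root) can be sketched as follows. Several details are assumptions not fixed by the text: the function names, the ordering of levels from leaf upward (so the most specific qualifying level wins), and the default threshold values.

```python
import math


def level_entropy(weights):
    """Shannon entropy (bits) of one level's category weight distribution."""
    total = sum(weights.values())
    if total <= 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights.values() if w > 0)


def pick_category(levels, root, peak=0.7, max_entropy=1.0, min_gap=0.3):
    """levels: list of dicts category -> weight, ordered from the leaf level up
    to the level just below the root (ordering and thresholds are assumptions)."""
    # Rule 1: a category whose weight exceeds the peak threshold.
    for level in levels:
        if level:
            top = max(level, key=level.get)
            if level[top] > peak:
                return top
    # Rules 2 and 3: a low-entropy level, or a clear gap between the top two weights.
    for level in levels:
        if not level:
            continue
        top = max(level, key=level.get)
        ranked = sorted(level.values(), reverse=True)
        if level_entropy(level) < max_entropy:
            return top
        if len(ranked) > 1 and ranked[0] - ranked[1] > min_gap:
            return top
    # Fallback: map all the way up to the root industry category.
    return root
```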
Step S30612: Determine the multi-level category structure according to the industry category of the query word, and output the query result.
The query result can be output as the distribution of multi-level industry categories to which the query word is assigned, together with an integer indicating at which level of industry category the result is confident. For example, four levels of categories are returned: the leaf-node industry category, the second-level node industry category, the first-level node industry category, and the BU (Business Unit).
On top of the above query word classification, parameter tuning can also be applied, for example using a parameter adjustment module to adjust how much each query word classification feature influences the classification result, thereby optimizing the algorithm's effect.
The implementation of parameter tuning is described below.
Parameter tuning has three preconditions: (1) manually labeled query word classification data whose distribution is similar to that of the actual search engine; (2) an automated program that can automatically modify the classification parameter configuration items; (3) a machine learning algorithm module capable of handling linear regression problems.
Regarding the manually labeled query word classification data, operations staff can provide professional classification data so that the query word sample matches the prediction target, i.e., the real distribution. For example, using the historical search traffic logs with search count as the weight, 1000 samples can be drawn at random; sampling for machine learning must reflect the real distribution. Each query word is first manually labeled with the category it belongs to at each level; for example, the query word "princess skirt" is classified from top to bottom as "consumer goods - clothing - women's clothing - one-piece dress".
According to the business requirements of query word classification, parameter tuning problems fall into two classes. The first class concerns the category distribution at each level (for example, "princess skirt" can be statistically associated in the click logs with a weighted category distribution vector containing "one-piece dress", "chiffon skirt", "lace skirt", etc.): the proportions of multiple linear features (linear features here means features that are pairwise uncorrelated) must be allocated so as to achieve the best algorithm effect. This kind of mathematical problem can be solved with linear regression. The second class is adjusting the recognition threshold CateThre for each level of categories, which determines at which level the query word's peak category is placed.
For the first class of parameter tuning problem:
The automated program compares the query word classification results against the manually labeled training data and computes a score for this round of parameter tuning. The score is an F-measure that emphasizes precision (i.e., γ in the formula below is set to an appropriate value greater than 1). γ can be adjusted according to business needs; whether to favor recall or precision is a matter of human judgment, and γ generally does not change within one business requirement, so it is not learned as part of the parameter learning.
The F-measure is computed as follows:
The parameters in the F-measure formula above are defined as follows:
Precision: the rate at which the system assigns query words to the correct category and level, i.e., number of correctly classified samples / total number of samples;
Recall: the rate at which the system produces a classification for samples, i.e., number of samples that received a classification result (a recognized category or industry) / total number of samples;
γ: when γ is 1, precision and recall have equal influence on the F-measure. The larger γ is, the more weight precision carries. This system emphasizes classification precision, so γ is set to a value greater than 1.
During computation, for each set of feature weights output by parameter tuning, the precision and recall of that set on the training samples are obtained, and the parameter set that yields the largest F-measure is retained.
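The scoring and selection step above can be sketched as follows. This is a minimal sketch under stated assumptions: the F-measure formula itself is not reproduced in this text, so the sketch assumes the common weighted harmonic form (1 + γ²)·P·R / (P + γ²·R), in which γ = 1 gives the ordinary F1 and a larger γ shifts weight toward precision, matching the behavior described for γ. Precision and recall follow the definitions above (both divided by the total number of samples). The function names and the candidate weight sets are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def precision_recall(predicted: List[Optional[str]], gold: List[str]) -> Tuple[float, float]:
    """Precision and recall as defined above, both over the total sample count.

    predicted[i] is the category assigned to sample i, or None if the system
    produced no classification for it.
    """
    total = len(gold)
    correct = sum(1 for p, g in zip(predicted, gold) if p is not None and p == g)
    recalled = sum(1 for p in predicted if p is not None)
    return correct / total, recalled / total


def f_measure(precision: float, recall: float, gamma: float) -> float:
    """Assumed weighted F-measure: (1 + g^2) P R / (P + g^2 R); gamma > 1 favors precision."""
    g2 = gamma * gamma
    denom = precision + g2 * recall
    if denom == 0:
        return 0.0
    return (1 + g2) * precision * recall / denom


def best_weight_set(scores: Dict[str, Tuple[float, float]], gamma: float) -> str:
    """Return the key of the feature-weight set whose (P, R) gives the largest F."""
    return max(scores, key=lambda k: f_measure(scores[k][0], scores[k][1], gamma))


# Usage: two hypothetical feature-weight sets already evaluated on labeled samples.
candidates = {"A": (0.9, 0.5), "B": (0.7, 0.9)}
print(best_weight_set(candidates, gamma=2.0))  # A (precision-heavy scoring)
print(best_weight_set(candidates, gamma=1.0))  # B (balanced F1)
```

Note that this orientation is the reverse of the F-beta convention in many libraries, where beta > 1 favors recall; the sketch follows the behavior this document describes for γ.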
The following explains how linear regression is used to solve the first parameter tuning problem.
Linear regression is defined as:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i, \qquad i = 1, \ldots, n$$
Given a random sample $(Y_i, X_{i1}, \ldots, X_{ip})$, $i = 1, \ldots, n$, a linear regression model assumes that the relationship between the regressand $Y_i$ and the regressors $X_{i1}, \ldots, X_{ip}$ may be imperfect. An error term $\varepsilon_i$ (itself a random variable) is added to capture any influence on $Y_i$ other than $X_{i1}, \ldots, X_{ip}$. A multiple linear regression model is therefore written in the form above. It is important to distinguish random variables from the observations of those variables. Usually, the observations, or data (denoted by lowercase letters), consist of n values $(y_i, x_{i1}, \ldots, x_{ip})$, $i = 1, \ldots, n$.
There are p + 1 parameters $\beta_0, \ldots, \beta_p$ to be determined. To estimate them, it is convenient to use the matrix notation $Y = X\beta + \varepsilon$,
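where Y is the column vector of observations $Y_1, \ldots, Y_n$, $\varepsilon$ contains the unobserved random components $\varepsilon_1, \ldots, \varepsilon_n$, and X is the observation matrix of the regressors. In matrix form, X is:
$$X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1p} \\ 1 & X_{21} & \cdots & X_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{pmatrix}$$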
X generally includes a constant term.
If the columns of X are linearly dependent, the parameter vector β cannot be estimated by least squares unless β is constrained, for example by requiring that the sum of some of its elements be 0.
The primary goal of regression analysis is to estimate the model parameters so as to fit the data as well as possible. Among the various criteria for a best fit, least squares is especially favored. When $X^{\mathsf{T}}X$ is invertible, the estimate can be expressed as:
$$\hat{\beta} = (X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}}Y$$
This is engineered in code as a program that learns automatically. In the formulas above, $X_i$ is the i-th feature, $\beta_i$ is the weight of the i-th feature, and Y is the vector of F-measure evaluation results for the parameter combinations.
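The least-squares fit can be sketched without external libraries by solving the normal equations (X^T X) β = X^T Y with Gaussian elimination. This is a minimal illustration, not the patent's program: the function name `ols` is hypothetical, and a production system would more likely call a numerical library. Each row of `rows` is one observation of the p regressors and `y` holds the corresponding responses; how the original system lays out features and F-measure values as observations is not specified in the text.

```python
from typing import List


def ols(rows: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp; returns [b0, ..., bp]."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading column of ones = constant term
    n, k = len(X), len(X[0])
    # Normal equations: A = X^T X, v = X^T y.
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    v = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: columns of X are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        v[col], v[pivot] = v[pivot], v[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    # Back substitution.
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (v[r] - sum(A[r][c] * b[c] for c in range(r + 1, k))) / A[r][r]
    return b


# Usage: data generated exactly by y = 1 + 2*x1 + 3*x2.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1 + 2 * a + 3 * b for a, b in rows]
print([round(b, 6) for b in ols(rows, y)])  # [1.0, 2.0, 3.0]
```

The explicit singularity check mirrors the caveat above: when the columns of X are linearly dependent, X^T X has no inverse and the unconstrained least-squares estimate is not unique.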
For the second parameter tuning problem:
The second parameter tuning problem is to tune the recognition threshold CateThre for the categories at each level, which determines the level at which the query word's peak category is recognized. Recognition proceeds bottom-up: leaf categories are checked first, then the second level, then the first level, and finally the BU level. For example, suppose the query word "princess dress" yields the leaf industry-category distribution vector with weights ["one-piece dress": 0.5, "chiffon skirt": 0.2, "lace skirt": 0.2, "lace skirt": 0.1]. Given the leaf-node peak recognition threshold LeafCateThre = 0.4, the industry category "one-piece dress" is recognized. Moving up one level in the industry category tree, given the second-level peak threshold SecondCateThre = 0.5, "women's clothing" is recognized.
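The bottom-up walk in the example can be sketched as follows. This is a sketch under assumptions: the text does not say how a parent category's weight is obtained, so the sketch assumes it is the sum of its children's weights; it treats "weight reaches the threshold" as weight >= threshold; and the function name and toy tree are hypothetical. The two "lace skirt" entries from the example are merged into one (0.2 + 0.1) because a dictionary cannot hold duplicate keys.

```python
from typing import Dict, List, Tuple


def recognize_bottom_up(
    leaf_dist: Dict[str, float],
    parent: Dict[str, str],
    thresholds: List[float],
) -> List[Tuple[int, str, float]]:
    """Walk the category tree upward, one level per threshold.

    thresholds[0] applies to leaves (LeafCateThre), thresholds[1] to the next
    level up (SecondCateThre), and so on. At each level the peak category is
    recognized if its weight reaches that level's threshold; the walk stops at
    the first level whose peak falls short or when the root has been passed.
    """
    recognized: List[Tuple[int, str, float]] = []
    dist = dict(leaf_dist)
    for level, threshold in enumerate(thresholds):
        if not dist:
            break
        peak, weight = max(dist.items(), key=lambda kv: kv[1])
        if weight < threshold:
            break
        recognized.append((level, peak, weight))
        # Assumption: a parent's weight is the sum of its children's weights.
        rolled: Dict[str, float] = {}
        for category, w in dist.items():
            up = parent.get(category)
            if up is not None:
                rolled[up] = rolled.get(up, 0.0) + w
        dist = rolled
    return recognized


# Usage with the example's leaf distribution (duplicate "lace skirt" merged).
leaf = {"one-piece dress": 0.5, "chiffon skirt": 0.2, "lace skirt": 0.3}
tree = {"one-piece dress": "women's clothing", "chiffon skirt": "women's clothing",
        "lace skirt": "women's clothing", "women's clothing": "apparel"}
for level, cat, w in recognize_bottom_up(leaf, tree, [0.4, 0.5]):
    print(level, cat, round(w, 2))
# 0 one-piece dress 0.5
# 1 women's clothing 1.0
```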
The peak-category threshold CateThre at each level therefore has a large impact on the final algorithm performance. Enumeration can be used to approach the best performance. Empirically, CateThre lies in the range [0.3, 0.7]. Stepping through this range in increments of 0.01 yields 40 candidate values of CateThre (treating the range as half-open; including both endpoints would give 41). The query-word classification system of this embodiment has 4 category levels: LeafCate, SecondCate, FirstCate, and BUCate. Combining them gives 40 × 40 × 40 × 40 = 2,560,000 peak-category parameter configurations. A distributed computing platform evaluates all of these configurations quickly, and the combination with the highest F-measure is selected and kept for online use.
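The exhaustive enumeration can be sketched with `itertools.product`. This is a single-machine illustration under assumptions: the function names are hypothetical, the grid is taken as half-open so that it has 40 values, and `toy_f` is only a stand-in for the real evaluator, which would run the classifier over the labeled samples and return the F-measure. At 2,560,000 configurations, each requiring a pass over the samples, that evaluation is what the distributed platform parallelizes.

```python
import itertools
from typing import Callable, List, Sequence, Tuple


def candidate_thresholds(lo: float = 0.30, hi: float = 0.70, step: float = 0.01) -> List[float]:
    """Half-open grid [lo, hi); 40 values with the defaults."""
    count = int(round((hi - lo) / step))
    # Rounding to 2 decimals suits the default 0.01 step and removes float drift.
    return [round(lo + i * step, 2) for i in range(count)]


def grid_search(
    grid: Sequence[float],
    levels: int,
    evaluate: Callable[[Tuple[float, ...]], float],
) -> Tuple[Tuple[float, ...], float]:
    """Score every threshold combination and keep the one with the highest score."""
    best_combo: Tuple[float, ...] = ()
    best_score = float("-inf")
    for combo in itertools.product(grid, repeat=levels):
        score = evaluate(combo)
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score


grid = candidate_thresholds()
print(len(grid), len(grid) ** 4)  # 40 2560000


def toy_f(combo: Tuple[float, ...]) -> float:
    # Hypothetical stand-in for "classify the labeled samples, return the F-measure".
    return -sum((t - 0.5) ** 2 for t in combo)


print(grid_search([0.3, 0.4, 0.5, 0.6], 4, toy_f)[0])  # (0.5, 0.5, 0.5, 0.5)
```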
Through repeated iterations, the parameter tuning described above yields a set of algorithm parameter configurations that is optimal on the sample space.
As can be seen, when the classification model has multiple parameters, the tuning above adjusts their values so that the model comes closer to the actual distribution of the data and the algorithm achieves its best performance.
Step S308: Display the query-word classification results using the configured industry-specific search banner or industry-specific search-result display template.
When the classification results are displayed using an industry-specific search banner, once the search page obtains the industry category information for the query word, it invokes the search banner template of the corresponding industry and precisely selects the navigation category in the banner. For example, after a user searches for "one-piece dress", the search page jumps to the "Apparel and Accessories" market search page with "Women's clothing" selected and its leaf categories expanded, and the search results show only merchandise in the "Apparel and Accessories - Women's clothing" category.
Because the banner occupies a key position on the page and filters the search results, industry-specific recognition requires high precision to avoid degrading the user experience.
When the classification results are displayed using an industry-specific search-result display template, the display mode of the search-result list can be controlled, because users focus on different things in different industries' search results. For example, for apparel search results, users prefer to browse pictures, so showing results as large images gives a better search experience. In the industrial-goods industry, users care more about attributes, price, and place of origin, so the product's attributes, price, and place of origin are displayed.
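One way to express the per-industry display choice is a small configuration table keyed by the recognized industry. This is a hypothetical sketch: the industry keys, layout names, field names, and the fallback template are all illustrative assumptions, since the text names only the two cases (large images for apparel; attributes, price, and origin for industrial goods).

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class DisplayTemplate:
    layout: str              # how the result list is laid out
    fields: Tuple[str, ...]  # product fields shown for each result


# Hypothetical industry -> template table; keys and field names are illustrative.
TEMPLATES: Dict[str, DisplayTemplate] = {
    "apparel": DisplayTemplate("large_image", ("image", "title", "price")),
    "industrial_goods": DisplayTemplate("list", ("title", "attributes", "price", "origin")),
}
DEFAULT_TEMPLATE = DisplayTemplate("list", ("title", "price"))


def render_results(industry: str, products: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Pick the template for the recognized industry and project each product onto its fields."""
    template = TEMPLATES.get(industry, DEFAULT_TEMPLATE)
    items = [{f: p.get(f) for f in template.fields} for p in products]
    return {"layout": template.layout, "items": items}
```

Keeping the mapping in data rather than code means a new industry can get its own presentation by adding a table entry.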
Through this embodiment, query-word classification, the industry category tree, and category-to-industry mapping are used in e-commerce to make search industry-specific. Where a query word corresponds to multiple category levels, Chinese word segmentation and tagging together with the parameter tuning module determine which level of category the query word actually belongs to. Industry information is applied to uses such as industry-specific search banners and industry-specific search-result display modes. This embodiment provides a standard for specifying the category tree and the industry mapping format, which can easily be extended to other category trees. For the different requirements that various category trees and businesses place on query-word classification, a parameter tuning module linearly allocates the weights of multiple competing features, letting the machine automatically learn a set of parameters suited to each requirement and making the algorithm code highly reusable.
Embodiment Four
Referring to Fig. 7, a structural block diagram of a query word classification device according to Embodiment Four of the present invention is shown.
The query word classification device of this embodiment includes: a first generation module 402, configured to obtain an input query word and generate the leaf industry-category distribution vector corresponding to the query word, where the vector contains the industry category of at least one leaf node corresponding to the query word, each industry category at each node level corresponds to a category-word vector, and each category-word vector includes multiple category words; a determining module 404, configured to determine the industry category of the leaf node to which the query word belongs according to the category-word vectors of the industry categories of the leaf nodes corresponding to the query word; and a first classification module 406, configured to determine the industry category of the query word according to the determined leaf-node industry category and preset multi-level industry-category mapping relations.
Preferably, each category word in each category-word vector has a weight, and the query word classification device of this embodiment further includes a second classification module 408. After the first classification module 406 determines the query word's industry category according to the determined leaf-node industry category and the preset multi-level industry-category mapping relations, the second classification module 408 judges whether that industry category meets a set standard. If it does, the query word is classified into the determined industry category; if not, the query word's industry category is determined using a first recognition rule and/or a second recognition rule. The first recognition rule determines the industry category according to the industry categories of leaf nodes other than those in the leaf industry-category distribution vector; the second recognition rule determines it according to the query word's weight in each industry category.
Preferably, when applying the first and/or second recognition rule, the second classification module 408: determines the query word's industry category according to the query word and the category-word vectors of the industry categories of leaf nodes other than those in the leaf industry-category distribution vector; and/or, using the category-word vector of each industry category, checks level by level, from the root-node industry category down to the leaf-node industry categories, whether the query word's weight in each industry category is greater than or equal to a first set threshold, and takes the industry category whose weight is greater than or equal to that threshold as the query word's industry category.
Preferably, the determining module 404 includes: a generation module 4042, configured to preprocess the query word and generate at least one search keyword; a matching module 4044, configured to match the category-word vector of each leaf node's industry category using the at least one search keyword; a weight module 4046, configured to determine, from the weights in the successfully matched category-word vectors, the weight of each leaf node's industry category for the query word; and a leaf node determining module 4048, configured to take the industry category of the leaf node with the highest weight as the industry category of the leaf node to which the query word belongs.
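The four sub-modules (preprocess, match, weight, pick the maximum) can be sketched as a short pipeline. This is a hypothetical sketch: the regex split is only a placeholder for the real preprocessing (it does not segment Chinese text, which would need a proper word segmenter), and scoring a leaf as the sum of its matched category-word weights is an assumption.

```python
import re
from typing import Dict, List, Optional


def preprocess(query: str) -> List[str]:
    """Placeholder preprocessing: lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", query.lower()) if t]


def leaf_weights(keywords: List[str], leaf_vectors: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Weight of each leaf category whose category-word vector matched at least one keyword."""
    weights: Dict[str, float] = {}
    for leaf, vec in leaf_vectors.items():
        matched = [vec[k] for k in keywords if k in vec]
        if matched:
            weights[leaf] = sum(matched)  # assumption: sum of matched word weights
    return weights


def determine_leaf(query: str, leaf_vectors: Dict[str, Dict[str, float]]) -> Optional[str]:
    """Return the highest-weighted leaf category, or None if nothing matched."""
    weights = leaf_weights(preprocess(query), leaf_vectors)
    if not weights:
        return None
    return max(weights, key=weights.get)


leaf_vectors = {"one-piece dress": {"princess": 0.4, "dress": 0.5},
                "chiffon skirt": {"chiffon": 0.9, "dress": 0.1},
                "bearing": {"bearing": 1.0}}
print(determine_leaf("Princess Dress", leaf_vectors))  # one-piece dress
```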
Preferably, the first classification module 406 includes: a higher-level category module 4062, configured to determine, according to the preset multi-level industry-category mapping relations, the upper-level industry category corresponding to the leaf node's industry category; a higher-level weight module 4064, configured to obtain the weight of that upper-level industry category for the query word; a judgment module 4066, configured to judge whether that weight exceeds a second set threshold; and an execution module 4068, configured to take the upper-level industry category whose weight exceeds the second set threshold as the query word's industry category if the judgment module 4066 returns yes, and, if it returns no, to continue searching upward until an industry category whose weight exceeds the second set threshold is found or the root-node industry category is reached.
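The upward search performed by modules 4062 to 4068 can be sketched as a loop over parent links. This sketch assumes the weight lookup is supplied as a function and returns None when the root is passed without any category exceeding the threshold; the text says only that the search continues "until the root node", so that return value is an assumption. The comparison is strict, as the text says "more than".

```python
from typing import Callable, Dict, Optional


def search_upward(
    leaf: str,
    parent: Dict[str, str],
    weight_of: Callable[[str], float],
    threshold: float,
) -> Optional[str]:
    """From the leaf's parent, climb until a category's weight exceeds the threshold."""
    node = parent.get(leaf)
    while node is not None:
        if weight_of(node) > threshold:  # strictly greater than the second set threshold
            return node
        node = parent.get(node)  # None once the root has been checked
    return None


parent = {"one-piece dress": "women's clothing", "women's clothing": "apparel", "apparel": "root"}
weights = {"women's clothing": 0.3, "apparel": 0.7, "root": 1.0}
print(search_upward("one-piece dress", parent, lambda c: weights.get(c, 0.0), 0.5))  # apparel
```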
Preferably, the query word classification device of this embodiment further includes a correspondence generation module 410, configured to generate the correspondence between query words and leaf-node industry categories by: obtaining users' search click information, which includes the query word information entered by a user and the merchandise information the user clicked on in the query results; and obtaining the correspondence between query words and leaf-node industry categories from the search click information and the correspondence between merchandise and the categories to which it belongs.
Preferably, to obtain users' search click information, the correspondence generation module 410 obtains user search click logs, which contain the search click information of multiple users, including the query word information entered by each user, the merchandise information the user clicked on in the query results, and the category information of that merchandise. To obtain the correspondence between query words and leaf-node industry categories from the search click information and the merchandise-to-category correspondence, the module performs statistical analysis on the search click information in the logs to obtain the correspondence between query word information and merchandise information and the correspondence between merchandise information and merchandise category information, and from these two correspondences generates the correspondence between query words and leaf-node industry categories.
Preferably, when obtaining the correspondence between query words and leaf-node industry categories from the search click information and the merchandise-to-category correspondence, the correspondence generation module 410 extracts the merchandise information and its category information from the merchandise title information, determines the correspondence between merchandise information and merchandise category information, and obtains the query-word-to-leaf-category correspondence from the search click information and that correspondence.
Preferably, the first generation module 402 is configured to obtain the input query word; determine, from the correspondence between query words and leaf-node industry categories, the industry categories of at least one leaf node that correspond to the query word and satisfy a set rule; and generate the query word's leaf industry-category distribution vector from the determined leaf-node industry categories.
Preferably, the query word classification device of this embodiment further includes an adjustment module 412, configured to adjust the weights of the category words in each category-word vector according to the results of matching the query word against those category words, by comparison with manually labeled training data.
Preferably, when judging whether the query word's industry category meets the set standard, the second classification module 408 judges whether the recall count of that industry category meets a third set threshold, and/or whether its hit count meets a fourth set threshold.
The query word classification device of this embodiment implements the corresponding query word classification methods of the foregoing method embodiments and has their beneficial effects, which are not repeated here.
Embodiment Five
Referring to Fig. 8, a schematic diagram of a system architecture for industry-specific recognition using a query word classification device according to Embodiment Five of the present invention is shown.
The system architecture of this embodiment mainly includes: the web search front end SearchWeb, the query word analysis platform Query Process, and the industry-specific web page module.
An industry-specific recognition module (with all the functions of the query word classification device of Embodiment Four) is provided in the query word analysis platform Query Process, so that industry-specific recognition is available as a service module that other components on the platform can call. The industry-specific web pages provided by the industry-specific web page module (including the industry-specific search banner and/or a configured industry-specific search-result display template) differ from traditional e-commerce pages: the search page presentation is more specialized, and industry requirements can be responded to quickly.
When a user searches, SearchWeb first obtains the query word information and then calls the industry-specific recognition module of the Query Process platform. Using the query-word classification data described above, that module computes the relevant industry category and the industry to which the query word belongs, that is, the query word's industry category, and returns it to SearchWeb. SearchWeb then passes the query word's industry data as a search parameter when calling the search engine and the industry-specific web page module, both of which can use this industry parameter to carry out a series of applications according to downstream requirements.
For example, the industry-specific web page module can render the industry-specific search banner page, as shown in Fig. 9. In this case, after SearchWeb obtains the industry information, it invokes the search banner template of the corresponding industry and precisely selects the navigation category in the banner. In Fig. 9, for instance, after a user searches for "one-piece dress", SearchWeb jumps to the "Apparel and Accessories" market search page with "Women's clothing" selected and its leaf categories expanded, and the search results show only merchandise in the "Apparel and Accessories - Women's clothing" category. Because the banner occupies a key position on the page and filters the search results, industry-specific recognition requires high precision to avoid degrading the user experience.
As another example, the industry-specific web page module can render query results in an industry-specific search-result list display mode. Users focus on different things in different industries' search results. For apparel, users prefer to browse pictures, so showing results as large images gives a better search experience; in the industrial-goods industry, users care more about attributes, price, and place of origin, so displaying those details for each product gives a better search experience.
Through this embodiment, industry-specific recognition of user-entered query words is achieved: the industry a user intends to search can be determined from the query word, so that information from that industry is provided and the user gets accurate industry-level results, improving the search experience. This approach also effectively supports the development of industry market pages.
With the solution of the present invention, when a user searches, the recognized industry information can be supplied to the web front end and to the search engine ranking stage, where it determines the display mode of the merchandise list page and the navigation at the top of the search page, and serves as a feature in search engine ranking. By classifying the query word a user enters in the search box, the user is directed to the corresponding industry market. One solution provided in the embodiments builds a hierarchical query-word classification structure from the click graph in the search logs and designs an algorithm that computes, from click data, the leaf industry-category distribution vector for every query word; it then uses the e-commerce site's industry category tree data to assign the query word to the correct industry market. In addition, the present invention provides a standard for specifying the category tree and industry mapping format, which can easily be extended to other category trees. For the different requirements that various category trees and businesses place on query-word classification, a parameter tuning module linearly allocates the weights of multiple competing features, letting the machine automatically learn a set of parameters suited to each requirement and making the algorithm code highly reusable.
Through the present invention, query-word classification, the category tree, and category-to-industry mapping are used in e-commerce to make search industry-specific; where a query word corresponds to multiple category levels, Chinese word segmentation and tagging together with the parameter tuning module determine which level of category the query word actually belongs to; and industry information is applied to uses such as industry-specific search banners and industry-specific search-result display modes. The invention is not limited to this, however: those skilled in the art may, based on the description in the embodiments, implement the solution with corresponding equivalent or alternative means, for example by using another, lower-precision query-word classification; by determining the display mode for each leaf industry category in applications such as industry pages without using the industry category tree data and category-to-industry mapping; or by using the categories returned by search category navigation.
Here, query-word classification (Query Classification) is the algorithmic basis of this system. In computer science the term denotes the process of assigning a query word to identified categories (in this system, e-commerce categories). Each assignment to a category carries a weight, which can be understood as a probability.
The embodiments in this specification are described progressively: each focuses on its differences from the others, and the same or similar parts can be cross-referenced. Because the device and system embodiments are basically similar to the method embodiments, their description is relatively brief; for related details, refer to the description of the method embodiments.
The embodiments of the present invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The embodiments of the present invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The embodiments may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices. In a typical configuration, the computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory. Memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory among computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, in which information storage can be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
The query word classification method and device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the invention; the description of the embodiments is intended only to help in understanding the method and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementations and scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (18)

1. A query word classification method, characterized by comprising:
obtaining an input query word and generating a leaf industry-category distribution vector corresponding to the query word, wherein the leaf industry-category distribution vector contains the industry category of at least one leaf node corresponding to the query word, each industry category at each node level corresponds to a category-word vector, and each category-word vector includes multiple category words;
determining, according to the category-word vectors of the industry categories of the leaf nodes corresponding to the query word, the industry category of the leaf node to which the query word belongs;
determining, according to the determined leaf-node industry category and preset multi-level industry-category mapping relations, the industry category of the query word;
judging whether the industry category of the query word meets a set standard;
if so, classifying the query word into the determined industry category; if not, determining the industry category of the query word using a first recognition rule and/or a second recognition rule, wherein the first recognition rule determines the industry category of the query word according to the industry categories of leaf nodes other than the leaf-node industry categories in the leaf industry-category distribution vector, and the second recognition rule, when each category word in each category-word vector has a weight, determines the industry category of the query word according to the query word's weight in each industry category.
2. The method according to claim 1, characterized in that the step of determining the industry category of the query word using the first recognition rule and/or the second recognition rule comprises:
determining the industry category of the query word according to the query word and the category-word vectors of the industry categories of leaf nodes other than the leaf-node industry categories in the leaf industry-category distribution vector;
And/or
according to the category-word vector of each industry category, determining level by level, from the root-node industry category to the leaf-node industry categories, whether the query word's weight in each industry category is greater than or equal to a first set threshold, and determining the industry category whose weight is greater than or equal to the set threshold as the industry category of the query word.
3. The method according to claim 1 or 2, characterized in that the step of determining, according to the category-word vectors of the industry categories of the leaf nodes corresponding to the query word, the industry category of the leaf node to which the query word belongs comprises:
preprocessing the query word to generate at least one search keyword;
matching the category-word vector of the industry category of each leaf node using the at least one search keyword;
determining, according to the weights in the successfully matched category-word vectors of the leaf nodes' industry categories, the weight of each leaf node's industry category for the query word;
determining the industry category of the leaf node with the highest weight as the industry category of the leaf node to which the query word belongs.
4. The method according to claim 3, characterized in that the step of determining the industry classification in which the query word falls, according to the determined industry classification of the leaf node and the preset multi-level industry classification mapping relations, comprises:
Determining, according to the preset multi-level industry classification mapping relations, the upper-level industry classification corresponding to the industry classification of the leaf node;
Obtaining the weight of that upper-level industry classification with respect to the query word;
Judging whether the weight of the upper-level industry classification with respect to the query word exceeds a second given threshold;
If so, determining the upper-level industry classification whose weight exceeds the second given threshold as the industry classification in which the query word falls. If not, continuing to search upward until either an industry classification whose weight exceeds the second given threshold is found or the industry classification of the root node is reached.
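The upward search in claim 4 can be sketched as a walk along a child-to-parent map. This is a minimal sketch under stated assumptions:

* The multi-level mapping relations are represented as a hypothetical `parent_of` dictionary.
* The per-category query weights in `weight_of` are taken as already computed.
* "More than" is read as strictly greater.

```python
from __future__ import annotations


def classify_bottom_up(
    leaf: str,
    parent_of: dict[str, str],
    weight_of: dict[str, float],
    second_threshold: float,
) -> str:
    """Climb from a leaf industry classification toward the root.

    parent_of: hypothetical child -> parent mapping (the root has no entry).
    weight_of: hypothetical weight of each category for the current query.
    Returns the first ancestor whose weight exceeds second_threshold, or the
    root if no ancestor does.
    """
    node = leaf
    while node in parent_of:
        node = parent_of[node]
        if weight_of.get(node, 0.0) > second_threshold:
            return node
    return node  # the root: no ancestor exceeded the threshold


parent_of = {
    "Women's dresses": "Women's apparel",
    "Women's apparel": "Apparel",
    "Apparel": "root",
}
weight_of = {"Women's apparel": 0.3, "Apparel": 0.7}
print(classify_bottom_up("Women's dresses", parent_of, weight_of, 0.5))  # Apparel
print(classify_bottom_up("Women's dresses", parent_of, weight_of, 0.9))  # root
```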
5. The method according to claim 1, characterized in that the correspondence between the query word and the industry classification of the leaf node is generated as follows:
Obtaining search click information of users, wherein the search click information includes the query word information input by a user and the commodity information that the user clicked in the query results;
Obtaining the correspondence between the query word and the industry classification of the leaf node according to the search click information and the correspondence between commodities and the categories to which they belong.
6. The method according to claim 5, characterized in that:
The step of obtaining search click information of users comprises obtaining user search click logs. The user search click logs contain the search click information of multiple users. The search click information includes the query word information input by a user, the commodity information that the user clicked in the query results, and the category information of the commodity;
The step of obtaining the correspondence between the query word and the industry classification of the leaf node, according to the search click information and the correspondence between commodities and their categories, comprises: performing statistical analysis on the search click information in the user search click logs to obtain (a) the correspondence between the query word information and the commodity information, and (b) the correspondence between the commodity information and the category information of the commodity; then generating the correspondence between the query word and the industry classification of the leaf node from these two correspondences.
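The two-stage statistical analysis of claims 5 and 6 might be sketched as below. The sketch makes these assumptions:

* Each hypothetical log record is a `(query, clicked item id, item's leaf category)` tuple, following claim 6's statement that the log carries the commodity's category information.
* "Statistical analysis" is taken to mean plain click counting. The patent does not specify the statistic.

```python
from collections import Counter, defaultdict


def query_leaf_correspondence(click_log):
    """Aggregate (query, clicked item, item's leaf category) records.

    Step 1: count query -> item clicks and record item -> category.
    Step 2: join the two to get query -> leaf category click counts.
    """
    query_to_items = defaultdict(Counter)
    item_to_category = {}
    for query, item_id, category in click_log:
        query_to_items[query][item_id] += 1
        item_to_category[item_id] = category

    query_to_leaf = defaultdict(Counter)
    for query, items in query_to_items.items():
        for item_id, clicks in items.items():
            query_to_leaf[query][item_to_category[item_id]] += clicks
    return query_to_leaf


# Hypothetical log records: (query, clicked item id, item's leaf category).
click_log = [
    ("red dress", "i1", "Women's dresses"),
    ("red dress", "i2", "Women's dresses"),
    ("red dress", "i1", "Women's dresses"),
    ("red dress", "i3", "Shoes"),
    ("phone case", "i4", "Phone cases"),
]
corr = query_leaf_correspondence(click_log)
print(dict(corr["red dress"]))  # {"Women's dresses": 3, 'Shoes': 1}
```

The result maps each query to click counts per leaf industry classification. That is one plausible form for the "correspondence" the claims refer to.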
7. The method according to claim 5, characterized in that the step of obtaining the correspondence between the query word and the industry classification of the leaf node, according to the search click information and the correspondence between commodities and their categories, comprises:
Obtaining commodity information and the category information of the commodity from commodity title information, and determining the correspondence between the commodity information and the category information; then obtaining the correspondence between the query word and the industry classification of the leaf node according to the search click information and the correspondence between the commodity information and the category information.
8. The method according to any one of claims 5 to 7, characterized in that the step of obtaining the input query word and generating the leaf industry classification distribution vector corresponding to the query word comprises:
Obtaining the input query word and determining, according to the correspondence between the query word and the industry classifications of the leaf nodes, the industry classifications of at least one leaf node that correspond to the query word and satisfy a set rule;
Generating the leaf industry classification distribution vector corresponding to the query word from the determined industry classifications of the leaf nodes.
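Claim 8 leaves the "set rule" unspecified. The sketch below assumes one plausible rule:

* Keep leaves whose share of the query's clicks is at least `min_share`.
* Cap the result at `top_k` leaves, highest share first.

The parameter names, defaults and the click-count correspondence are hypothetical.

```python
def leaf_distribution(query, correspondence, min_share=0.1, top_k=5):
    """Build a leaf industry classification distribution vector for a query.

    Assumed "set rule": keep leaves whose share of the query's clicks is at
    least min_share, then keep at most top_k of them, highest share first.
    """
    counts = correspondence.get(query, {})
    total = sum(counts.values())
    if total == 0:
        return {}
    shares = {leaf: n / total for leaf, n in counts.items()}
    kept = [(leaf, s) for leaf, s in shares.items() if s >= min_share]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return dict(kept[:top_k])


# Hypothetical query -> leaf -> click count correspondence.
correspondence = {"red dress": {"Women's dresses": 8, "Shoes": 1, "Bags": 1}}
print(leaf_distribution("red dress", correspondence, min_share=0.15))
# {"Women's dresses": 0.8}
```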
9. The method according to claim 1, characterized by further comprising:
Adjusting the weights of the classification terms in each classification term vector according to how the query word matches the classification terms in each classification term vector, checked against manually annotated training data.
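Claim 9 does not say how the weights are adjusted. As one illustrative possibility, the sketch below swaps in a simple perceptron-style update:

* When the highest-scoring category disagrees with the manual label, the labeled category's term weights are raised.
* The wrongly predicted category's matched term weights are lowered, floored at zero.

The learning rate, epoch count and data are hypothetical.

```python
def adjust_weights(vectors, labeled, lr=0.1, epochs=5):
    """Perceptron-style adjustment of classification term weights (assumed technique).

    vectors: category -> {classification term: weight}, modified in place.
    labeled: list of (query terms, manually labeled correct category).
    """
    def score(terms, category):
        return sum(vectors[category].get(t, 0.0) for t in terms)

    for _ in range(epochs):
        for terms, gold in labeled:
            predicted = max(vectors, key=lambda c: score(terms, c))
            if predicted == gold:
                continue  # matching result already agrees with the label
            for t in terms:
                # Reward the labeled category's terms; penalize the wrongly
                # predicted category's matched terms (floored at zero).
                vectors[gold][t] = vectors[gold].get(t, 0.0) + lr
                if t in vectors[predicted]:
                    vectors[predicted][t] = max(0.0, vectors[predicted][t] - lr)
    return vectors


vectors = {"Phone cases": {"case": 0.5}, "Luggage": {"case": 0.6, "suitcase": 0.9}}
labeled = [(["phone", "case"], "Phone cases")]
adjust_weights(vectors, labeled)
```

After one correction, "phone case" scores higher in "Phone cases" than in "Luggage", and later epochs leave the weights unchanged.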
10. The method according to claim 1, characterized in that the step of judging whether the industry classification in which the query word falls meets the set standard comprises:
Judging whether the recall count of the industry classification in which the query word falls meets a third given threshold, and/or judging whether the click count of that industry classification meets a fourth given threshold.
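The check in claim 10 reduces to threshold comparisons. This sketch makes two assumptions:

* "Meets" is read as "greater than or equal to".
* The claim's "and/or" is modeled as a hypothetical `mode` parameter.

```python
def meets_standard(recall_count, click_count, third_threshold, fourth_threshold, mode="and"):
    """Check the set standard for the chosen industry classification.

    Assumptions: "meets" means ">=", and mode selects one of the and/or variants.
    """
    recall_ok = recall_count >= third_threshold
    click_ok = click_count >= fourth_threshold
    if mode == "and":
        return recall_ok and click_ok
    if mode == "or":
        return recall_ok or click_ok
    if mode == "recall":
        return recall_ok
    if mode == "click":
        return click_ok
    raise ValueError(f"unknown mode: {mode!r}")


print(meets_standard(1200, 30, third_threshold=1000, fourth_threshold=50))             # False
print(meets_standard(1200, 30, third_threshold=1000, fourth_threshold=50, mode="or"))  # True
```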
11. A query word classification device, characterized by comprising:
A first generation module, configured to obtain an input query word and generate the leaf industry classification distribution vector corresponding to the query word. The leaf industry classification distribution vector contains the industry classification of at least one leaf node corresponding to the query word. Each industry classification, at every level of nodes, corresponds to one classification term vector, and each classification term vector includes multiple classification terms;
A determining module, configured to determine the industry classification of the leaf node to which the query word belongs, according to the classification term vectors of the industry classifications of the leaf nodes corresponding to the query word;
A first classification module, configured to determine the industry classification in which the query word falls, according to the determined industry classification of the leaf node and the preset multi-level industry classification mapping relations;
A second classification module, which acts after the first classification module has determined the industry classification in which the query word falls (according to the determined industry classification of the leaf node and the preset multi-level industry classification mapping relations). It judges whether that industry classification meets a set standard. If it does, the module classifies the query word into the determined industry classification. If it does not, the module determines the industry classification in which the query word falls using a first recognition rule and/or a second recognition rule. The first recognition rule determines the industry classification in which the query word falls from the industry classifications of the other leaf nodes in the leaf industry classification distribution vector, excluding the industry classification of the leaf node already determined. The second recognition rule applies when every classification term in every classification term vector has a weight; it determines the industry classification in which the query word falls from the weight of the query word in each industry classification.
12. The device according to claim 11, characterized in that, when determining the industry classification in which the query word falls using the first recognition rule and/or the second recognition rule, the second classification module:
Determines the industry classification in which the query word falls according to the query word and the classification term vectors of the industry classifications of the other leaf nodes in the leaf industry classification distribution vector, excluding the industry classification of the leaf node already determined;
And/or
Determines level by level, from the industry classification of the root node down to the industry classifications of the leaf nodes, and according to the classification term vector corresponding to each industry classification, whether the weight of the query word in each industry classification is greater than or equal to the first given threshold. It then determines the industry classification whose weight is greater than or equal to the first given threshold as the industry classification in which the query word falls.
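The first recognition rule (re-selecting among the other leaves in the distribution vector) might be sketched as follows. Several parts are assumptions made for illustration:

* A leaf's score is the sum of the weights of the query keywords found in its classification term vector.
* The function returns `None` when no other leaf matches.
* The distribution vector and term vectors are hypothetical.

```python
def first_rule_fallback(keywords, distribution, leaf_vectors, rejected_leaf):
    """Re-pick a leaf from the distribution vector, skipping the rejected one.

    distribution: hypothetical leaf industry classification distribution vector
    (leaf -> share); leaf_vectors: leaf -> {classification term: weight}.
    Returns None if no other leaf matches any keyword.
    """
    best, best_score = None, 0.0
    for leaf in distribution:
        if leaf == rejected_leaf:
            continue
        # Assumption: score = sum of the weights of the matched keywords.
        score = sum(leaf_vectors.get(leaf, {}).get(k, 0.0) for k in keywords)
        if score > best_score:
            best, best_score = leaf, score
    return best


distribution = {"Phone cases": 0.6, "Mobile phones": 0.3, "Luggage": 0.1}
leaf_vectors = {
    "Phone cases": {"phone": 0.6, "case": 0.8},
    "Mobile phones": {"phone": 0.9},
    "Luggage": {"case": 0.3},
}
print(first_rule_fallback(["phone", "case"], distribution, leaf_vectors, "Phone cases"))
# Mobile phones
```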
13. The device according to claim 11 or 12, characterized in that the determining module comprises:
A generation module, configured to preprocess the query word to generate at least one search keyword;
A matching module, configured to match the at least one search keyword against the classification term vector of the industry classification of each leaf node;
A weight module, configured to determine the weight of each successfully matched leaf node's industry classification with respect to the query word, according to the weights in that leaf node's classification term vector;
A leaf node determining module, configured to determine the industry classification of the leaf node with the highest weight as the industry classification of the leaf node to which the query word belongs.
14. The device according to claim 13, characterized in that the first classification module comprises:
An upper-level classification module, configured to determine, according to the preset multi-level industry classification mapping relations, the upper-level industry classification corresponding to the industry classification of the leaf node;
An upper-level weight module, configured to obtain the weight of that upper-level industry classification with respect to the query word;
A judgment module, configured to judge whether the weight of the upper-level industry classification with respect to the query word exceeds a second given threshold;
An execution module, configured to act on the judgment module's result. If the result is yes, it determines the upper-level industry classification whose weight exceeds the second given threshold as the industry classification in which the query word falls. If the result is no, it continues searching upward until either an industry classification whose weight exceeds the second given threshold is found or the industry classification of the root node is reached.
15. The device according to claim 11, characterized by further comprising:
A correspondence generation module, configured to generate the correspondence between the query word and the industry classification of the leaf node by:
obtaining search click information of users, wherein the search click information includes the query word information input by a user and the commodity information that the user clicked in the query results; and
obtaining the correspondence between the query word and the industry classification of the leaf node according to the search click information and the correspondence between commodities and their categories.
16. The device according to claim 15, characterized in that:
When obtaining the search click information of users, the correspondence generation module obtains user search click logs. The user search click logs contain the search click information of multiple users. The search click information includes the query word information input by a user, the commodity information that the user clicked in the query results, and the category information of the commodity. When obtaining the correspondence between the query word and the industry classification of the leaf node, according to the search click information and the correspondence between commodities and their categories, the correspondence generation module performs statistical analysis on the search click information in the user search click logs. This yields (a) the correspondence between the query word information and the commodity information, and (b) the correspondence between the commodity information and the category information of the commodity. The module then generates the correspondence between the query word and the industry classification of the leaf node from these two correspondences.
17. The device according to claim 15, characterized in that, when obtaining the correspondence between the query word and the industry classification of the leaf node according to the search click information and the correspondence between commodities and their categories, the correspondence generation module:
Obtains commodity information and the category information of the commodity from commodity title information, and determines the correspondence between the commodity information and the category information; then obtains the correspondence between the query word and the industry classification of the leaf node according to the search click information and the correspondence between the commodity information and the category information.
18. The device according to claim 11, characterized in that, when judging whether the industry classification in which the query word falls meets the set standard, the second classification module judges whether the recall count of that industry classification meets a third given threshold, and/or judges whether its click count meets a fourth given threshold.
CN201310392966.XA 2013-09-02 2013-09-02 Query word sorting technique and device Active CN104424296B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310392966.XA CN104424296B (en) 2013-09-02 2013-09-02 Query word sorting technique and device
HK15105059.0A HK1204685A1 (en) 2013-09-02 2015-05-28 Method for classifying query word and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310392966.XA CN104424296B (en) 2013-09-02 2013-09-02 Query word sorting technique and device

Publications (2)

Publication Number Publication Date
CN104424296A CN104424296A (en) 2015-03-18
CN104424296B true CN104424296B (en) 2018-07-31

Family

ID=52973275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310392966.XA Active CN104424296B (en) 2013-09-02 2013-09-02 Query word sorting technique and device

Country Status (2)

Country Link
CN (1) CN104424296B (en)
HK (1) HK1204685A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105468680A (en) * 2015-11-16 2016-04-06 中国建设银行股份有限公司 Data retrieval method and device
CN105894019A (en) * 2016-03-30 2016-08-24 北京京东尚科信息技术有限公司 Database data classification method and apparatus
US9798820B1 (en) 2016-10-28 2017-10-24 Searchmetrics Gmbh Classification of keywords
US9767182B1 (en) 2016-10-28 2017-09-19 Searchmetrics Gmbh Classification of search queries
CN108206020A (en) * 2016-12-16 2018-06-26 北京智能管家科技有限公司 A kind of audio recognition method, device and terminal device
CN106971000B (en) * 2017-04-12 2020-04-28 北京焦点新干线信息技术有限公司 Searching method and device
CN107423362B (en) * 2017-06-20 2021-06-08 创新先进技术有限公司 Industry determining method, object obtaining method and device, client and server
CN108388555A (en) * 2018-02-01 2018-08-10 口碑(上海)信息技术有限公司 Commodity De-weight method based on category of employment and device
CN110322093B (en) * 2018-03-30 2024-03-08 阿里巴巴集团控股有限公司 Information processing method, information display method, information processing device and computing equipment
CN110019822B (en) * 2019-04-16 2021-07-06 中国科学技术大学 Few-sample relation classification method and system
CN110580291B (en) * 2019-07-29 2021-12-21 用友网络科技股份有限公司 Intelligent searching method based on ERP customer service knowledge graph and computer equipment
CN111191022B (en) * 2019-12-27 2023-07-25 苏宁云计算有限公司 Commodity short header generation method and device
CN111190933B (en) * 2019-12-27 2022-08-12 江苏艾佳家居用品有限公司 E-commerce category deep query method based on interval number
CN111159552A (en) * 2019-12-30 2020-05-15 北京每日优鲜电子商务有限公司 Commodity searching method, commodity searching device, server and storage medium
CN111339297B (en) * 2020-02-21 2023-04-25 广州天懋信息系统股份有限公司 Network asset anomaly detection method, system, medium and equipment
CN113553820A (en) * 2020-04-24 2021-10-26 腾讯科技(深圳)有限公司 Information processing method, equipment and computer readable storage medium
CN112241493A (en) * 2020-10-28 2021-01-19 浙江集享电子商务有限公司 Commodity retrieval method and device, computer equipment and storage medium
CN112447300B (en) * 2020-11-27 2024-02-09 平安科技(深圳)有限公司 Medical query method and device based on graph neural network, computer equipment and storage medium
CN114860892B (en) * 2022-07-06 2022-09-06 腾讯科技(深圳)有限公司 Hierarchical category prediction method, device, equipment and medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101639831A (en) * 2008-07-29 2010-02-03 华为技术有限公司 Search method, search device and search system
CN102053983A (en) * 2009-11-02 2011-05-11 阿里巴巴集团控股有限公司 Method, system and device for querying vertical search
CN102841946A (en) * 2012-08-24 2012-12-26 北京国政通科技有限公司 Commodity data retrieval sequencing and commodity recommendation method and system
CN103020066A (en) * 2011-09-21 2013-04-03 北京百度网讯科技有限公司 Method and device for recognizing search demand

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US6754666B1 (en) * 1999-08-19 2004-06-22 A2I, Inc. Efficient storage and access in a database management system
US7953723B1 (en) * 2004-10-06 2011-05-31 Shopzilla, Inc. Federation for parallel searching

Also Published As

Publication number Publication date
CN104424296A (en) 2015-03-18
HK1204685A1 (en) 2015-11-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1204685

Country of ref document: HK

GR01 Patent grant