CN102508831B - Improve shopping search engine - Google Patents


Info

Publication number
CN102508831B
Authority
CN
China
Prior art keywords
document
inquiry
grading
computer
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201110117329.2A
Other languages
Chinese (zh)
Other versions
CN102508831A (en
Inventor
S·P·坎杜利
M·D·巴洛斯
M·帕拉欣
C·郁
Q·吴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/757,095 (US8700592B2)
Application filed by Microsoft Technology Licensing LLC
Publication of CN102508831A
Application granted
Publication of CN102508831B
Expired - Fee Related
Anticipated expiration


Abstract

The invention discloses a method and system for improving a shopping search engine. A web search system uses human judges to rank the relevance of the results returned for a variety of sample search queries. The search results can be divided into groups, so that the ranked results can be used for both training and validation. Consistent guidelines for the human assessments allow consistent results across the many individuals performing the ranking. Once a machine learning classification tool such as MART has been programmed and validated, it can be used to provide an absolute ranking of the relevance of returned documents, rather than a simple relative ranking based on, for example, keyword matching and click-through counts. When developing related refinements such as category and price sorting, documents with a low relevance ranking can be excluded from consideration.

Description

Improve shopping search engine
Technical field
The present invention relates to the field of network technology, and in particular to search queries in network technology.
Background art
For any given query, a search engine often returns a results list far too long for the user to review. Some systems attempt to order the returned documents by relative rank based on, for example, words in the title or the number of hits from previous searches. In the case of shopping searches, related items such as category or price may be presented based on the returned documents. Because the quality of the returned documents may be inconsistent, the related items may include unexpected results. For example, a shopping search for the word "rose" on a popular search engine may return documents ranging from audio CDs to game consoles, with no document showing flowers even in the top 10 results. The shopping categories presented may range from earrings to books.
When sorting on a particular characteristic such as price, over-boosting that characteristic can cause it to dominate other characteristics at the cost of losing relevance entirely. For example, a request to sort the search results for "GPS" by price may cause the most expensive bracket for carrying a GPS to be shown first, which is almost certainly not what the user is looking for.
Summary of the invention
A more advanced result-ranking system uses machine learning techniques and human judgments to determine parameters for ranking results by an absolute relevance value that reflects what the user expects to find, rather than ordering the returned documents relative to one another based solely on hit counts and/or title word matches. In addition, query results produced with an absolute ranker can be aligned to categories more accurately, allowing better suggestions of similar or complementary products.
The absolute ranker can use the results of representative queries to provide a list of documents for each query. Human judges can rank a sample of the results for each query, providing a knowledge base that captures the human-generated results and is used to program a machine learning classification tool that is subsequently applied to new queries.
The absolute ranker allows the returned results to be pre-screened, so that sorting by a characteristic does not over-boost irrelevant results.
Brief description of the drawings
Fig. 1 is a block diagram of an exemplary computing device;
Fig. 2 is a diagram of an exemplary Internet search environment;
Fig. 3A is a flow chart illustrating training of a machine learning classification tool;
Fig. 3B is a flow chart illustrating use of the machine learning classification tool to develop search results;
Fig. 4 is a diagram illustrating a portion of an exemplary decision tree; and
Fig. 5 is a computer screen shot illustrating elements of search results.
Detailed description of the invention
Although the following text sets forth a detailed description of numerous different embodiments, it should be understood that the legal scope of the description is defined by the words of the claims appended hereto. The detailed description is to be construed as exemplary only and does not describe every possible embodiment, since describing every possible embodiment would be impractical, if not impossible. Numerous alternative embodiments could be implemented, using either current technology or technology developed after the filing date of this application, and these would still fall within the scope of the claims.
It should also be understood that, unless a term is expressly defined in this patent using the sentence "As used herein, the term '_____' is hereby defined to mean..." or a similar sentence, there is no intent to limit the meaning of that term, either expressly or by implication, beyond its plain or ordinary meaning, and such a term should not be interpreted as limited in scope based on any statement made in any section of this patent (other than the language of the claims). To the extent that any term recited in the appended claims is referred to in this patent in a manner consistent with a single meaning, that is done for the sake of clarity only, so as not to confuse the reader, and it is not intended that such a claim term be limited, by implication or otherwise, to that single meaning. Finally, unless a claim element is defined by reciting the word "means" and a function without the recital of any structure, it is not intended that the scope of any claim element be interpreted based on the application of 35 U.S.C. § 112, sixth paragraph.
Much of the inventive functionality and many of the inventive principles are best implemented with or in software programs or instructions and integrated circuits (ICs) such as application-specific ICs. It is expected that one of ordinary skill in the art, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, will be readily capable of generating such software instructions, programs, and ICs with minimal experimentation when guided by the concepts and principles disclosed herein. Therefore, in the interest of brevity and of minimizing any risk of obscuring the principles and concepts of the present invention, further discussion of such software and ICs, if any, is limited to the essentials with respect to the principles and concepts of the preferred embodiments.
With reference to Fig. 1, an exemplary computing device for implementing the claimed method and apparatus includes a general-purpose computing device in the form of a computer 110. Components shown in dashed outline are not technically part of the computer 110, but are used to illustrate the exemplary embodiment of Fig. 1. Components of the computer 110 may include, but are not limited to, a processor 120, a system memory 130, a memory/graphics interface 121 (also known as a Northbridge chip), and an I/O interface 122 (also known as a Southbridge chip). The system memory 130 and a graphics processor 190 may be coupled to the memory/graphics interface 121. A monitor 191 or other graphics output device may be coupled to the graphics processor 190.
A series of system buses may couple the various system components, including a high-speed system bus 123 between the processor 120, the memory/graphics interface 121, and the I/O interface 122, a front-side bus 124 between the memory/graphics interface 121 and the system memory 130, and an advanced graphics processing (AGP) bus 125 between the memory/graphics interface 121 and the graphics processor 190. The system bus 123 may be any of several types of bus structures including, by way of example and not limitation, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, and an Enhanced ISA (EISA) bus. As system architectures evolve, other bus architectures and chipsets may be used, but they generally follow this pattern. For example, companies such as Intel and AMD support the Intel Hub Architecture (IHA) and the Hypertransport™ architecture, respectively.
The computer 110 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 110, and include both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the computer 110.
The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory, such as read-only memory (ROM) 131 and random access memory (RAM) 132. The system ROM 131 may contain permanent system data 143, such as identifying and manufacturing information. In some embodiments, a basic input/output system (BIOS) may also be stored in the system ROM 131. The RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by the processor 120. By way of example, and not limitation, Fig. 1 illustrates an operating system 134, application programs 135, other program modules 136, and program data 137.
The I/O interface 122 may couple the system bus 123 with a number of other buses 126, 127, and 128 that couple a variety of internal and external devices to the computer 110. A serial peripheral interface (SPI) bus 126 may connect to a basic input/output system (BIOS) memory 133 containing the basic routines that help to transfer information between elements within the computer 110, such as during start-up.
A super input/output chip 160 may be used to connect to a number of "legacy" peripherals, such as a floppy disk 152, a keyboard/mouse 162, and a printer 196. In some embodiments, the super I/O chip 160 may be connected to the I/O interface 122 by a bus 127, such as a low pin count (LPC) bus. Various embodiments of the super I/O chip 160 are widely available in the commercial marketplace.
In one embodiment, the bus 128 may be a Peripheral Component Interconnect (PCI) bus, or a variation thereof, and may be used to connect higher-speed peripherals to the I/O interface 122. A PCI bus may also be known as a Mezzanine bus. Variations of the PCI bus include the Peripheral Component Interconnect-Express (PCI-E) and the Peripheral Component Interconnect-Extended (PCI-X) buses, the former having a serial interface and the latter being a backward-compatible parallel interface. In other embodiments, the bus 128 may be an advanced technology attachment (ATA) bus, in the form of a serial ATA (SATA) bus or a parallel ATA (PATA) bus.
The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, Fig. 1 illustrates a hard disk drive 140 that reads from or writes to non-removable, nonvolatile magnetic media. The hard disk drive 140 may be a conventional hard disk drive, or may be similar to the storage media described below with respect to Fig. 2.
Removable media, such as a universal serial bus (USB) memory 153, FireWire (IEEE 1394), or a CD/DVD drive 156, may be connected to the PCI bus 128 directly or through an interface 150. A storage medium 154 similar to that described below with respect to Fig. 2 may be coupled through the interface 150. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid-state RAM, solid-state ROM, and the like.
The drives and their associated computer storage media, discussed above and illustrated in Fig. 1, provide storage of computer-readable instructions, data structures, program modules, and other data for the computer 110. In Fig. 1, for example, the hard disk drive 140 is illustrated as storing an operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from the operating system 134, application programs 135, other program modules 136, and program data 137. The operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 110 through input devices such as a mouse/keyboard 162 or a combination of other input devices. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processor 120 through one of the I/O interface buses, such as the SPI 126, the LPC 127, or the PCI 128, but other buses may be used. In some embodiments, other devices may be coupled to parallel ports, infrared interfaces, game ports, and the like (not depicted) via the super I/O chip 160.
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180, via a network interface controller (NIC) 170. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device, or another common network node, and typically includes many or all of the elements described above relative to the computer 110. The logical connection between the NIC 170 and the remote computer 180 depicted in Fig. 1 may include a local area network (LAN), a wide area network (WAN), or both, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet. The remote computer 180 may also represent a web server supporting interactive sessions with the computer 110.
In some embodiments, the network interface may use a modem (not depicted) when a broadband connection is unavailable or is not used. It will be appreciated that the network connection shown is exemplary and that other means of establishing a communications link between the computers may be used.
Fig. 2 is a block diagram of a web search system 200. A client computer 202 may be coupled to a web server 206. Traffic between the web server 206 and the client computer 202 may be carried over a network 204, such as the Internet. The web server 206 may direct search queries to a search engine 208. The search engine 208 may return results, such as a list of documents, and send those results to one or more classification tool servers, such as servers 210 and 212. Additional servers, such as a content server 214 and an identity server 216, may support other functions. A classification tool programming environment 218 may include a classification tool development server 220, a classification tool database 222, and a number of workstations 224, 226, 228 that may be used to support human judges in ranking returned results during the programming phase. The various servers and workstations may be similar to the exemplary computer 110 of Fig. 1. Although the depiction of Fig. 2 shows each server performing a dedicated function, combinations of hardware and software may be used to combine or divide the functions associated with the exemplary servers described.
In operation, the web server 206 may receive an Internet search query, such as a sales-related query, for example one relating to a product or service for sale. The search engine 208 may perform a search corresponding to the sales-related query and may return a number of responsive documents. Each responsive document may have an accompanying text description and/or photograph. The classification tool servers 210, 212, or both, may use a weighted tree search to develop an absolute relevance ranking for each of the responsive documents. In one embodiment, the weighted tree search is based on a MART tree algorithm, but numerous other machine learning classification tools may be used. The classification tool servers 210, 212, or both, may return an absolute relevance ranking for each returned document. In one embodiment, the absolute relevance ranking may be in a range from 0 to 1. An exemplary threshold level may be 0.97, but any number of threshold levels may be set, for example dynamically based on the number of documents returned by the search. Documents receiving an absolute relevance ranking above the threshold level may be presented to the user in order of their absolute relevance ranking.
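The thresholding step described above can be illustrated with a minimal Python sketch. The `ScoredDocument` type, the `select_relevant` function, and the sample scores are hypothetical and are not taken from the patent; only the 0-to-1 range and the exemplary 0.97 threshold come from the text.

```python
from typing import NamedTuple


class ScoredDocument(NamedTuple):
    title: str
    relevance: float  # absolute relevance ranking in the range 0 to 1


def select_relevant(docs, threshold=0.97):
    """Keep documents scoring above the threshold, most relevant first."""
    kept = [d for d in docs if d.relevance > threshold]
    return sorted(kept, key=lambda d: d.relevance, reverse=True)


# Made-up scores for a "rose" shopping query (illustrative only).
results = [
    ScoredDocument("Rose bush, bare root", 0.975),
    ScoredDocument("Game console bundle", 0.120),
    ScoredDocument("Rose bouquet, one dozen", 0.991),
    ScoredDocument("Audio CD", 0.300),
]
shown = select_relevant(results)
```

Because the score is absolute, the number of documents shown depends on how many clear the threshold rather than on a fixed top-N cut.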
The content server 214 and the identity server 216 may develop related refinements for presenting the search results, such as features and characteristics of the documents.
The content server 214 may examine the responsive documents having an absolute relevance ranking above the threshold level and determine features of each document, such as category, brand, and price. Because the absolute relevance ranking matches the user's expected response more closely than a relative ranker does, the features determined for each document (such as category) can yield narrower and more accurate category attributes. To order the categories for presentation to the user, the absolute relevance rankings of the documents in a particular category may be averaged, so that the category with the highest overall average is presented at the top.
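The category ordering described above, averaging the absolute relevance rankings within each category, might be sketched as follows. The function name `rank_categories` and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def rank_categories(docs):
    """Order categories by the mean absolute relevance of their documents.

    docs: iterable of (category, absolute_relevance) pairs, assumed to be
    already filtered to documents above the threshold level.
    """
    by_category = defaultdict(list)
    for category, relevance in docs:
        by_category[category].append(relevance)
    averages = {c: sum(v) / len(v) for c, v in by_category.items()}
    return sorted(averages, key=averages.get, reverse=True)


ordered = rank_categories([
    ("Garden", 0.975),
    ("Flowers", 0.990),
    ("Garden", 0.971),
    ("Flowers", 0.980),
])
```

Averaging (rather than summing) keeps a category with many marginal documents from outranking a smaller category of highly relevant ones.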
The identity server 216 may extract content from the responsive documents selected as having an absolute relevance ranking above the threshold level, to develop a list of characteristics of the documents. For example, characteristics may include price, user rating, expert rating, and so on. As described above with respect to the content server 214, the identity server 216 may operate only on those documents determined to have an absolute relevance ranking above the threshold level. As a result, a user who wishes to sort documents by price, for example, can be presented with items that better match the original search than would be possible using only the relative ranking of the prior art.
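As a rough illustration of why pre-screening matters when sorting by a characteristic, the sketch below sorts by price only among documents above the threshold. The dictionary keys, the `sort_by_price` name, and the catalog entries are hypothetical; they echo the "GPS" example from the background section.

```python
def sort_by_price(docs, threshold=0.97, descending=True):
    """Sort by price, considering only documents above the relevance threshold."""
    screened = [d for d in docs if d["relevance"] > threshold]
    return sorted(screened, key=lambda d: d["price"], reverse=descending)


# Made-up results for a "GPS" query.
catalog = [
    {"title": "Deluxe GPS mounting bracket", "price": 899.0, "relevance": 0.41},
    {"title": "Handheld GPS receiver", "price": 349.0, "relevance": 0.99},
    {"title": "In-car GPS navigator", "price": 199.0, "relevance": 0.98},
]
by_price = sort_by_price(catalog)
```

Without the screening step, the expensive but weakly relevant bracket would be listed first.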
The classification tool programming environment 218 may be used to train, validate, and test the classification tool servers 210, 212, or both, and their machine learning programs. The queries used in the programming phase may be selected from search engine logs to provide real-world evaluation targets. The queries may be run and the results extracted, or "scraped", to collect documents for evaluation. A sampling of the results may be used. For example, in one embodiment, the first 20 documents are the top results from a relative ranker, and another 80 documents are selected at random from documents 21 to 250. The queries and the selected results for each query may be stored in the classification tool database 222 for use on the classification tool development server 220. The development server 220 may present a query and each of its selected results to human judges at the workstations 224, 226, 228. A human judge may rate each result against his or her expectations for the query. The ratings, or labels, may simply be perfect, good, fair, or poor. For example, the "perfect" label may be used when the human judge believes there is no better result. A good result may be one that the user might be looking for, but for which a better result may exist. The "fair" label may be given to a result that is not what the human judge was looking for but is still related. The "poor" label may be assigned when the returned document is unrelated to the query. In one embodiment, the labels are converted to a numeric rating of 1-4, where 1 is poor and 4 is perfect. In another embodiment, the labels may be converted exponentially (here, by squaring), where 1 is given 1, 2 is given 4, 3 is given 9, and 4 is given 16. The weighting creates a greater distance between perfect and good than between good and fair.
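The two label conversions can be sketched in a few lines of Python. The label names and function names are illustrative; the numeric values 1-4 and 1, 4, 9, 16 are those given in the text, and the weighted values happen to be the squares of the linear ones.

```python
# Labels ordered from worst to best; the names are illustrative.
LABELS = ("poor", "fair", "good", "perfect")


def linear_grade(label):
    """Map a label to the numeric rating 1-4."""
    return LABELS.index(label) + 1


def weighted_grade(label):
    """Map a label to the weighted rating 1, 4, 9, 16 (the square of 1-4)."""
    return linear_grade(label) ** 2


gap_top = weighted_grade("perfect") - weighted_grade("good")  # 16 - 9
gap_mid = weighted_grade("good") - weighted_grade("fair")     # 9 - 4
```

With the weighted grades, the gap between perfect and good (7) is larger than the gap between good and fair (5), whereas the linear grades are evenly spaced.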
The human label data may be used as one element in the training. In one embodiment, the query, the document, and the human-assigned label (weighted or unweighted) may be combined with other characteristics, such as title matches and "clicks", together with external data. Clicks are a measure of how many times a document returned as a result was actually clicked on by users. Other external data used in the training process may include, but are not limited to:
NumberOfPerfectMatches_FeedsPhrase (number of perfect matches, feeds phrase): The number of phrases that exactly match the query (the words must be in the same order with no other words between them). Note that noise words (that is, common words such as "the" and "of") are removed, so a query such as "Lord of the Dance" will have no perfect match.
WordsInAccessoryListFeature (words in accessory list): Matches against a fixed list of keywords that are typically found in accessories. The value of this characteristic is the number of words in the query that match the list.
MultiInstanceTotalNormalizer_FeedsPhrase (multi-instance total normalizer, feeds phrase): MultiInstanceTotalNormalizer_stream is the sum of the per-word normalizers, with duplicates removed. The value of the characteristic is 10.0. If there are duplicates, each item that duplicates a previous item has a MultiInstanceNormalizer_stream value equal to the value of its parent. MultiInstanceTotalNormalizer_stream does not count duplicates.
CategoryFeature (category feature): A characteristic that matches the category of the query against the category of the document.
FirstOccurenceOfNearTuples_FeedsTerm (first occurrence of near tuples, feeds term): The offset of the first occurrence of the query terms in the stream. For anchors, the first occurrence is defined as the offset of the start of the first anchor phrase. The minimum query length for this characteristic is 1. The default value is (DocumentEnd - DocumentStart + 1), rather than the zero used previously.
StreamLength_FeedsPhrase (stream length, feeds phrase): The length of the category stream.
NumberOfTruePerfectMatches_FeedsMulti (number of true perfect matches, feeds multi).
Click prediction: A model that predicts the probability that a document will be clicked.
StaticRank (static rank): A query-independent measure of a document's popularity, computed as the sum of the clicks on the document across queries. The clicks may be decayed exponentially so that more recent clicks are given higher weight.
In total, as many as 300 such external data elements may be incorporated in developing and training the machine learning classification tool.
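The exponentially decayed click sum described for StaticRank could look something like the following sketch. The text does not give a decay rate, so the 30-day half-life, the `static_rank` name, and the representation of clicks as ages in days are all assumptions.

```python
import math


def static_rank(click_ages_days, half_life_days=30.0):
    """Sum of clicks across all queries, each decayed exponentially by its age.

    click_ages_days: age in days of every click the document received.
    half_life_days: hypothetical decay parameter; not specified in the text.
    """
    rate = math.log(2) / half_life_days
    return sum(math.exp(-rate * age) for age in click_ages_days)


recent = static_rank([0, 1, 2])        # three clicks in the last few days
stale = static_rank([300, 301, 302])   # three clicks about ten months ago
```

Both documents received three clicks, but the recently clicked one ends up with the much higher static rank.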
Fig. 3A is a flow chart 300 illustrating training of a machine learning classification tool. The training process involves providing queries and their corresponding results to human judges, who subjectively rank the quality of the results for a given query.
At block 302, a set of queries may be generated for training the machine learning classification tool. The set of queries may be selected from queries in search engine logs taken from actual user searches.
At block 304, the set of queries may be executed on an Internet search engine to develop a corresponding result set for each query in the set.
At block 306, a limited number of documents may be selected from each corresponding result set. In one exemplary embodiment, a relative ranker may be applied to each result set. The top 20 documents designated by the relative ranker may be selected, along with another 80 documents selected from those ranked 21-250 by the relative ranker. In this embodiment, 100 documents may then be submitted for evaluation for each query.
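The sampling at block 306 can be sketched as follows, assuming the result set is already ordered by the relative ranker. The function and parameter names are hypothetical; the counts (top 20, plus 80 drawn at random from ranks 21-250) follow the exemplary embodiment.

```python
import random


def sample_for_judging(ranked_docs, top_n=20, extra_n=80, pool_end=250, seed=None):
    """Select the top-ranked documents plus a random sample of lower-ranked ones.

    ranked_docs: documents in relative-rank order (index 0 is rank 1).
    """
    rng = random.Random(seed)
    head = ranked_docs[:top_n]
    pool = ranked_docs[top_n:pool_end]
    tail = rng.sample(pool, min(extra_n, len(pool)))
    return head + tail


ranked = [f"doc{i}" for i in range(1, 1001)]
batch = sample_for_judging(ranked, seed=0)
```

Mixing lower-ranked documents into the batch gives the judges, and hence the training data, examples across the quality range rather than only the relative ranker's favorites.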
At block 308, a subjective rating relative to the corresponding query may be developed for each of the limited number of documents. Each of several judges may receive the list of documents and the query, and apply a subjective rating. In one embodiment, the ratings may be made on a four-point basis. The subjective rating may simply assign a rating of poor, fair, good, or perfect to each document. The ratings may be converted to numeric values. For example, each document may be assigned a value of 1-4, respectively, or the ratings may be weighted so that they are converted to the values 1, 4, 9, and 16, respectively. The use of weighted ratings helps increase the distance between the perfect and good ratings compared with the distance between the good and fair ratings.
At block 310, the machine learning classification tool may be programmed using, at least in part, the subjective rating of each of the limited number of documents. As discussed above, additional external data elements may be incorporated in developing and training the machine learning classification tool. In one embodiment, the machine learning classification tool may be a multiple additive regression tree (MART) tool, although other similar tools are known and perform similarly.
At block 312, to help ensure consistent results among the human judges, an inter-judge agreement rate may be developed based on the subjective ratings. For example, the ratings of a selected number of identical documents may be compared, and a statistical deviation of the ratings may be calculated.
At block 314, if the inter-judge agreement rate falls below a limit, the human judges may be alerted. For example, additional rating guidelines may be given to the human judges to help achieve more consistent results, such as a more precise definition of what may be considered "relevant" as opposed to "good".
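One simple way to compute an inter-judge agreement rate and a per-document deviation is sketched below. The patent does not specify the statistic, so pairwise exact agreement, the population standard deviation, the 0.7 limit, and all names here are assumptions.

```python
from itertools import combinations
from statistics import pstdev


def agreement_rate(ratings_by_judge):
    """Fraction of (judge pair, document) comparisons with identical grades.

    ratings_by_judge: dict mapping a judge to grades for the same documents,
    listed in the same order for every judge.
    """
    agree = total = 0
    for a, b in combinations(ratings_by_judge.values(), 2):
        for x, y in zip(a, b):
            agree += int(x == y)
            total += 1
    return agree / total if total else 1.0


def needs_guidance(ratings_by_judge, limit=0.7):
    """True when agreement falls below the (hypothetical) limit."""
    return agreement_rate(ratings_by_judge) < limit


ratings = {
    "judge_a": [4, 3, 1, 2],
    "judge_b": [4, 2, 1, 2],
    "judge_c": [3, 3, 1, 2],
}
# Per-document spread of grades across the judges.
spread = [pstdev(column) for column in zip(*ratings.values())]
```

Here the three judges agree on 8 of 12 pairwise comparisons, which is below the assumed limit, so additional rating guidelines would be issued.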
Fig. 3B is a flow chart 350 illustrating use of the machine learning classification tool to develop search results.
At block 352, a query that returns a set of documents may be executed. The query may be an actual live query submitted by a user of a search engine, such as the search engine 208 of Fig. 2.
At block 354, at least a portion of the returned set of documents may be selected for further processing. For example, a relative ranker, such as those used in the prior art, may be used to provide a high-level selection of documents for further consideration. In one embodiment, the set of documents may be partitioned across multiple computers and a relative ranker may be used on each computer, with the top results from the relative ranking on each computer returned for further processing. In another embodiment, the set of documents may be processed on a single computer and the top results from that relative ranking may be used. For example, 10-30% of the total returned documents may be supplied to the absolute ranker described below.
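The partitioned embodiment at block 354 might be sketched as below, with each list standing in for the documents held on one computer. The names, the relative scores, and the 20% fraction (chosen from within the 10-30% range mentioned) are assumptions.

```python
import heapq


def top_from_shards(shards, fraction=0.2):
    """Collect the top fraction of each shard by relative score.

    shards: list of lists of (relative_score, doc_id) pairs, one list per
    computer. Returns the merged candidates, best relative score first.
    """
    candidates = []
    for shard in shards:
        keep = max(1, round(len(shard) * fraction))
        candidates.extend(heapq.nlargest(keep, shard))
    return sorted(candidates, reverse=True)


shards = [
    [(0.9, "a"), (0.2, "b"), (0.5, "c"), (0.1, "d"), (0.4, "e")],
    [(0.8, "f"), (0.7, "g"), (0.3, "h"), (0.6, "i"), (0.05, "j")],
]
candidates = top_from_shards(shards, fraction=0.2)
```

The cheaper relative ranker serves only as a first-pass filter here; the merged candidates would then be handed to the absolute ranker for scoring.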
At block 356, an absolute relevance score may be provided for each document in the selected portion of the returned set. The absolute relevance score may be generated using the machine learning classification tool contained in the classification tool servers 210, 212, or both. The absolute relevance score may be a function of the human-generated labels and the external data, as described above.
At block 360, the absolute relevance score of each document in the selected portion of the returned documents may be used to create a subset of documents. Each document in the subset may have an absolute relevance rating (that is, score) above a threshold.
At block 362, the subset of documents may optionally be sorted by absolute relevance score. Whether or not the subset is sorted first, one or more related refinements may be selected based on features of the documents in the subset. Selecting one or more related refinements may include selecting a characteristic and/or a feature. Characteristics may include user rating, price, expert rating, and so on. Features may include category, price range, and brand.
At block 364, presentation of data to the user may begin. Presenting the data may include displaying one or more of the related refinements on the requesting computer, and may include presenting a category list. The ordering of the categories may be developed by using the average absolute relevance value of the documents in each category and presenting the categories in order of highest average.
At block 366, the subset of documents may be displayed in order of highest relevance to the query, based on the absolute relevance score of each document in the subset.
Optionally, at block 358, the absolute relevance scores may be adjusted, either during the initial presentation of data or in response to a user request. For example, if the user indicates a preference for sorting by price, the price characteristic may be given additional importance, a process referred to as boosting. To give additional importance to a characteristic, the machine learning classification tool may be re-weighted or, alternatively, a pre-weighted machine learning classification tool may be selected. The absolute relevance score may be regenerated for each document in at least a portion of the set of documents based on the boosted characteristic. The regenerated absolute relevance scores may then be used to re-create the subset of documents. The associated steps of selecting related refinements and displaying the documents may be re-executed.
Fig. 4 shows an exemplary search tree 400. Nodes 402, 404, 406, 408, and 410 may each be a decision point associated with a particular characteristic. If the characteristic is present, a value of 1 may be assigned and the left branch taken. If the characteristic is absent, a value of 0 may be assigned and the right branch taken. During training, each node may be weighted to adjust its decision point. Over multiple training runs, the weights may be varied to determine which values give the best performance. Other criteria, such as how deep in the tree a search may go before being cut off, may also be adjusted to give results closer to those of the human judges.
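A traversal of such a tree, including a depth cutoff, might look like the sketch below. The `Node` layout, the per-node `value` field (standing in for whatever weighting training produces), and the characteristic names are assumptions for illustration; the description does not specify a data structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[str]            # characteristic tested here; None for a leaf
    value: float                      # score returned if the search stops at this node
    present: Optional["Node"] = None  # left branch: characteristic present (value 1)
    absent: Optional["Node"] = None   # right branch: characteristic absent (value 0)


def evaluate(root, doc_features, max_depth):
    """Walk the tree, taking at most max_depth branches before cutting off."""
    node = root
    for _ in range(max_depth):
        branch = node.present if node.feature in doc_features else node.absent
        if branch is None:  # leaf reached, nothing further to test
            break
        node = branch
    return node.value


# Hypothetical two-level tree.
tree = Node(
    "brand_match", 0.5,
    present=Node("in_stock", 0.7, present=Node(None, 0.9), absent=Node(None, 0.6)),
    absent=Node(None, 0.2),
)

full = evaluate(tree, {"brand_match", "in_stock"}, max_depth=5)
cut_off = evaluate(tree, {"brand_match", "in_stock"}, max_depth=1)
no_match = evaluate(tree, set(), max_depth=5)
```

Varying `max_depth` in this sketch corresponds to the cutoff criterion mentioned above: a shallower cutoff returns the coarser value stored at an interior node.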
Fig. 5 shows an exemplary screenshot 500 of search results. The search results may include documents (or document links) 502, 504, 506 with respective descriptions and pictures (if available). A category list 508 may show, in ranked order, the categories to which the 1230 documents belong. The selection of the ranking order is discussed above. Other categories, such as brand 510 and price 512, may also be shown to the user. Selecting a category item will display those results having the selected feature and, in some embodiments, will display others from that category. Characteristics 514 are also shown and may be selected to display results according to that characteristic, such as listing by price or by user rating.
The systems and techniques described above provide a richer search experience to users performing searches, particularly shopping searches. Highly relevant search results save users time and effort, and benefit search engine providers by attracting more traffic. In one exemplary embodiment, an ongoing effort has seen more than 10000 sample queries used in training, with hundreds and thousands of documents rated and used to refine the machine learning classification tool.
Although the foregoing sets forth a detailed description of numerous different embodiments, it should be understood that the legal scope of this patent is defined by the words of the claims appended to this patent. The detailed description is to be construed as exemplary only and does not describe every possible embodiment of the invention, because describing every possible embodiment would be impractical, if not impossible. Numerous alternative embodiments could be implemented using either current technology or technology developed after the filing date of this patent, and these would still fall within the scope of the claims defining the invention.
Thus, many modifications and variations may be made in the techniques and structures described and illustrated herein without departing from the spirit and scope of the present invention. Accordingly, it should be understood that the methods and apparatus described herein are illustrative only and do not limit the scope of the invention.

Claims (11)

1. A method of displaying results by relevance ranking on a computer used in Internet searching, the method comprising:
generating a query set (302);
executing each query in the query set on an Internet search engine to develop a corresponding result set (304);
selecting a limited number of documents from each corresponding result set (306);
developing, for each document of the limited number of documents, a subjective rating relative to a subjective criterion (308);
programming a machine learning classification tool (310) using, at least in part, the subjective rating of each document of the limited number of documents and external data, the external data including a click count for each of the limited number of documents, the count including clicks made when the query that produced the document is unrelated to the generated query set;
executing a query that returns a document set (352);
generating an absolute relevance score for at least a portion of the document set using the machine learning classification tool (356);
creating a document subset from the at least a portion of the document set, each document in the document subset having a respective absolute relevance score exceeding a threshold (358);
selecting one or more relevant refinements based on features of the documents in the document subset (362);
displaying the one or more relevant refinements on the computer (364); and
displaying the document subset on the computer in order of highest relevance to the query, based on the absolute relevance score of each document of the document subset (366).
2. The method of claim 1, wherein developing the subjective rating for each document of the limited number of documents comprises receiving the subjective rating from each of a plurality of judges who rate each document of the limited number of documents relative to the subjective criterion.
3. The method of claim 2, further comprising:
calculating an inter-judge agreement rate based on the subjective ratings (312); and
alerting the plurality of judges when the inter-judge agreement rate falls below a limit (314).
4. The method of claim 2, wherein developing the subjective rating comprises each of the plurality of judges assigning one of a poor, fair, good, and perfect rating.
5. The method of claim 4, wherein each rating is assigned a numerical value, the value of each successive rating increasing exponentially.
6. The method of claim 1, wherein selecting the limited number of documents from each corresponding result set comprises selecting the top 20 documents as designated by a relative ranker and selecting another 80 documents from among the documents ranked 21-250 by the relative ranker.
7. The method of claim 1, wherein selecting one or more relevant refinements comprises selecting at least one of a characteristic and a feature, wherein the characteristic includes user rating, and the feature includes category, price, and brand.
8. The method of claim 1, wherein programming the machine learning classification tool comprises programming a multiple additive regression trees (MART) tool.
9. The method of claim 1, wherein generating the query set comprises selecting the query set from a search engine log of actual user search queries.
10. The method of claim 1, further comprising:
developing a boosted feature relevant to the at least a portion of the document set;
regenerating the absolute relevance score for each document of the at least a portion of the document set based on the boosted feature; and
re-creating the document subset using the regenerated absolute relevance scores.
11. The method of claim 1, further comprising selecting the at least a portion of the document set from the document set based on a relative ranking process.
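The rating steps recited in claims 3 to 5 can be illustrated with a short sketch. Everything concrete here is an assumption: the claims do not say which exponential values are used (powers of two are chosen for the example), how the inter-judge agreement rate is computed (the fraction of agreeing judge pairs is used here), or what the limit is.

```python
from itertools import combinations

# Four grades from lowest to highest, with exponentially increasing values.
GRADES = ["poor", "fair", "good", "perfect"]
GRADE_VALUE = {grade: 2 ** i for i, grade in enumerate(GRADES)}


def agreement_rate(ratings_by_doc):
    """Fraction of judge pairs, over all documents, that gave the same grade."""
    agree = total = 0
    for grades in ratings_by_doc.values():
        for a, b in combinations(grades, 2):
            total += 1
            agree += a == b
    return agree / total if total else 1.0


# Hypothetical ratings from three judges for two documents.
ratings = {
    "doc1": ["good", "good", "fair"],
    "doc2": ["perfect", "perfect", "perfect"],
}
rate = agreement_rate(ratings)
needs_alert = rate < 0.7  # hypothetical limit below which judges are alerted
```

Exponential grade values make the gap between "good" and "perfect" larger than the gap between "poor" and "fair", so a model trained on them is pushed harder to get the best documents right.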
CN201110117329.2A 2010-04-09 2011-04-11 Improve shopping search engine Expired - Fee Related CN102508831B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/757,095 2010-04-09
US12/757,095 US8700592B2 (en) 2010-04-09 2010-04-09 Shopping search engines

Publications (2)

Publication Number Publication Date
CN102508831A CN102508831A (en) 2012-06-20
CN102508831B true CN102508831B (en) 2016-12-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20150717

Address after: Washington State

Applicant after: MICROSOFT TECHNOLOGY LICENSING, LLC

Address before: Washington State

Applicant before: Microsoft Corp.

GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20161214