CN105808541B - Information matching processing method and apparatus - Google Patents

Information matching processing method and apparatus

Info

Publication number
CN105808541B
Authority
CN
China
Prior art keywords
product information
gear
feature
search keyword
described search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410838112.4A
Other languages
Chinese (zh)
Other versions
CN105808541A (en
Inventor
王涛
黄鹏
林锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Singapore Holdings Pte Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201410838112.4A priority Critical patent/CN105808541B/en
Priority to PCT/CN2015/098247 priority patent/WO2016107455A1/en
Publication of CN105808541A publication Critical patent/CN105808541A/en
Application granted granted Critical
Publication of CN105808541B publication Critical patent/CN105808541B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of data processing, and in particular to an information matching processing method. The method comprises: obtaining search keywords and product information, and pairing each search keyword with each item of product information to form search keyword and product information feature pairs; calculating the correlation of each feature pair and determining the correlation gear of each feature pair from the correlation calculation result; calculating the estimated click-through rate of each feature pair and using quantiles to determine the estimated click-through rate gear corresponding to that estimated click-through rate; and determining a score for each feature pair from the correlation gear and the estimated click-through rate gear, the score characterizing the degree of matching between the search keyword and the product information.

Description

Information matching processing method and apparatus
Technical field
The present invention relates to the technical field of data processing, and in particular to an information matching processing method and apparatus.
Background
With the development of computer and Internet technology, e-commerce websites have grown rapidly. An e-commerce website typically stores massive amounts of data and products. To help users find products of interest more efficiently, the website server usually recommends products that match the search term entered by the user. Among the matching products, those that match the search term closely, are of high quality, and are being promoted through advertising are usually recommended first. Sellers, in turn, often select high-quality products for advertising promotion in order to increase sales. To promote a product, a seller needs to buy search keywords for the published product information. The better the published product information matches a search keyword, the more likely the product is to be found by users, and the more likely buyers are to find products that match their search terms and so obtain useful information from a sea of information.
Therefore, accurately judging the degree of matching between product information and search terms not only makes sellers' product promotion more effective, but also reduces the client-server data interaction caused by buyers searching repeatedly, which improves the user experience and the performance of the server.
Existing methods for judging the degree of matching between product information and search terms usually calculate the correlation between a search term and an advertised product, judge the degree of matching between the search term and the published product information from the resulting correlation score, and recommend that the seller buy the search keywords with a high degree of matching.
However, these existing methods consider only the correlation between the search term and the advertised product, not the degree to which users prefer the advertised product, so the matching they calculate is inaccurate. An inaccurate matching result not only prevents sellers from promoting their products effectively, but also means that the products the website recommends to buyers do not closely match their needs and interests. Buyers then have to search repeatedly to find the products they are really interested in, which increases client-server data interaction, increases the data processing load on the server, degrades the server's processing performance, and consumes valuable Internet bandwidth.
Summary of the invention
To solve the above technical problems, the present invention discloses an information matching processing method and apparatus that improve the objectivity and accuracy of information matching, improve the user experience, reduce the data processing load on the server, improve the server's processing performance, and save valuable Internet bandwidth.
The technical solution is as follows.
According to a first aspect of the embodiments of the present invention, a product information matching processing method is disclosed, the method comprising:
obtaining search keywords and product information, and pairing each search keyword with each item of product information to form search keyword and product information feature pairs;
calculating the correlation of each search keyword and product information feature pair, and determining the correlation gear of each feature pair from the correlation calculation result;
calculating the estimated click-through rate of each search keyword and product information feature pair, and using quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each feature pair; and
determining a score for each search keyword and product information feature pair from the correlation gear and the estimated click-through rate gear, the score characterizing the degree of matching between the search keyword and the product information.
According to a second aspect of the embodiments of the present invention, a product information matching processing apparatus is disclosed, the apparatus comprising:
an acquiring unit, configured to obtain search keywords and product information, and to pair each search keyword with each item of product information to form search keyword and product information feature pairs;
a correlation gear determination unit, configured to calculate the correlation of each search keyword and product information feature pair, and to determine the correlation gear of each feature pair from the correlation calculation result;
an estimated click-through rate gear determination unit, configured to calculate the estimated click-through rate of each search keyword and product information feature pair, and to use quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each feature pair; and
a matching determination unit, configured to determine a score for each search keyword and product information feature pair from the correlation gear and the estimated click-through rate gear, the score characterizing the degree of matching between the search keyword and the product information.
One beneficial effect that the embodiments of the present invention can achieve is as follows. When determining the degree of matching between a search keyword and product information, the method and apparatus provided by the invention consider not only the correlation between the search keyword and the product information but also the degree to which users prefer the product. An estimated click-through rate factor, which objectively reflects that user preference, is introduced to calculate the estimated click-through rate, and a preset ratio rule (for example, the normal distribution) is used to determine the click-through rate gear corresponding to the probability that users click the advertised product under the search keyword. The correlation gear and the click-through rate gear are then combined to determine the degree of matching between the search keyword and the product information, giving a more accurate matching result. This not only makes sellers' product promotion more effective, but also reduces the client-server data interaction caused by buyers searching repeatedly, improving the user experience, reducing the data processing load on the server, improving the server's processing performance, and saving valuable Internet bandwidth.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of an information matching processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the standard normal distribution quantile table provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the distribution of estimated click-through rate gears provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of an information matching processing apparatus provided by an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
The invention discloses an information matching processing method and apparatus that consider not only the correlation between a search keyword and product information but also the degree to which users prefer the product. An estimated click-through rate factor that reflects this user preference is introduced to calculate the estimated click-through rate, and the normal distribution is used to determine the click-through rate gear corresponding to the probability that users click the advertised product under the search keyword. The correlation gear and the click-through rate gear are combined to determine the degree of matching between the search keyword and the product information, giving a more accurate matching result.
In one application scenario of the invention, a seller on an e-commerce website needs to buy search keywords to promote its advertised products. The method provided by the embodiments can be applied on the website server to judge the degree of matching between search keywords and the product information published by the seller, so that search keywords with a high degree of matching can be recommended to the seller for purchase. This makes the seller's promotion more effective and increases the probability that buyers click the seller's products. It also makes buyers' product searches more efficient and reduces the client-server data interaction caused by repeated searches, improving the user experience, reducing the data processing load on the server, improving the server's processing performance, and saving valuable Internet bandwidth.
Referring to Fig. 1, a flow diagram of an information matching processing method provided by an embodiment of the present invention is shown.
S101: obtain search keywords and product information, and pair each search keyword with each item of product information to form search keyword and product information feature pairs.
A seller may deal in many kinds of products, which may belong to different categories. In that case, the seller's product information can be processed item by item: one or more words that describe each item of product information are obtained and paired with the search keywords to form search keyword and product information feature pairs. For example, suppose the seller's product information includes MP3 player, iphone6, Note4 and earphone, and the search keyword is "mobile phone". The resulting feature pairs are then (mobile phone, MP3 player), (mobile phone, iphone6), (mobile phone, Note4) and (mobile phone, earphone). This is only an illustrative example and does not limit the present invention. The product information may specifically be advertised product information.
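The pairing in step S101 can be sketched as a Cartesian product of keywords and product terms. This is a minimal illustration only; the function name `build_feature_pairs` is hypothetical and does not come from the patent.

```python
from itertools import product


def build_feature_pairs(keywords, product_terms):
    """Pair every search keyword with every product-information term.

    Hypothetical helper: the patent only says the two sets are paired
    "two by two"; a Cartesian product is one straightforward reading.
    """
    return list(product(keywords, product_terms))


# Example data taken from the description of step S101.
pairs = build_feature_pairs(
    ["mobile phone"],
    ["MP3 player", "iphone6", "Note4", "earphone"],
)
for keyword, term in pairs:
    print(keyword, "|", term)
```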
It should be noted that, before steps S102 and S103 are executed, each search keyword and each item of product information can be pre-processed. The pre-processing includes extracting the semantic features needed for the various feature matching judgments. The specific processing can take many forms and is not limited here.
In addition, steps S102 and S103 do not have to be executed in a fixed order: they can be executed in parallel or in the reverse order.
S102: calculate the correlation of each search keyword and product information feature pair, and determine the correlation gear of each feature pair from the correlation calculation result.
The correlation is calculated mainly from the category correlation and the text correlation between the search keyword and the advertised product. Category correlation is the degree of matching between the categories clicked under the search keyword and the category of the advertised product. Text correlation has several aspects, mainly the degree of matching between the core words of the search keyword and the core words of the advertised product's title, and the degree of matching between the attributes appearing in the search keyword and the attributes in the advertised product's description. Combining category matching and text matching yields the correlation score.
In a specific implementation, step S102 may include: performing matching judgments on several features of the search keyword and product information feature pair; and determining the correlation gear of the feature pair from the results of those matching judgments.
In a specific implementation, the matching judgments performed on the feature pair during the correlation calculation include at least one of a category feature matching judgment and a text feature matching judgment.
The category feature matching judgment determines whether the search keyword and the product information belong to the same category. In one specific implementation of the present invention, this judgment is usually made according to the meaning of the text. If the category of the search keyword is the same as the category of the published product information, the result of the category feature matching judgment is "Yes"; otherwise the result is "No". A special case in which the result is "No" is a search keyword that has no category. Such a keyword is usually a pronounced long-tail keyword, that is, one that users seldom search for. For example, if the search keyword is "mp3" and the published product is "audio player", the two belong to the same category and the result is "Yes". If the search keyword is "mp3" and the published product is "radio", the two do not belong to the same category and the result is "No".
The text feature matching judgment determines whether the search keyword and the text content of the published product information are related. Specifically, the text feature matching judgment of the present invention includes at least one of: an exact matching judgment, a partial matching judgment, a centre word matching judgment, a centre word exact matching judgment, a hidden word matching judgment, and a reversed preposition matching judgment. The text feature matching judgment can also extract text feature vectors and calculate the similarity of the vectors with the cosine formula. The invention is not limited in this respect.
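The cosine-similarity option mentioned above can be sketched as follows. This is a minimal sketch that assumes a simple whitespace bag-of-words vector; the patent does not specify how text feature vectors are extracted, and `cosine_similarity` is a hypothetical name.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two bag-of-words count vectors.

    Assumption: lower-cased whitespace tokens stand in for the patent's
    unspecified text feature vectors.
    """
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Dot product over the tokens the two texts share.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as unrelated
    return dot / (norm_a * norm_b)


print(cosine_similarity("mp3 player", "mp3 audio player"))
print(cosine_similarity("mp3", "radio"))
```

A score near 1 indicates heavily overlapping wording and a score of 0 indicates no shared tokens; how such a score would feed into the correlation gear is left open by the text.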
After the matching judgments on the several features of the search keyword and product information feature pair have been made, the correlation gear of the feature pair can be determined from the results of those judgments. In the present invention, the correlation is divided into three gears: excellent, good and poor.
Table 1 shows one illustrative way of dividing the correlation gears. Other gear division methods can of course be used, and no limitation is imposed here.
Table 1
S103: calculate the estimated click-through rate of each search keyword and product information feature pair, and use quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each feature pair.
In a specific implementation, step S103 may include: determining the proportionality coefficient corresponding to each gear of the estimated click-through rate gears; determining the values of the quantiles from the proportionality coefficients; and determining the gear interval in which the estimated click-through rate falls from the estimated click-through rate of each feature pair and the values of the quantiles.
Preferably, the quantiles are normal distribution quantiles.
This is described in detail below with reference to an example.
The standard normal distribution quantile is introduced first. The normal distribution, also known as the Gaussian distribution, has a bell-shaped probability density curve that is low at both ends and high in the middle, with a total area under the curve of 1. The standard normal distribution is the normal distribution with mean 0 and standard deviation 1, denoted N(0, 1). In general, if a random variable X follows a normal distribution with location parameter μ and scale parameter σ, this is written:
$$X \sim N(\mu, \sigma^2) \tag{1}$$
Its probability density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{2}$$
When μ = 0 and σ = 1, X follows the standard normal distribution, with mean 0 and standard deviation 1.
Normal distribution quantiles describe how the area under the normal curve is distributed. The upper α quantile of the standard normal distribution is defined as follows: let X ~ N(0, 1); for any given α (0 < α < 1), the point $Z_\alpha$ that satisfies $P(X > Z_\alpha) = \alpha$ is called the upper α quantile of the standard normal distribution. For example, looking up the normal distribution table shown in Fig. 2 for $Z_\alpha = 1$ gives α = 0.158655.
The common quantiles of the normal distribution follow these rules:
68.268949% of the area under the curve lies within one standard deviation (1σ) of the mean.
95.449974% of the area lies within two standard deviations (2σ) of the mean.
99.730020% of the area lies within three standard deviations (3σ) of the mean.
99.993666% of the area lies within four standard deviations (4σ) of the mean.
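The table lookups and coverage rules above can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of the printed table in Fig. 2; the helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def upper_tail(z):
    """alpha = P(X > z) for X ~ N(0, 1)."""
    return 1.0 - STD_NORMAL.cdf(z)


def upper_quantile(alpha):
    """Z_alpha such that P(X > Z_alpha) = alpha for X ~ N(0, 1)."""
    return STD_NORMAL.inv_cdf(1.0 - alpha)


def coverage(k):
    """Area under the standard normal curve within k standard deviations."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)


print(round(upper_tail(1.0), 6))        # alpha for Z_alpha = 1
print(round(upper_quantile(0.025), 2))  # Z_0.025
for k in (1, 2, 3, 4):
    print(k, round(100 * coverage(k), 6))
```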
The present invention applies these rules of the normal distribution to divide the estimated click-through rate into gears.
The estimated click-through rate (eCTR) is obtained by building a probabilistic model from a large number of historical exposures and clicks and using the model to predict whether a future exposure will produce a click. The value finally given is the probability that a given product, once exposed under a given search term, is clicked by a user. It is therefore a value between 0 and 1, and the larger the value, the more likely a click.
The eCTR is estimated with the LR (logistic regression) model that is standard in the industry. The LR model has two parts: feature extraction and model training. Calculating the estimated click-through rate of each search keyword and product information feature pair includes: extracting features from the feature pair and obtaining the feature weight of each feature from the trained model; and calculating the estimated click-through rate from the extracted features and their feature weights.
The extracted features include one or any combination of the following: the text information of the search keyword, the category information of the search keyword, the title of the product information, the attributes of the product information, and the correlation between the search keyword and the product information.
Once the feature weights have been obtained by model training, the estimated click-through rate eCTR of an advertisement pair (Query, offer) can be calculated, where Query is the search keyword and offer is the product information.
The LR model is a generalized linear model obtained by transforming a linear model with the logistic function. Its expression is:
$$y = \frac{1}{1 + \exp\left(-\sum_i w_i f_i\right)} \tag{3}$$
where $w_i$ is the feature weight, $f_i$ is the feature value, and $y$ is the estimated click-through rate finally calculated. The formula limits the result to the interval (0, 1), which matches a click probability.
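The LR expression can be sketched in a few lines of code, assuming the standard logistic form applied to the weighted sum of feature values. The feature values and weights below are made up for illustration; the patent gives no concrete values, and `estimate_ctr` is a hypothetical name.

```python
import math


def estimate_ctr(feature_values, feature_weights):
    """Logistic function of the weighted feature sum, a value in (0, 1)."""
    z = sum(w * f for w, f in zip(feature_weights, feature_values))
    # Numerically stable form: avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical features for one (Query, offer) pair, e.g. a category-match
# flag, a text-similarity score and a relevance score.
features = [1.0, 0.7, 0.4]
weights = [0.8, 1.5, -0.3]  # illustrative values, not trained weights
print(estimate_ctr(features, weights))
```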
In theory, accurately estimated eCTR values should follow a Gaussian (normal) distribution. The eCTR of advertisement pairs is divided into gears along the keyword dimension and the global dimension. The eCTR of each advertisement pair necessarily falls in some interval of the overall eCTR distribution, and that interval determines the estimated click-through rate gear of the advertisement pair. The gear division method provided by the invention ensures that the scores of most customers' advertised products are at an average level, while a small share of customers' advertised products are at a better or worse level.
In this embodiment, based on analysis of the actual business and on experience, the estimated click-through rate is divided into three gears: good, medium and poor, with proportionality coefficients of 3:4:3. That is, advertised products in the good gear account for 30%, those in the medium gear for 40%, and those in the poor gear for 30%, with corresponding scores of 5 stars, 4 stars and 3 stars. Fig. 3 is a schematic diagram of the division of the estimated click-through rate gears, in which the abscissa is the estimated click-through rate value, the ordinate is the frequency, and the area under the curve corresponds to the probability (that is, the proportion).
In a specific implementation, when the global or keyword-dimension eCTR distribution is divided in the ratio 3:4:3, the area under the curve within a certain range around the mean must be 0.4, and by symmetry the area on each side is 0.3. From the common-quantile rules of the normal distribution, the following can be obtained:
where μ is the mean, σ is the standard deviation, and $Z_\alpha$ is the normal distribution quantile.
That is, once the proportionality coefficient of each estimated click-through rate gear has been determined, the values of the normal distribution quantiles can be determined from the proportionality coefficients.
Assume that Fig. 3 follows the standard normal distribution, that is, X ~ N(0, 1). For any given α (0 < α < 1), the point $Z_\alpha$ that satisfies $P(X > Z_\alpha) = \alpha$ is the upper α quantile of the standard normal distribution, and $Z_{1-\alpha}$ is the corresponding lower α quantile.
$Z_\alpha$ is a number: when X ~ N(0, 1), $P(X > Z_\alpha) = \alpha$. To find $Z_\alpha$ for a given α, look it up in the normal distribution table. For example, to find $Z_{0.025}$, look up the Z value corresponding to 1 − 0.025 = 0.975. In the normal distribution table shown in Fig. 2, the Z value corresponding to 0.9750 is 1.96, so $Z_{0.025} = 1.96$. Conversely, to find the α corresponding to Z = 1.96, first look up 1.96, which corresponds to 0.975; then α = 1 − 0.975 = 0.025.
As Fig. 3 shows, a1 and a2 correspond to two quantiles of the standard normal distribution. From the ratio values marked in Fig. 3, they can be mapped to $Z_{\alpha_1}$ and $Z_{\alpha_2}$ respectively, whose values can be obtained by the method above. Under the standard normal distribution, $Z_{\alpha_1}$ corresponds to the upper α quantile and $Z_{\alpha_2}$ to the lower α quantile.
In a specific implementation, when the estimated click-through rate gears are divided in the ratio 3:4:3, the area under the curve within a certain range on both sides of the mean is 0.4, and by symmetry the areas on the left and right are 0.3 each. The area to the right of the a2 quantile in the standard normal distribution quantile diagram of Fig. 3 is therefore 0.3, so the value of $Z_{0.3}$ is needed, which means looking up the Z value corresponding to 1 − 0.3 = 0.7. In the normal distribution quantile table shown in Fig. 2, the Z value corresponding to 0.7 is 0.52, so $Z_{0.3} = 0.52$, that is, a2 is 0.52. Similarly, a1 is −0.52. a2 and a1 are the two quantiles of the normal distribution for this ratio. The values of the normal distribution quantiles $Z_{\alpha_1}$ and $Z_{\alpha_2}$ can also be calculated with formula (4). Because Fig. 3 follows the standard normal distribution, X ~ N(0, 1), that is, μ = 0 and σ = 1, formula (4) gives $Z_\alpha = \pm 0.5$, which in Fig. 3 corresponds to a1 = −0.5 and a2 = 0.5.
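The quantiles for a given gear ratio can also be computed directly rather than read from a table. The following sketch uses the inverse CDF of the normal distribution from the Python standard library; `gear_boundaries` is a hypothetical helper, and the exact values it returns (about ±0.5244) are what the text rounds to ±0.52 from the table and approximates as ±0.5.

```python
from statistics import NormalDist


def gear_boundaries(ratios, mu=0.0, sigma=1.0):
    """Cut points that split N(mu, sigma^2) into areas proportional to ratios.

    The ratios are listed from the lowest gear to the highest, so
    [3, 4, 3] yields the two boundaries a1 < a2.
    """
    dist = NormalDist(mu, sigma)
    total = sum(ratios)
    cuts, cumulative = [], 0.0
    for ratio in ratios[:-1]:
        cumulative += ratio / total
        cuts.append(dist.inv_cdf(cumulative))
    return cuts


a1, a2 = gear_boundaries([3, 4, 3])
print(round(a1, 2), round(a2, 2))
```

Passing other ratios, or a mean and standard deviation estimated from data, gives the boundaries for a general normal distribution under the same assumptions.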
The estimated click-through rate values follow a general normal distribution. For a general normal distribution (μ not equal to 0, σ not equal to 1), the corresponding quantiles can be obtained approximately from the common-quantile rules of the normal distribution, so the quantiles of the general normal distribution corresponding to the ratio 3:4:3 are given by the following formula:
where μ is the mean and σ is the standard deviation, both of which can be calculated from real data samples. Specifically, after the estimated click-through rate values have been obtained, the mean μ and the standard deviation σ of all estimated click-through rates are computed; the calculation can follow existing methods. The values of the general normal distribution quantiles are then obtained from μ and σ according to formula (4).
Once the values of the general normal distribution quantiles have been determined, the gear interval in which an estimated click-through rate falls can be determined by comparing the estimated click-through rate with the quantile values. For example, following the standard normal distribution quantile table: when the estimated click-through rate belongs to (0, μ−σ/2], the corresponding gear is poor; when it belongs to (μ−σ/2, μ+σ/2), the corresponding gear is medium; when it belongs to [μ+σ/2, 1), the corresponding gear is good.
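The interval rule above can be sketched as a small classifier. The sample eCTR values are made up for illustration, and the population standard deviation is one possible choice that the text does not specify; `ectr_gear` is a hypothetical name.

```python
from statistics import mean, pstdev


def ectr_gear(ectr, mu, sigma):
    """Map an estimated CTR to a gear using the boundaries mu -/+ sigma/2."""
    if ectr <= mu - sigma / 2:
        return "poor"
    if ectr >= mu + sigma / 2:
        return "good"
    return "medium"


# Illustrative eCTR sample (not real data).
sample = [0.01, 0.02, 0.03, 0.04, 0.05]
mu, sigma = mean(sample), pstdev(sample)
gears = [ectr_gear(x, mu, sigma) for x in sample]
print(gears)
```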
It should be noted that the description above uses the proportionality coefficients 3:4:3 as an example. When other proportions are chosen, the calculation can follow the same approach.
S104: determine a score for each search keyword and product information feature pair from the correlation gear and the estimated click-through rate gear, the score characterizing the degree of matching between the search keyword and the product information.
In a specific implementation, the score can be calculated in various ways, for example by a weighted average or by other means. The invention is not limited in this respect.
Table 2 shows one implementation of a star rating.
Table 2
Based on analysis of the actual business, advertisement pairs whose correlation is excellent can be divided in the ratio 3:4:3 into the good, medium and poor gears, corresponding to 5 stars, 4 stars and 3 stars respectively. Advertisement pairs whose correlation is good are divided in the ratio 1:1 into two gears, corresponding to 2 stars and 1 star. The division of the excellent advertisement pairs is as shown in Table 2. The division of the good advertisement pairs is simpler because there are only two gears: the mean of the distribution is taken, and good advertisement pairs above the mean receive 2 stars while those below the mean receive 1 star.
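The star assignment described above can be sketched as a lookup. Table 2 itself is not reproduced in this text, so the sketch relies only on the prose. Two points are assumptions: a pair exactly at the mean is given 1 star, and pairs with poor correlation get no rating (`None`), since the text does not cover either case. The name `star_rating` is hypothetical.

```python
# Stars for pairs whose correlation gear is "excellent", keyed by eCTR gear.
EXCELLENT_STARS = {"good": 5, "medium": 4, "poor": 3}


def star_rating(correlation_gear, ectr_gear=None, ectr=None, mean_ectr=None):
    """Combine the correlation gear and eCTR information into a star rating."""
    if correlation_gear == "excellent":
        return EXCELLENT_STARS[ectr_gear]
    if correlation_gear == "good":
        # Two gears split at the mean of the eCTR distribution (ratio 1:1).
        # Assumption: a value exactly at the mean gets 1 star.
        return 2 if ectr > mean_ectr else 1
    # Assumption: poor-correlation pairs are not rated in the text.
    return None


print(star_rating("excellent", ectr_gear="medium"))
print(star_rating("good", ectr=0.04, mean_ectr=0.03))
```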
In the embodiments of the present invention, the relevance calculation is combined with the estimated click-through rate to compute the degree of matching between a search keyword and an advertised product. This not only informs the seller of the quality and matching degree of an advertisement, but also objectively reflects the probability that the advertised product will be clicked when a buyer searches for products on the website. The higher the star rating, the higher the ranking and the more likely a buyer is to click, which brings more exposure and feedback, increases the advertiser's return on investment, and improves the effectiveness of the seller's product promotion. For buyers on the website, advertisers' optimization of their advertisements raises product quality, and the direct result is a better user experience on the website: fewer data interactions are needed between the user's client and the server, which reduces the data processing load on the server, improves the processing performance of the server, and saves valuable network bandwidth.
Fig. 4 is a schematic diagram of a product information matching processing apparatus provided by an embodiment of the present invention.
A product information matching processing apparatus 400 includes:
An acquiring unit 401, configured to obtain search keywords and product information, and to pair each search keyword with each item of product information to form search keyword and product information feature pairs.
A relevance gear determination unit 402, configured to calculate the relevance of each search keyword and product information feature pair, and to determine the relevance gear of each search keyword and product information feature pair according to the relevance calculation result.
An estimated click-through rate gear determination unit 403, configured to calculate the estimated click-through rate of each search keyword and product information feature pair, and to use quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each search keyword and product information feature pair.
A matching determination unit 404, configured to determine a score for each search keyword and product information feature pair according to the relevance gear and the estimated click-through rate gear, the score characterizing the degree of matching between the search keyword and the product information.
Further, the estimated click-through rate gear determination unit includes an estimated click-through rate calculation subunit and a gear determination subunit, wherein the estimated click-through rate calculation subunit includes:
A model building subunit, configured to perform feature extraction on the search keyword and product information feature pair, and to obtain the feature weight corresponding to each feature from a trained model;
A calculation subunit, configured to calculate the estimated click-through rate using the extracted features and the feature weights corresponding to those features.
Further, the features extracted by the model building subunit include one or any combination of the following: the text information of the search keyword, the category information of the search keyword, the title of the product information, the attributes of the product information, and the relevance of the search keyword to the product information.
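The calculation performed by these two subunits can be illustrated with a short sketch. The passage above states only that extracted features are combined with feature weights obtained from a trained model; the logistic (sigmoid) form used below is an assumption, as are the function name `estimate_ctr`, the `bias` parameter, and the feature names in the usage example.

```python
import math


def estimate_ctr(features, weights, bias=0.0):
    """Estimate a click-through rate from feature values and learned weights.

    `features` maps feature names to numeric values; `weights` maps feature
    names to weights from a trained model. A logistic model is assumed here;
    features with no learned weight contribute nothing.
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical features for one search keyword and product information pair.
features = {"category_match": 1.0, "title_overlap": 0.5}
weights = {"category_match": 1.2, "title_overlap": 0.8}
print(estimate_ctr(features, weights, bias=-4.0))
```

The sigmoid keeps the output strictly between 0 and 1, which matches the (0, 1) range of the gear intervals used when the estimated click-through rate is later divided into gears.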
Further, the estimated click-through rate gear determination unit includes an estimated click-through rate calculation subunit and a gear determination subunit, wherein the gear determination subunit includes:
A proportionality coefficient determination subunit, configured to determine the proportionality coefficient corresponding to each gear of the estimated click-through rate gears;
A quantile determination subunit, configured to determine the values of the quantiles according to the proportionality coefficients;
A gear interval determination subunit, configured to determine the gear interval in which the estimated click-through rate falls according to the estimated click-through rate of each search keyword and product information feature pair and the values of the quantiles.
The quantiles are normal distribution quantiles.
Further, the relevance gear determination unit includes:
A feature matching subunit, configured to perform matching judgments on multiple features for the search keyword and product information feature pair;
A determination subunit, configured to determine the relevance gear of the search keyword and product information feature pair according to the results of the matching judgments on the multiple features.
Further, the matching judgments on multiple features performed by the feature matching subunit include at least one of a category feature matching judgment and a text feature matching judgment;
The category feature matching judgment determines whether the search keyword and the product information belong to the same category;
The text feature matching judgment determines whether the search keyword is associated with the text content of the product information.
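The two matching judgments can be sketched as follows. The claims mention computing text similarity with the cosine formula over text feature vectors; the sketch uses simple whitespace-token counts as those vectors. The rule for turning the two judgments into a gear (both match: "excellent", one matches: "good", neither: "poor"), the threshold of 0.3, and all function and parameter names are assumptions made for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevance_gear(keyword, keyword_category, title, product_category, text_threshold=0.3):
    """Combine a category match and a text match into a relevance gear.

    The combination rule and the threshold are hypothetical.
    """
    same_category = keyword_category == product_category
    text_related = cosine_similarity(keyword, title) >= text_threshold
    if same_category and text_related:
        return "excellent"
    if same_category or text_related:
        return "good"
    return "poor"


print(relevance_gear("red dress", "apparel", "red summer dress", "apparel"))
```

A production system would likely use weighted term vectors and the finer-grained text judgments listed in the claims; the token-count vectors here only demonstrate the cosine computation.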
The functions of the above units correspond to the processing steps of the method described in detail with reference to Fig. 1 and are not repeated here. It should be noted that, because the method embodiments have been explained in detail, the description of the apparatus embodiments is relatively brief; those skilled in the art will understand that the apparatus embodiments of the present invention can be constructed with reference to the method embodiments. Other implementations obtained by those skilled in the art without creative effort all fall within the scope of protection of the present invention.
Those skilled in the art will appreciate that the method and apparatus embodiments above are given by way of example and are not intended to limit the present invention; other implementations obtained by those skilled in the art without creative effort all fall within the scope of protection of the present invention.
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or device. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element. The present invention can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present invention can also be practiced in distributed computing environments, in which tasks are executed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. The apparatus embodiments in particular are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments. The apparatus embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of an embodiment, and those of ordinary skill in the art can understand and implement this without creative effort. The above is only a specific embodiment of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the present invention.

Claims (13)

1. An information matching processing method, characterized in that the method comprises:
obtaining search keywords and product information, and pairing each search keyword with each item of product information to form search keyword and product information feature pairs;
calculating the relevance of each search keyword and product information feature pair, and determining the relevance gear of each search keyword and product information feature pair according to the relevance calculation result;
calculating the estimated click-through rate of each search keyword and product information feature pair, and using quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each search keyword and product information feature pair;
determining a score for each search keyword and product information feature pair according to the relevance gear and the estimated click-through rate gear, the score characterizing the degree of matching between the search keyword and the product information;
wherein calculating the relevance of each search keyword and product information feature pair comprises: performing matching judgments on multiple features for the search keyword and product information feature pair, the matching judgments on multiple features comprising at least one of a category feature matching judgment and a text feature matching judgment, the text feature matching judgment comprising at least one of an exact matching judgment, a partial matching judgment, a center word matching judgment, a center word exact matching judgment, a hidden word matching judgment and a reversed preposition matching judgment, and the text feature matching judgment further comprising extracting text feature vectors and calculating the similarity of the text vectors using the cosine formula;
and wherein using quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each search keyword and product information feature pair comprises: determining the proportionality coefficient corresponding to each gear of the estimated click-through rate gears; determining the values of the quantiles according to the proportionality coefficients; and determining the gear interval in which the estimated click-through rate falls according to the estimated click-through rate of each search keyword and product information feature pair and the values of the quantiles.
2. The method according to claim 1, characterized in that calculating the estimated click-through rate of each search keyword and product information feature pair comprises:
performing feature extraction on the search keyword and product information feature pair, and obtaining the feature weight corresponding to each feature from a trained model;
calculating the estimated click-through rate using the extracted features and the feature weights corresponding to those features.
3. The method according to claim 2, characterized in that the extracted features comprise one or any combination of the following: the text information of the search keyword, the category information of the search keyword, the title of the product information, the attributes of the product information, and the relevance of the search keyword to the product information.
4. The method according to claim 1, characterized in that the quantiles are normal distribution quantiles.
5. The method according to claim 1, characterized in that determining the relevance gear of each search keyword and product information feature pair according to the relevance calculation result comprises:
determining the relevance gear of the search keyword and product information feature pair according to the results of the matching judgments on the multiple features.
6. The method according to claim 5, characterized in that:
the category feature matching judgment determines whether the search keyword and the product information belong to the same category;
the text feature matching judgment determines whether the search keyword is associated with the text content of the product information.
7. An information matching processing apparatus, characterized in that the apparatus comprises:
an acquiring unit, configured to obtain search keywords and product information, and to pair each search keyword with each item of product information to form search keyword and product information feature pairs;
a relevance gear determination unit, configured to calculate the relevance of each search keyword and product information feature pair, and to determine the relevance gear of each search keyword and product information feature pair according to the relevance calculation result;
an estimated click-through rate gear determination unit, configured to calculate the estimated click-through rate of each search keyword and product information feature pair, and to use quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each search keyword and product information feature pair;
a matching determination unit, configured to determine a score for each search keyword and product information feature pair according to the relevance gear and the estimated click-through rate gear, the score characterizing the degree of matching between the search keyword and the product information;
wherein the relevance gear determination unit comprises a feature matching subunit, configured to perform matching judgments on multiple features for the search keyword and product information feature pair, the matching judgments on multiple features comprising at least one of a category feature matching judgment and a text feature matching judgment, the text feature matching judgment comprising at least one of an exact matching judgment, a partial matching judgment, a center word matching judgment, a center word exact matching judgment, a hidden word matching judgment and a reversed preposition matching judgment, and the text feature matching judgment further comprising extracting text feature vectors and calculating the similarity of the text vectors using the cosine formula;
and wherein the estimated click-through rate gear determination unit comprises a gear determination subunit, the gear determination subunit comprising: a proportionality coefficient determination subunit, configured to determine the proportionality coefficient corresponding to each gear of the estimated click-through rate gears; a quantile determination subunit, configured to determine the values of the quantiles according to the proportionality coefficients; and a gear interval determination subunit, configured to determine the gear interval in which the estimated click-through rate falls according to the estimated click-through rate of each search keyword and product information feature pair and the values of the quantiles.
8. The apparatus according to claim 7, characterized in that the estimated click-through rate gear determination unit comprises an estimated click-through rate calculation subunit and a gear determination subunit, wherein the estimated click-through rate calculation subunit comprises:
a model building subunit, configured to perform feature extraction on the search keyword and product information feature pair, and to obtain the feature weight corresponding to each feature from a trained model;
a calculation subunit, configured to calculate the estimated click-through rate using the extracted features and the feature weights corresponding to those features.
9. The apparatus according to claim 8, characterized in that the features extracted by the model building subunit comprise one or any combination of the following: the text information of the search keyword, the category information of the search keyword, the title of the product information, the attributes of the product information, and the relevance of the search keyword to the product information.
10. The apparatus according to claim 7, characterized in that the estimated click-through rate gear determination unit further comprises an estimated click-through rate calculation subunit.
11. The apparatus according to claim 10, characterized in that the quantiles are normal distribution quantiles.
12. The apparatus according to claim 7, characterized in that the relevance gear determination unit further comprises:
a determination subunit, configured to determine the relevance gear of the search keyword and product information feature pair according to the results of the matching judgments on the multiple features.
13. The apparatus according to claim 12, characterized in that:
the category feature matching judgment determines whether the search keyword and the product information belong to the same category;
the text feature matching judgment determines whether the search keyword is associated with the text content of the product information.
CN201410838112.4A 2014-12-29 2014-12-29 A kind of information matches treating method and apparatus Active CN105808541B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410838112.4A CN105808541B (en) 2014-12-29 2014-12-29 A kind of information matches treating method and apparatus
PCT/CN2015/098247 WO2016107455A1 (en) 2014-12-29 2015-12-22 Information matching processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410838112.4A CN105808541B (en) 2014-12-29 2014-12-29 A kind of information matches treating method and apparatus

Publications (2)

Publication Number Publication Date
CN105808541A CN105808541A (en) 2016-07-27
CN105808541B true CN105808541B (en) 2019-11-08

Family

ID=56284233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410838112.4A Active CN105808541B (en) 2014-12-29 2014-12-29 A kind of information matches treating method and apparatus

Country Status (2)

Country Link
CN (1) CN105808541B (en)
WO (1) WO2016107455A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649605B (en) * 2016-11-28 2020-09-29 百度在线网络技术(北京)有限公司 Method and device for triggering promotion keywords
CN107767172A (en) * 2017-10-12 2018-03-06 百度在线网络技术(北京)有限公司 Information-pushing method, device, server and medium
CN110516033A (en) * 2018-05-04 2019-11-29 北京京东尚科信息技术有限公司 A kind of method and apparatus calculating user preference
CN110633398A (en) * 2018-05-31 2019-12-31 阿里巴巴集团控股有限公司 Method for confirming central word, searching method, device and storage medium
CN111047009B (en) * 2019-11-21 2023-05-23 腾讯科技(深圳)有限公司 Event trigger probability prediction model training method and event trigger probability prediction method
CN110909182B (en) * 2019-11-29 2023-05-09 北京达佳互联信息技术有限公司 Multimedia resource searching method, device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514178A (en) * 2012-06-18 2014-01-15 阿里巴巴集团控股有限公司 Searching and sorting method and device based on click rate
CN103678481A (en) * 2003-09-30 2014-03-26 雅虎公司 Method and apparatus for search scoring
CN104077306A (en) * 2013-03-28 2014-10-01 阿里巴巴集团控股有限公司 Search engine result sequencing method and search engine result sequencing system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6728706B2 (en) * 2001-03-23 2004-04-27 International Business Machines Corporation Searching products catalogs
CN103729365A (en) * 2012-10-12 2014-04-16 阿里巴巴集团控股有限公司 Searching method and system
CN103778548B (en) * 2012-10-19 2018-05-29 阿里巴巴集团控股有限公司 Merchandise news and key word matching method, merchandise news put-on method and device

Also Published As

Publication number Publication date
CN105808541A (en) 2016-07-27
WO2016107455A1 (en) 2016-07-07

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20240226

Address after: # 01-21, Lai Zan Da Building 1, 51 Belarusian Road, Singapore

Patentee after: Alibaba Singapore Holdings Ltd.

Country or region after: Singapore

Address before: Cayman Islands Grand Cayman capital building, a four storey No. 847 mailbox

Patentee before: ALIBABA GROUP HOLDING Ltd.

Country or region before: Cayman Islands