CN105808541A - Information matching processing method and apparatus - Google Patents
- Publication number
- CN105808541A (application CN201410838112.4A)
- Authority
- CN
- China
- Prior art keywords
- product information
- key word
- search key
- gear
- clicking rate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the field of data processing, and particularly relates to an information matching processing method. The method comprises the steps of: acquiring each search keyword and product information, and forming search keyword and product information characteristic pairs from all the search keywords and all the product information in pairs; calculating a correlation of each search keyword and product information characteristic pair, and according to a correlation calculation result, determining a correlation gear of each search keyword and product information characteristic pair; calculating a predicted click rate of each search keyword and product information characteristic pair, and by utilizing a quantile, determining a predicted click rate gear corresponding to the predicted click rate of each search keyword and product information characteristic pair; and according to the correlation gear and the predicted click rate gear, determining a score of each search keyword and product information characteristic pair, wherein the scores are used for representing matching degrees of the search keywords and the corresponding product information.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to an information matching processing method and apparatus.
Background art
With the development of computer and Internet technology, e-commerce websites have grown rapidly. An e-commerce website typically stores a massive amount of data or products. To help users find products of interest more efficiently, the website server usually recommends products that match the search word entered by the user. Among the recommended products, those that match the search word closely, are of good quality, and have been promoted through advertising are often recommended to the user first. Sellers, for their part, usually choose high-quality products for advertising promotion in order to increase sales. When promoting a product, a seller needs to buy search keywords for the published product information. The better the published product information matches the search keyword, the more likely the product is to be found by users, and the more likely buyers are to find products that match their search word and so obtain useful information from the sea of information.
Therefore, accurately judging the matching degree between product information and search words not only makes sellers' product promotion more effective, but also reduces the client-server data interaction caused by buyers searching repeatedly, which improves the user experience and the performance of the server.
Existing methods for judging the matching degree between product information and search words usually calculate the correlation between the search word and the advertised product, judge the matching degree between the search word and the published product information from the resulting correlation score, and recommend that the seller buy search keywords with a high matching degree.
However, these existing methods consider only the correlation between the search word and the advertised product, and do not consider how much users prefer the advertised product, so the matching degree they calculate is inaccurate. An inaccurate matching result not only prevents the seller from promoting its products effectively, but also means that the products the website recommends to buyers do not fully match their needs and interests. Buyers then have to search repeatedly to find the products they are really interested in, which increases the data interaction between the client and the server, increases the data processing load of the server, reduces its processing performance, and consumes a large amount of valuable Internet bandwidth.
Summary of the invention
To solve the above technical problem, the present invention discloses an information matching processing method and apparatus that can improve the objectivity and accuracy of information matching, improve the user experience, reduce the data processing load of the server, improve the processing performance of the server, and save valuable Internet bandwidth.
The technical solution is as follows:
According to a first aspect of the embodiments of the present invention, a product information matching processing method is disclosed. The method includes:
Acquiring each search keyword and each piece of product information, and combining the search keywords and the product information in pairs to form search keyword and product information feature pairs;
Calculating the correlation of each search keyword and product information feature pair, and determining the correlation gear of each feature pair according to the correlation calculation result;
Calculating the predicted click rate of each search keyword and product information feature pair, and using quantiles to determine the predicted click rate gear corresponding to the predicted click rate of each feature pair;
Determining a score for each search keyword and product information feature pair according to the correlation gear and the predicted click rate gear, where the score represents the matching degree between the search keyword and the product information.
According to a second aspect of the embodiments of the present invention, a product information matching processing apparatus is disclosed. The apparatus includes:
An acquiring unit, configured to acquire each search keyword and each piece of product information, and to combine the search keywords and the product information in pairs to form search keyword and product information feature pairs;
A correlation gear determining unit, configured to calculate the correlation of each search keyword and product information feature pair, and to determine the correlation gear of each feature pair according to the correlation calculation result;
A predicted click rate gear determining unit, configured to calculate the predicted click rate of each search keyword and product information feature pair, and to use quantiles to determine the predicted click rate gear corresponding to the predicted click rate of each feature pair;
A matching degree determining unit, configured to determine a score for each search keyword and product information feature pair according to the correlation gear and the predicted click rate gear, where the score represents the matching degree between the search keyword and the product information.
One aspect of the embodiments of the present invention can achieve the following beneficial effects. When determining the matching degree between a search keyword and product information, the method and apparatus provided by the present invention consider not only the correlation between the search keyword and the product information, but also how much users prefer the product. A predicted click rate factor, which objectively reflects user preference for the product, is introduced to calculate the predicted click rate. A preset proportion rule (for example, the normal distribution rule) is then used to determine the click rate gear corresponding to the probability that the advertised product is clicked by users under the search keyword. The matching degree between the search keyword and the product information is determined jointly from the correlation gear and the click rate gear, which gives a more accurate matching result. This not only makes sellers' product promotion more effective, but also reduces the client-server data interaction caused by buyers searching repeatedly, improves the user experience, reduces the data processing load of the server, improves the processing performance of the server, and saves valuable Internet bandwidth.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an information matching processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the standard normal distribution quantile table provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the predicted click rate gear distribution provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of an information matching processing apparatus provided by an embodiment of the present invention.
Detailed description of the invention
To help those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
The present invention discloses an information matching processing method and apparatus that consider not only the correlation between the search keyword and the product information, but also how much users prefer the product. A predicted click rate factor that reflects user preference for the product is introduced to calculate the predicted click rate, and the normal distribution rule is used to determine the click rate gear corresponding to the probability that the advertised product is clicked by users under the search keyword. The matching degree between the search keyword and the product information is determined jointly from the correlation gear and the click rate gear, which gives a more accurate matching result.
In one application scenario of the present invention, a seller on an e-commerce website needs to buy search keywords to promote its advertised products. The method provided by the embodiments can be applied on the website server to judge the matching degree between search keywords and the product information published by the seller, so that search keywords with a high matching degree can be recommended to the seller. This makes the seller's promotion more effective and further increases the probability that the seller's products are clicked by buyers. It can also make buyers' product searches more efficient, reduce the client-server data interaction caused by repeated searching, improve the user experience, reduce the data processing load of the server, improve the processing performance of the server, and save valuable Internet bandwidth.
Fig. 1 is a schematic flowchart of an information matching processing method provided by an embodiment of the present invention.
S101: acquire each search keyword and each piece of product information, and combine the search keywords and the product information in pairs to form search keyword and product information feature pairs.
Generally, a seller deals in many products, which may belong to different categories. In that case, the seller's product information can be processed separately to obtain one or more words that describe it, and these are combined in pairs with the search keywords to form feature pairs. For example, suppose the seller's product information includes MP3 player, iphone6, Note4, earphone, and so on, and the search keyword is mobile phone. The resulting feature pairs then include (mobile phone, MP3 player), (mobile phone, iphone6), (mobile phone, Note4), and (mobile phone, earphone). These are only illustrative examples and do not limit the present invention. The product information may specifically be advertised product information.
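The pairing in step S101 amounts to a Cartesian product of keywords and product descriptions. The following is a minimal sketch under that reading; the function name `build_feature_pairs` is hypothetical and is not taken from the patent.

```python
from itertools import product


def build_feature_pairs(keywords, product_infos):
    """Combine every search keyword with every product description.

    Hypothetical helper: returns (keyword, product_info) tuples, one per
    search keyword and product information feature pair.
    """
    return list(product(keywords, product_infos))


# Example data from the description above.
pairs = build_feature_pairs(
    ["mobile phone"],
    ["MP3 player", "iphone6", "Note4", "earphone"],
)
for pair in pairs:
    print(pair)
```

With several keywords the same call yields every keyword and product combination, which is what steps S102 and S103 then score.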
It should be noted that, before steps S102 and S103 are performed, each search keyword and each piece of product information may be preprocessed. The preprocessing includes extracting the semantic features required for matching multiple features. The specific processing can take many forms and is not limited here.
In addition, steps S102 and S103 need not be performed in any particular order. They can be performed in parallel or in reverse order.
S102: calculate the correlation of each search keyword and product information feature pair, and determine the correlation gear of each feature pair according to the correlation calculation result.
The correlation is calculated mainly from the category correlation and the text correlation between the search keyword and the advertised product. Category correlation refers to the matching degree between the click category of the search keyword and the category to which the advertised product belongs. Text correlation covers several aspects, mainly the matching degree between the core words of the search keyword and the core words of the advertised product title, and the matching degree between the attributes appearing in the search keyword and the attributes in the advertised product description. Combining the category match and the text match yields the correlation score.
In a specific implementation, step S102 may include: performing matching judgments of multiple features on the search keyword and product information feature pair; and determining the correlation gear of the feature pair according to the results of those matching judgments.
In a specific implementation, the matching judgments of multiple features performed during the correlation calculation include at least one of a category feature matching judgment and a text feature matching judgment.
Further, the category feature matching judgment determines whether the search keyword and the product information belong to the same category. In one specific implementation of the present invention, the category feature matching judgment usually means a category judgment based on the meaning of the text. If the category of the search keyword is the same as the category of the published product information, the result of the category feature matching judgment is "Yes"; otherwise, the result is "No". One special case in which the result is "No" is a search keyword that has no category. A search keyword without a category is usually a long-tail keyword, that is, a search keyword that users rarely search for. For example, if the search keyword is "mp3" and the published product is "audio player", the two belong to the same category and the result of the category feature matching judgment is "Yes". If the search keyword is "mp3" and the published product is "radio", the two do not belong to the same category and the result is "No".
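The category feature matching judgment can be sketched as a lookup followed by an equality test. This is an illustration only: the category table `CATEGORY_OF` and its contents are assumptions made for the example, since the patent does not say how categories are assigned.

```python
# Hypothetical category table; a real system would derive categories
# from click logs or a taxonomy, which the description does not specify.
CATEGORY_OF = {
    "mp3": "audio players",
    "audio player": "audio players",
    "radio": "radios",
}


def category_match(keyword, product_info, category_of=CATEGORY_OF):
    """Return True ("Yes") only when both sides share a known category.

    A keyword with no category (for example a long-tail keyword)
    always yields False ("No"), as described above.
    """
    keyword_category = category_of.get(keyword)
    if keyword_category is None:
        return False
    return keyword_category == category_of.get(product_info)


print(category_match("mp3", "audio player"))
print(category_match("mp3", "radio"))
print(category_match("rare long-tail query", "radio"))
```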
Further, the text feature matching judgment determines whether the text content of the search keyword and that of the published product information are associated. Specifically, the text feature matching judgment of the present invention includes at least one of: a complete matching judgment, a partial matching judgment, a center word matching judgment, a complete center word matching judgment, a hidden word matching judgment, and a reverse preposition matching judgment. The text feature matching judgment may also include extracting text feature vectors and calculating the similarity of the text vectors with the cosine formula. The present invention does not limit this.
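The cosine similarity option mentioned above can be sketched as follows. The bag-of-words vectors built by whitespace tokenization are an assumption for illustration; the patent does not specify how text feature vectors are extracted.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between term-count vectors of two texts.

    Assumption: lowercase, whitespace-split term counts stand in for
    the text feature vectors, whose extraction is not described.
    """
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)


print(cosine_similarity("mp3 player", "portable mp3 player"))
print(cosine_similarity("mp3", "radio"))
```

Texts that share no terms score 0, identical texts score 1, and partial overlaps fall in between, so the value can feed a threshold-based matching judgment.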
After the matching judgments of multiple features have been performed on the search keyword and product information feature pair, the correlation gear of the feature pair can be determined from the results of those judgments. In the present invention, the correlation gear is divided into three gears: excellent, good, and poor.
Table 1 shows one illustrative division of correlation gears. Other gear division methods can of course be used and are not limited here.
Table 1
S103: calculate the predicted click rate of each search keyword and product information feature pair, and use quantiles to determine the predicted click rate gear corresponding to the predicted click rate of each feature pair.
In a specific implementation, step S103 may include: determining the proportion coefficient corresponding to each predicted click rate gear; determining the quantile values according to the proportion coefficients; and determining the gear interval in which each predicted click rate falls according to the predicted click rate of each feature pair and the quantile values.
Preferably, the quantiles are normal distribution quantiles.
A detailed description is given below with an example.
Standard normal distribution quantiles are introduced first. The normal distribution is also called the Gaussian distribution. Its probability density curve is bell-shaped, low at both ends and high in the middle, and the total area under the curve is 1. It is defined as follows: if a random variable X follows a probability distribution with location parameter μ and scale parameter σ, this is written as:
$$X \sim N(\mu, \sigma^2) \tag{1}$$
Its probability density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{2}$$
When μ = 0 and σ = 1, X is said to follow the standard normal distribution with mean 0 and standard deviation 1, written N(0, 1).
Normal distribution quantiles describe the rule followed by the area under the normal curve. The upper α quantile of the standard normal distribution is defined as follows: let X ~ N(0, 1); for any given α (0 < α < 1), the point $Z_\alpha$ satisfying $P(X > Z_\alpha) = \alpha$ is called the upper α quantile of the standard normal distribution. For example, looking up the normal distribution table shown in Fig. 2 for $Z_\alpha = 1$ gives α = 0.158655.
Commonly used normal distribution quantiles follow these rules:
68.268949% of the area under the curve lies within one standard deviation (σ) of the mean.
95.449974% of the area lies within two standard deviations (2σ) of the mean.
99.730020% of the area lies within three standard deviations (3σ) of the mean.
99.993666% of the area lies within four standard deviations (4σ) of the mean.
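These areas, and the upper α quantile relation P(X > Z) = α, can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of the printed table of Fig. 2; the helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def area_within(k):
    """Area under the standard normal curve within k standard deviations."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)


def upper_tail(z):
    """alpha such that P(X > z) = alpha for X ~ N(0, 1)."""
    return 1.0 - STD_NORMAL.cdf(z)


def upper_quantile(alpha):
    """The point z with P(X > z) = alpha, i.e. the upper alpha quantile."""
    return STD_NORMAL.inv_cdf(1.0 - alpha)


for k in (1, 2, 3, 4):
    print(f"within {k} sigma: {area_within(k):.8f}")
print(f"alpha for z = 1: {upper_tail(1.0):.6f}")
print(f"upper 0.025 quantile: {upper_quantile(0.025):.4f}")
```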
The present invention applies these normal distribution rules to divide predicted click rates into gears.
The predicted click rate (eCTR) is obtained by building a mathematical probability model from historical exposure and click behavior and using the model to predict whether a future exposure will produce a click. The value it gives is the probability that a given product, once exposed under a given word, is clicked by the user. It is therefore a value between 0 and 1, and the larger the value, the greater the probability of a click.
eCTR prediction uses the logistic regression (LR) model that is standard in the industry. The LR model has two parts: feature extraction and model training. Calculating the predicted click rate of each search keyword and product information feature pair includes: performing feature extraction on the feature pair, and obtaining the feature weight corresponding to each feature from the trained model; and calculating the predicted click rate from the extracted features and their corresponding feature weights.
The extracted features include one or any combination of the following: the text information of the search keyword, the category information of the search keyword, the title of the product information, the attributes of the product information, and the correlation between the search keyword and the product information.
Then, once the feature weights have been obtained by model training, the predicted click rate eCTR of an advertisement pair (Query, offer) can be predicted, where Query is the search keyword and offer is the product information.
The LR model is a generalized linear model, obtained by applying the logistic transformation to a linear model. Its expression is:
$$y = \frac{1}{1 + e^{-\sum_i w_i f_i}} \tag{3}$$
where $w_i$ is the feature weight, $f_i$ is the feature value, and y is the final predicted click rate. The formula bounds the result in (0, 1), which matches the meaning of a click probability.
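As a hedged illustration of the scoring step, the sketch below assumes the standard logistic form y = 1 / (1 + exp(-Σ w_i f_i)). The function name, the optional `bias` term, and the sample weights are assumptions; the patent names only the feature weights w_i and feature values f_i.

```python
import math


def predict_ctr(weights, features, bias=0.0):
    """Predicted click rate from feature weights and feature values.

    Assumes the standard logistic regression form; `bias` is an
    optional intercept that the description does not mention.
    """
    z = bias + sum(w * f for w, f in zip(weights, features))
    # Numerically stable sigmoid: avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Made-up weights and feature values, for illustration only.
ectr = predict_ctr(weights=[2.0, -1.0], features=[1.0, 0.5])
print(f"eCTR = {ectr:.4f}")
```

The output always lies strictly between 0 and 1 for moderate inputs, which is what allows it to be read as a click probability and compared against quantile cut points.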
In theory, accurately predicted eCTR values should follow a Gaussian (normal) distribution. The eCTR values of advertisement pairs are divided into gears along the keyword dimension and the global dimension. The eCTR of each advertisement pair necessarily falls into one interval of the overall eCTR distribution, and that interval determines the predicted click rate gear of the pair. The gear division method provided by the present invention ensures that the scores of most customers' advertised products are at an average level, while those of a minority are at a better or worse level.
In the embodiments of the present invention, based on practical business analysis and experience, the predicted click rate is divided into three gears: good, medium, and poor. The proportion coefficients of the gears are 3:4:3, that is, advertised products in the good gear account for 30%, those in the medium gear for 40%, and those in the poor gear for 30%. The corresponding scores are 5 stars, 4 stars, and 3 stars. Fig. 3 is a schematic diagram of the predicted click rate gear division, in which the abscissa is the predicted click rate value, the ordinate is the frequency, and the area under the curve corresponds to probability (that is, the proportion).
In a specific implementation, when the global or keyword-dimension eCTR distribution is divided in the ratio 3:4:3, the area under the curve within a certain range around the mean must be 0.4, and by symmetry the area on each side is 0.3. According to the rules for commonly used normal distribution quantiles, formula (4) can be obtained (formula (4) is not reproduced in this text),
where μ is the mean, σ is the standard deviation, and $Z_\alpha$ is the normal distribution quantile.
That is, once the proportion coefficient of each predicted click rate gear has been determined, the values of the normal distribution quantiles can be determined from those coefficients.
Assume that Fig. 3 follows the standard normal distribution, i.e. X ~ N(0, 1). For any given α (0 < α < 1), the point $Z_\alpha$ satisfying $P(X > Z_\alpha) = \alpha$ is called the upper α quantile of the standard normal distribution, and $Z_{1-\alpha}$ is the corresponding lower α quantile.
$Z_\alpha$ is a numerical value: when X ~ N(0, 1), $P(X > Z_\alpha) = \alpha$. As an example, the normal distribution table can be used to find $Z_\alpha$ from α. To find $Z_{0.025}$, look up the Z value corresponding to 1 − 0.025 = 0.975. In the normal distribution table shown in Fig. 2, the Z value for 0.9750 is 1.96, so $Z_{0.025} = 1.96$. Conversely, to find the α corresponding to $Z_\alpha = 1.96$, first look up 1.96, which corresponds to 0.975; then α = 1 − 0.975 = 0.025.
As can be seen from Fig. 3, a1 and a2 each correspond to a quantile of the standard normal distribution. The target proportions in Fig. 3 correspond to $Z_{\alpha 1}$ and $Z_{\alpha 2}$ respectively, whose values can be obtained by the method above. Under the standard normal distribution, $Z_{\alpha 1}$ corresponds to the upper α quantile and $Z_{\alpha 2}$ to the lower α quantile.
In a specific implementation, when the predicted click rate gears are divided in the ratio 3:4:3, the area under the curve within a certain range on either side of the mean is 0.4, and by symmetry the areas on the left and right are each 0.3. The area to the right of quantile a2 in the standard normal distribution diagram of Fig. 3 is therefore 0.3, so $Z_{0.3}$ is needed, which means looking up the Z value corresponding to 1 − 0.3 = 0.7. From the normal distribution table shown in Fig. 2, the Z value for 0.7 is 0.52, so $Z_{0.3} = 0.52$, i.e. a2 = 0.52. Likewise, a1 = −0.52. a1 and a2 are the two quantiles corresponding to this ratio under the normal distribution. The values of $Z_{\alpha 1}$ and $Z_{\alpha 2}$ can of course also be calculated from formula (4). Since Fig. 3 follows the standard normal distribution, X ~ N(0, 1), i.e. μ = 0 and σ = 1, and formula (4) gives $Z_\alpha = \pm 0.5$, which corresponds in Fig. 3 to a1 = −0.5 and a2 = 0.5.
The predicted click rate values follow a general normal distribution. For a general normal distribution (μ not equal to 0, σ not equal to 1), the corresponding quantiles can be obtained approximately from the rules for normal distribution quantiles. For the ratio 3:4:3, the quantiles of the general normal distribution are given by:
$$a_1 = \mu - \frac{\sigma}{2}, \qquad a_2 = \mu + \frac{\sigma}{2}$$
where μ is the mean and σ is the standard deviation. μ and σ can be calculated from actual data samples. Specifically, once the predicted click rate values have been obtained, the mean μ of all predicted click rates and the corresponding standard deviation σ can be computed by existing methods. The values of the general normal distribution quantiles are then obtained from μ and σ according to formula (4).
Once the values of the general normal distribution quantiles have been determined, the gear interval in which a predicted click rate falls can be determined by comparing the predicted click rate with the quantile values. For example, based on the standard normal distribution table: when the predicted click rate belongs to (0, μ − σ/2], its predicted click rate gear is poor; when it belongs to (μ − σ/2, μ + σ/2), its gear is medium; and when it belongs to [μ + σ/2, 1), its gear is good.
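The interval rule can be sketched as below, with μ and σ estimated from a sample of predicted click rates. The use of the population standard deviation (`statistics.pstdev`) and the sample values are assumptions for illustration; the description only says that μ and σ come from actual data.

```python
from statistics import mean, pstdev


def ctr_gear(ectr, mu, sigma):
    """Map a predicted click rate to a gear using cut points mu -/+ sigma/2.

    Intervals follow the description: (0, mu - sigma/2] is poor,
    (mu - sigma/2, mu + sigma/2) is medium, [mu + sigma/2, 1) is good.
    """
    low, high = mu - sigma / 2, mu + sigma / 2
    if ectr <= low:
        return "poor"
    if ectr < high:
        return "medium"
    return "good"


# Made-up eCTR sample, for illustration only.
sample = [0.01, 0.02, 0.03, 0.04, 0.05]
mu, sigma = mean(sample), pstdev(sample)
gears = [ctr_gear(x, mu, sigma) for x in sample]
print(gears)
```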
It should be noted that the above description uses proportion coefficients of 3:4:3 as an example. When other proportion coefficients are chosen, the calculation can follow the same approach.
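For arbitrary proportion coefficients, the cut points can be computed from the inverse cumulative distribution function instead of a printed table. This is a sketch under the assumption that predicted click rates follow N(μ, σ); the function name `gear_boundaries` is hypothetical.

```python
from statistics import NormalDist


def gear_boundaries(ratios, mu=0.0, sigma=1.0):
    """Cut points that split N(mu, sigma) into areas proportional to ratios.

    `ratios` lists the gear proportions from the lowest gear to the
    highest, for example [3, 4, 3] for poor:medium:good.
    """
    dist = NormalDist(mu, sigma)
    total = sum(ratios)
    cuts = []
    cumulative = 0
    for ratio in ratios[:-1]:
        cumulative += ratio
        cuts.append(dist.inv_cdf(cumulative / total))
    return cuts


a1, a2 = gear_boundaries([3, 4, 3])
print(f"a1 = {a1:.4f}, a2 = {a2:.4f}")
print(gear_boundaries([1, 1], mu=0.03, sigma=0.01))
```

For 3:4:3 on the standard normal distribution the exact cut points come out near ±0.524, consistent with the ±0.52 table lookup and close to the ±0.5 approximation used in the description.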
S104: determine a score for each search keyword and product information feature pair according to the correlation gear and the predicted click rate gear, where the score represents the matching degree between the search keyword and the product information.
In a specific implementation, the score can be calculated in various ways, for example by a weighted average or by other implementations. The present invention does not limit this.
Table 2 shows one implementation of star rating.
Table 2
According to practical business analysis, advertisement pairs whose correlation gear is excellent can be divided by predicted click rate into good, medium, and poor in the ratio 3:4:3, corresponding to 5 stars, 4 stars, and 3 stars respectively. Advertisement pairs whose correlation gear is good are divided in the ratio 1:1, corresponding to 2 stars and 1 star respectively. The division of excellent pairs is shown in Table 2. Because good pairs have only two levels, their division is simpler: the mean of the distribution is taken as the cut point, and among good pairs those above the mean receive 2 stars and those below the mean receive 1 star.
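The star assignment described above can be sketched as a small lookup. Several points are assumptions for illustration: the function and parameter names are hypothetical, a predicted click rate exactly equal to the mean is given 1 star because the description does not cover that case, and no rating is returned for the poor correlation gear because the description gives no rule for it.

```python
# Stars for pairs whose correlation gear is excellent, keyed by the
# predicted click rate gear (3:4:3 split into good, medium, poor).
EXCELLENT_STARS = {"good": 5, "medium": 4, "poor": 3}


def star_rating(correlation_gear, ctr_gear=None, ectr=None, mean_ectr=None):
    """Combine the correlation gear and click rate information into stars."""
    if correlation_gear == "excellent":
        return EXCELLENT_STARS[ctr_gear]
    if correlation_gear == "good":
        # 1:1 split at the mean; equality handling is an assumption.
        return 2 if ectr > mean_ectr else 1
    return None  # no rule is given for the poor correlation gear


print(star_rating("excellent", ctr_gear="good"))
print(star_rating("good", ectr=0.05, mean_ectr=0.03))
print(star_rating("poor"))
```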
In the embodiments of the present invention, the degree of matching between a search keyword and an advertised product is computed by combining the relevance calculation with the estimated click-through rate. This not only tells the seller user the quality and matching degree of an advertisement, but also objectively reflects the probability that a buyer user clicks the advertised product when searching for products on the website. The higher the star rating, the higher the ranking and the greater the probability that a buyer clicks; the exposure and feedback obtained are correspondingly greater, the advertiser's return on investment is higher, and the seller's product promotion becomes more effective. For buyers on the website, the advertiser's optimization of advertisements improves product quality, which directly improves the user's experience on the website. Data interaction between the user's client and the server is reduced, which lowers the data processing load on the server, improves the server's processing performance, and saves valuable network bandwidth.
Referring to Fig. 4, a schematic diagram of the product information matching processing device provided by an embodiment of the present invention is shown.
A product information matching processing device 400 includes:
An acquiring unit 401, configured to obtain search keywords and product information, and to combine the search keywords and the product information pairwise into search keyword and product information feature pairs.
A relevance gear determining unit 402, configured to calculate the relevance of each search keyword and product information feature pair, and to determine the relevance gear of each search keyword and product information feature pair according to the relevance calculation result.
An estimated click-through rate gear determining unit 403, configured to calculate the estimated click-through rate of each search keyword and product information feature pair, and to use quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each search keyword and product information feature pair.
A matching determining unit 404, configured to determine the score of each search keyword and product information feature pair according to the relevance gear and the estimated click-through rate gear, the score characterizing the degree of matching between the search keyword and the product information.
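The cooperation of the four units can be pictured with the short sketch below. It is only an illustration: it assumes that pairwise combination means pairing every search keyword with every product, and the four callables passed in (`relevance_gear_fn`, `ctr_fn`, `ctr_gear_fn`, `score_fn`) are hypothetical stand-ins for units 402 to 404.

```python
from itertools import product


def score_pairs(keywords, products, relevance_gear_fn, ctr_fn,
                ctr_gear_fn, score_fn):
    """Score every (keyword, product) pair.

    keywords, products: iterables of hashable items.
    The four callables are supplied by the caller and stand in for
    the relevance, click-through rate, gear, and scoring steps.
    """
    scores = {}
    # Acquiring unit: assumption that each keyword pairs with each product.
    for keyword, item in product(keywords, products):
        relevance_gear = relevance_gear_fn(keyword, item)   # unit 402
        ctr = ctr_fn(keyword, item)                         # unit 403, step 1
        ctr_gear = ctr_gear_fn(ctr)                         # unit 403, step 2
        scores[(keyword, item)] = score_fn(relevance_gear, ctr_gear)  # unit 404
    return scores
```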
Further, the estimated click-through rate gear determining unit includes an estimated click-through rate calculating subunit and a gear determining subunit, wherein the estimated click-through rate calculating subunit includes:
A model establishing subunit, configured to perform feature extraction on the search keyword and product information feature pair, and to obtain the feature weight corresponding to each feature according to a trained model;
A calculating subunit, configured to calculate the estimated click-through rate using the extracted features and the feature weights corresponding to the features.
Further, the features extracted by the model establishing subunit include one or any combination of the following: the text information of the search keyword, the category information of the search keyword, the title of the product information, the attributes of the product information, and the relevance between the search keyword and the product information.
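As an illustration of how extracted features and their weights could yield an estimated click-through rate, the sketch below applies a logistic function to a weighted sum. The logistic form, the `bias` term, and the name `estimate_ctr` are assumptions made for this example; the passage above only states that the features and their weights are used in the calculation.

```python
import math


def estimate_ctr(features, weights, bias=0.0):
    """Estimate a click-through rate in (0, 1).

    features: mapping from feature name to numeric value.
    weights:  mapping from feature name to trained weight;
              features without a weight contribute nothing.
    """
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    # Numerically stable logistic function (assumed model form).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```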
Further, the estimated click-through rate gear determining unit includes an estimated click-through rate calculating subunit and a gear determining subunit, wherein the gear determining subunit includes:
A proportionality coefficient determining subunit, configured to determine the proportionality coefficient corresponding to each gear of the estimated click-through rate gears;
A quantile determining subunit, configured to determine the values of the quantiles according to the proportionality coefficient;
A gear interval determining subunit, configured to determine the gear interval in which the estimated click-through rate falls according to the estimated click-through rate of each search keyword and product information feature pair and the values of the quantiles.
Here, the quantiles are normal distribution quantiles.
Further, the relevance gear determining unit includes:
A feature matching subunit, configured to perform matching judgments on multiple features of the search keyword and product information feature pair;
A determining subunit, configured to determine the relevance gear of the search keyword and product information feature pair according to the matching judgment results for the multiple features.
Further, the matching judgments on multiple features performed by the feature matching subunit include at least one of a category feature matching judgment and a text feature matching judgment;
The category feature matching judgment determines whether the search keyword and the product information belong to the same category;
The text feature matching judgment determines whether the text content of the search keyword and that of the product information are associated.
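The two matching judgments might be combined into a relevance gear along the following lines. This is a sketch under stated assumptions: word overlap stands in for the text association test, the mapping from the number of matches to a gear is invented for illustration, and the dictionary keys `text`, `title`, and `category` are hypothetical.

```python
def category_match(keyword, item):
    """Judge whether keyword and product belong to the same category."""
    return keyword["category"] == item["category"]


def text_match(keyword, item):
    """Judge whether the texts are associated.

    Assumption: sharing at least one lowercase word counts as associated.
    """
    keyword_words = set(keyword["text"].lower().split())
    title_words = set(item["title"].lower().split())
    return bool(keyword_words & title_words)


def relevance_gear(keyword, item):
    """Map the matching results to a gear (assumed mapping):
    both match -> excellent, one matches -> good, none -> poor."""
    hits = int(category_match(keyword, item)) + int(text_match(keyword, item))
    return ("poor", "good", "excellent")[hits]
```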
The functions of the above units correspond to the processing steps of the method described in detail with reference to Fig. 1 and are not repeated here. Note that, because the method embodiments have been explained in detail, the description of the device embodiment is relatively brief; those skilled in the art will understand that the device embodiment of the present invention can be constructed with reference to the method embodiments. Other implementations obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.
Those skilled in the art will understand that the method and device embodiments above are exemplary illustrations and are not intended to limit the present invention; other implementations obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element. The present invention may be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present invention may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device embodiment is substantially similar to the method embodiments, it is described relatively briefly; for relevant details, refer to the description of the method embodiments. The device embodiment described above is merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort. The above is only a specific embodiment of the present invention. It should be pointed out that those of ordinary skill in the art may make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (14)
1. An information matching processing method, characterized in that the method includes:
Obtaining search keywords and product information, and combining the search keywords and the product information pairwise into search keyword and product information feature pairs;
Calculating the relevance of each search keyword and product information feature pair, and determining the relevance gear of each search keyword and product information feature pair according to the relevance calculation result;
Calculating the estimated click-through rate of each search keyword and product information feature pair, and using quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each search keyword and product information feature pair;
Determining the score of each search keyword and product information feature pair according to the relevance gear and the estimated click-through rate gear, the score characterizing the degree of matching between the search keyword and the product information.
2. The method according to claim 1, characterized in that calculating the estimated click-through rate of each search keyword and product information feature pair includes:
Performing feature extraction on the search keyword and product information feature pair, and obtaining the feature weight corresponding to each feature according to a trained model;
Calculating the estimated click-through rate using the extracted features and the feature weights corresponding to the features.
3. The method according to claim 2, characterized in that the extracted features include one or any combination of the following: the text information of the search keyword, the category information of the search keyword, the title of the product information, the attributes of the product information, and the relevance between the search keyword and the product information.
4. The method according to claim 1, characterized in that using quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each search keyword and product information feature pair includes:
Determining the proportionality coefficient corresponding to each gear of the estimated click-through rate gears;
Determining the values of the quantiles according to the proportionality coefficient;
Determining the gear interval in which the estimated click-through rate falls according to the estimated click-through rate of each search keyword and product information feature pair and the values of the quantiles.
5. The method according to claim 4, characterized in that the quantiles are normal distribution quantiles.
6. The method according to claim 1, characterized in that calculating the relevance of each search keyword and product information feature pair and determining the relevance gear of each search keyword and product information feature pair according to the relevance calculation result includes:
Performing matching judgments on multiple features of the search keyword and product information feature pair;
Determining the relevance gear of the search keyword and product information feature pair according to the matching judgment results for the multiple features.
7. The method according to claim 6, characterized in that the matching judgments on multiple features include at least one of a category feature matching judgment and a text feature matching judgment;
The category feature matching judgment determines whether the search keyword and the product information belong to the same category;
The text feature matching judgment determines whether the text content of the search keyword and that of the product information are associated.
8. An information matching processing device, characterized in that the device includes:
An acquiring unit, configured to obtain search keywords and product information, and to combine the search keywords and the product information pairwise into search keyword and product information feature pairs;
A relevance gear determining unit, configured to calculate the relevance of each search keyword and product information feature pair, and to determine the relevance gear of each search keyword and product information feature pair according to the relevance calculation result;
An estimated click-through rate gear determining unit, configured to calculate the estimated click-through rate of each search keyword and product information feature pair, and to use quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each search keyword and product information feature pair;
A matching determining unit, configured to determine the score of each search keyword and product information feature pair according to the relevance gear and the estimated click-through rate gear, the score characterizing the degree of matching between the search keyword and the product information.
9. The device according to claim 8, characterized in that the estimated click-through rate gear determining unit includes an estimated click-through rate calculating subunit and a gear determining subunit, wherein the estimated click-through rate calculating subunit includes:
A model establishing subunit, configured to perform feature extraction on the search keyword and product information feature pair, and to obtain the feature weight corresponding to each feature according to a trained model;
A calculating subunit, configured to calculate the estimated click-through rate using the extracted features and the feature weights corresponding to the features.
10. The device according to claim 9, characterized in that the features extracted by the model establishing subunit include one or any combination of the following: the text information of the search keyword, the category information of the search keyword, the title of the product information, the attributes of the product information, and the relevance between the search keyword and the product information.
11. The device according to claim 8, characterized in that the estimated click-through rate gear determining unit includes an estimated click-through rate calculating subunit and a gear determining subunit, wherein the gear determining subunit includes:
A proportionality coefficient determining subunit, configured to determine the proportionality coefficient corresponding to each gear of the estimated click-through rate gears;
A quantile determining subunit, configured to determine the values of the quantiles according to the proportionality coefficient;
A gear interval determining subunit, configured to determine the gear interval in which the estimated click-through rate falls according to the estimated click-through rate of each search keyword and product information feature pair and the values of the quantiles.
12. The device according to claim 11, characterized in that the quantiles are normal distribution quantiles.
13. The device according to claim 8, characterized in that the relevance gear determining unit includes:
A feature matching subunit, configured to perform matching judgments on multiple features of the search keyword and product information feature pair;
A determining subunit, configured to determine the relevance gear of the search keyword and product information feature pair according to the matching judgment results for the multiple features.
14. The device according to claim 13, characterized in that the matching judgments on multiple features performed by the feature matching subunit include at least one of a category feature matching judgment and a text feature matching judgment;
The category feature matching judgment determines whether the search keyword and the product information belong to the same category;
The text feature matching judgment determines whether the text content of the search keyword and that of the product information are associated.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410838112.4A CN105808541B (en) | 2014-12-29 | 2014-12-29 | A kind of information matches treating method and apparatus |
PCT/CN2015/098247 WO2016107455A1 (en) | 2014-12-29 | 2015-12-22 | Information matching processing method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410838112.4A CN105808541B (en) | 2014-12-29 | 2014-12-29 | A kind of information matches treating method and apparatus |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105808541A (en) | 2016-07-27 |
CN105808541B (en) | 2019-11-08 |
Family
ID=56284233
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410838112.4A Active CN105808541B (en) | 2014-12-29 | 2014-12-29 | A kind of information matches treating method and apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105808541B (en) |
WO (1) | WO2016107455A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649605A (en) * | 2016-11-28 | 2017-05-10 | 百度在线网络技术(北京)有限公司 | Triggering way and device of promoting key words |
CN107767172A (en) * | 2017-10-12 | 2018-03-06 | 百度在线网络技术(北京)有限公司 | Information-pushing method, device, server and medium |
CN110516033A (en) * | 2018-05-04 | 2019-11-29 | 北京京东尚科信息技术有限公司 | A kind of method and apparatus calculating user preference |
CN110633398A (en) * | 2018-05-31 | 2019-12-31 | 阿里巴巴集团控股有限公司 | Method for confirming central word, searching method, device and storage medium |
CN110909182A (en) * | 2019-11-29 | 2020-03-24 | 北京达佳互联信息技术有限公司 | Multimedia resource searching method and device, computer equipment and storage medium |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111047009B (en) * | 2019-11-21 | 2023-05-23 | 腾讯科技(深圳)有限公司 | Event trigger probability prediction model training method and event trigger probability prediction method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020138481A1 (en) * | 2001-03-23 | 2002-09-26 | International Business Machines Corporation | Searching product catalogs |
US20070016491A1 (en) * | 2003-09-30 | 2007-01-18 | Xuejun Wang | Method and apparatus for search scoring |
CN103514178A (en) * | 2012-06-18 | 2014-01-15 | 阿里巴巴集团控股有限公司 | Searching and sorting method and device based on click rate |
CN104077306A (en) * | 2013-03-28 | 2014-10-01 | 阿里巴巴集团控股有限公司 | Search engine result sequencing method and search engine result sequencing system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103729365A (en) * | 2012-10-12 | 2014-04-16 | 阿里巴巴集团控股有限公司 | Searching method and system |
CN103778548B (en) * | 2012-10-19 | 2018-05-29 | 阿里巴巴集团控股有限公司 | Merchandise news and key word matching method, merchandise news put-on method and device |
2014
- 2014-12-29 CN CN201410838112.4A patent/CN105808541B/en active Active
2015
- 2015-12-22 WO PCT/CN2015/098247 patent/WO2016107455A1/en active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020138481A1 (en) * | 2001-03-23 | 2002-09-26 | International Business Machines Corporation | Searching product catalogs |
US20070016491A1 (en) * | 2003-09-30 | 2007-01-18 | Xuejun Wang | Method and apparatus for search scoring |
CN103678481A (en) * | 2003-09-30 | 2014-03-26 | 雅虎公司 | Method and apparatus for search scoring |
CN103514178A (en) * | 2012-06-18 | 2014-01-15 | 阿里巴巴集团控股有限公司 | Searching and sorting method and device based on click rate |
CN104077306A (en) * | 2013-03-28 | 2014-10-01 | 阿里巴巴集团控股有限公司 | Search engine result sequencing method and search engine result sequencing system |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649605A (en) * | 2016-11-28 | 2017-05-10 | 百度在线网络技术(北京)有限公司 | Triggering way and device of promoting key words |
CN106649605B (en) * | 2016-11-28 | 2020-09-29 | 百度在线网络技术(北京)有限公司 | Method and device for triggering promotion keywords |
CN107767172A (en) * | 2017-10-12 | 2018-03-06 | 百度在线网络技术(北京)有限公司 | Information-pushing method, device, server and medium |
CN110516033A (en) * | 2018-05-04 | 2019-11-29 | 北京京东尚科信息技术有限公司 | A kind of method and apparatus calculating user preference |
CN110633398A (en) * | 2018-05-31 | 2019-12-31 | 阿里巴巴集团控股有限公司 | Method for confirming central word, searching method, device and storage medium |
CN110909182A (en) * | 2019-11-29 | 2020-03-24 | 北京达佳互联信息技术有限公司 | Multimedia resource searching method and device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2016107455A1 (en) | 2016-07-07 |
CN105808541B (en) | 2019-11-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| TR01 | Transfer of patent right | Effective date of registration: 20240226. Address after: # 01-21, Lai Zan Da Building 1, 51 Belarusian Road, Singapore. Patentee after: Alibaba Singapore Holdings Ltd. Country or region after: Singapore. Address before: Cayman Islands Grand Cayman capital building, a four storey No. 847 mailbox. Patentee before: ALIBABA GROUP HOLDING Ltd. Country or region before: Cayman Islands |