CN105808541B - Information matching processing method and apparatus - Google Patents
- Publication number
- CN105808541B (application CN201410838112.4A)
- Authority
- CN
- China
- Prior art keywords
- product information
- gear
- feature
- search keyword
- described search
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the field of data processing, and in particular to an information matching processing method. The method comprises: obtaining search keywords and product information, and pairing each search keyword with each piece of product information to form (search keyword, product information) feature pairs; calculating the relevance of each feature pair and determining its relevance tier from the relevance calculation result; calculating the predicted click-through rate of each feature pair and using quantiles to determine the predicted click-through rate tier corresponding to that rate; and determining a score for each feature pair from its relevance tier and its predicted click-through rate tier, the score characterizing the degree of match between the search keyword and the product information.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to an information matching processing method and apparatus.
Background art
With the development of computer and Internet technology, e-commerce websites have grown rapidly. An e-commerce website typically stores massive amounts of data or products. To help users find products of interest more efficiently, the website server usually recommends products that match the search term entered by the user. Among the products matched to a search term, those that match the term closely, are of high quality and have been promoted through advertising are often recommended to the user first, and sellers often select high-quality products for advertising promotion in order to increase their sales. When promoting a product through advertising, a seller needs to purchase search keywords for the published product information. The better the seller's published product information matches a search keyword, the more likely the product is to be found by users, and the more likely buyers are to find products that match their search terms and so obtain useful information from a sea of information.
Therefore, accurately judging the degree of match between product information and search terms not only makes sellers' product promotion more effective, but also reduces the client-server data interaction caused by buyers repeatedly searching for products, which improves the user experience and the performance of the server.
Existing methods for judging the degree of match between product information and a search term usually calculate the relevance between the search term and the advertised product, judge the degree of match between the search term and the published product information from the relevance score, and recommend that the seller purchase search keywords with a high degree of match.
However, such existing methods consider only the relevance between the search term and the advertised product, and not how much users prefer the advertised product, so the resulting match is inaccurate. Inaccurate matching results not only prevent sellers from promoting their products effectively, but also mean that the products the website recommends to buyers do not exactly match their needs and interests. Buyers then have to search repeatedly to find the products they are really interested in, which increases client-server data interaction, increases the data processing load on the server, degrades the processing performance of the server serving the user, and consumes valuable Internet bandwidth.
Summary of the invention
To solve the above technical problem, the present invention discloses an information matching processing method and apparatus that can improve the objectivity and accuracy of information matching, improve the user experience, reduce the data processing load on the server, improve the server's processing performance and save valuable Internet bandwidth.
The technical solution is as follows:
According to a first aspect of the embodiments of the present invention, a product information matching processing method is disclosed, the method comprising:
obtaining search keywords and product information, and pairing each search keyword with each piece of product information to form (search keyword, product information) feature pairs;
calculating the relevance of each feature pair, and determining the relevance tier of each feature pair from the relevance calculation result;
calculating the predicted click-through rate of each feature pair, and using quantiles to determine the predicted click-through rate tier corresponding to the predicted click-through rate of each feature pair;
determining a score for each feature pair from the relevance tier and the predicted click-through rate tier, the score characterizing the degree of match between the search keyword and the product information.
According to a second aspect of the embodiments of the present invention, a product information matching processing apparatus is disclosed, the apparatus comprising:
an acquiring unit, configured to obtain search keywords and product information, and to pair each search keyword with each piece of product information to form (search keyword, product information) feature pairs;
a relevance tier determination unit, configured to calculate the relevance of each feature pair and to determine the relevance tier of each feature pair from the relevance calculation result;
a predicted click-through rate tier determination unit, configured to calculate the predicted click-through rate of each feature pair and to use quantiles to determine the predicted click-through rate tier corresponding to the predicted click-through rate of each feature pair;
a matching determination unit, configured to determine a score for each feature pair from the relevance tier and the predicted click-through rate tier, the score characterizing the degree of match between the search keyword and the product information.
A beneficial effect achievable by one aspect of the embodiments of the present invention is as follows. When determining the degree of match between a search keyword and product information, the method and apparatus provided by the present invention consider not only the relevance between the search keyword and the product information but also how much users prefer the product. A predicted click-through rate factor, which objectively reflects user preference for the product, is introduced to calculate the predicted click-through rate, and the click-through rate tier corresponding to the probability that the advertised product is clicked by users under the search keyword is determined according to a preset ratio rule (for example, the normal distribution rule). The relevance tier and the click-through rate tier are then combined to determine the degree of match between the search keyword and the product information, giving a more accurate matching result. This not only makes sellers' product promotion more effective, but also reduces the client-server data interaction caused by buyers repeatedly searching for products, improves the user experience, reduces the data processing load on the server, improves the server's processing performance and saves valuable Internet bandwidth.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments recorded in the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of an information matching processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of standard normal distribution quantiles provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the predicted click-through rate tier distribution provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of an information matching processing apparatus provided by an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
The present invention discloses an information matching processing method and apparatus that consider not only the relevance between the search keyword and the product information but also how much users prefer the product. A predicted click-through rate factor that reflects user preference for the product is introduced to calculate the predicted click-through rate, and the click-through rate tier corresponding to the probability that the advertised product is clicked by users under the search keyword is determined according to the normal distribution rule. The relevance tier and the click-through rate tier are combined to determine the degree of match between the search keyword and the product information, giving a more accurate matching result.
In one application scenario of the present invention, a seller on an e-commerce website needs to purchase search keywords to promote its advertised products. The method provided by the embodiments of the present invention can be applied on the website server to judge the degree of match between a search keyword and the product information published by the seller, so that search keywords with a high degree of match can be recommended to the seller for purchase. This makes the seller's product promotion more effective and increases the probability that the seller's products are clicked by buyers. It also makes buyers' product searches more efficient and reduces the client-server data interaction caused by buyers repeatedly searching for products, which improves the user experience, reduces the data processing load on the server, improves the server's processing performance and saves valuable Internet bandwidth.
Fig. 1 is a flow diagram of an information matching processing method provided by an embodiment of the present invention.
S101: obtain search keywords and product information, and pair each search keyword with each piece of product information to form (search keyword, product information) feature pairs.
A seller may deal in a wide variety of products belonging to different categories. In that case, the seller's product information can be processed item by item: one or more words describing each piece of product information are obtained and paired with the search keywords to form feature pairs. For example, suppose the seller's product information includes MP3 player, iphone6, Note4 and earphone, and the search keyword is "mobile phone". The resulting feature pairs are (mobile phone, MP3 player), (mobile phone, iphone6), (mobile phone, Note4) and (mobile phone, earphone). Of course, this is only an illustrative example and does not limit the present invention. The product information may specifically be advertised product information.
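The pairing in step S101 can be sketched as a Cartesian product of keywords and product information entries. This is a minimal illustration only; the function name `build_feature_pairs` is an assumption, not something defined by the patent, and the data is the example from the paragraph above.

```python
from itertools import product


def build_feature_pairs(keywords, product_infos):
    """Pair every search keyword with every product information entry."""
    return list(product(keywords, product_infos))


# Example data taken from the description above.
pairs = build_feature_pairs(
    ["mobile phone"],
    ["MP3 player", "iphone6", "Note4", "earphone"],
)
for keyword, info in pairs:
    print(f"({keyword}, {info})")
```

With several keywords, the same call yields one pair per keyword and product combination.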
It should be noted that before steps S102 and S103 are executed, each search keyword and each piece of product information may be preprocessed; the preprocessing includes extracting the semantic features needed for the various feature-matching judgments. The specific processing can take many forms and is not limited here.
In addition, steps S102 and S103 need not be executed in any particular order; they may be executed in parallel or in reverse order.
S102: calculate the relevance of each feature pair, and determine the relevance tier of each feature pair from the relevance calculation result.
The relevance is obtained mainly from the category relevance and the text relevance between the search keyword and the advertised product. Category relevance refers to the degree of match between the click category of the search keyword and the category of the advertised product. Text relevance covers several aspects, mainly the degree of match between the core words of the search keyword and the core words of the advertised product's title, and between the attributes appearing in the search keyword and the attributes in the advertised product's description. Combining the category match and the text match yields the relevance score.
In a specific implementation, step S102 may include: performing matching judgments on various features of the feature pair; and determining the relevance tier of the feature pair from the results of those matching judgments.
In a specific implementation, when relevance is calculated, the feature-matching judgment performed on each feature pair includes at least one of a category feature matching judgment and a text feature matching judgment.
Further, the category feature matching judgment determines whether the search keyword and the product information belong to the same category. In one specific implementation of the present invention, the category feature matching judgment usually refers to a category judgment based on the meaning of the text. If the category of the search keyword is the same as the category of the published product information, the result of the category feature matching judgment is "Yes"; otherwise it is "No". One special case of a "No" result is a search keyword that has no category; such keywords are usually strongly long-tail, that is, they are seldom searched by users. For example, if the search keyword is "mp3" and the published product is "audio player", the two belong to the same category and the result is "Yes". If the search keyword is "mp3" and the published product is "radio", the two do not belong to the same category and the result is "No".
Further, the text feature matching judgment determines whether the search keyword is associated with the text content of the published product information. Specifically, the text feature matching judgment of the present invention includes at least one of: an exact match judgment, a partial match judgment, a center word match judgment, a center word exact match judgment, a hidden word match judgment and a reverse preposition match judgment. Of course, the text feature matching judgment may also include extracting text feature vectors and calculating the similarity of the text vectors with the cosine formula. The present invention does not limit this.
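The cosine-based variant mentioned above can be sketched as follows. This is a minimal illustration that assumes a simple whitespace bag-of-words vector; the patent does not specify how text feature vectors are extracted, and the function name `cosine_similarity` is chosen here for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between bag-of-words count vectors of two texts."""
    # Assumption: lower-cased, whitespace-separated tokens stand in for
    # whatever feature extraction a real system would use.
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


print(cosine_similarity("mp3 player", "audio player"))
print(cosine_similarity("mp3 player", "radio"))
```

Texts that share no token score 0, and identical texts score approximately 1; any threshold that turns the score into a match result would be a separate design decision.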
After the various feature-matching judgments have been performed on a feature pair, the relevance tier of the pair can be determined from their results. In the present invention, relevance is divided into three tiers: excellent, good and poor.
Table 1 gives one illustrative division of relevance tiers; other divisions may of course be used, and no limitation is imposed here.
Table 1
S103: calculate the predicted click-through rate of each feature pair, and use quantiles to determine the predicted click-through rate tier corresponding to the predicted click-through rate of each feature pair.
In a specific implementation, step S103 may include: determining the ratio coefficient corresponding to each tier of the predicted click-through rate tiers; determining the quantile values from the ratio coefficients; and determining the tier interval in which the predicted click-through rate falls from the predicted click-through rate of each feature pair and the quantile values.
Preferably, the quantile is a normal distribution quantile.
A detailed description is given below with an example.
First, the quantiles of the standard normal distribution are introduced. The normal distribution is also called the Gaussian distribution. The standard normal distribution is the normal distribution with mean 0 and standard deviation 1, denoted N(0, 1). Its probability density curve is bell-shaped, low at both ends and high in the middle, and the total area under the curve is 1. In general, if a random variable X follows a normal distribution with location parameter μ and scale parameter σ, this is written:
$$X \sim N(\mu, \sigma^2) \quad (1)$$
Its probability density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \quad (2)$$
When μ = 0 and σ = 1, X follows the standard normal distribution, with mean 0 and standard deviation 1.
Normal distribution quantiles describe the rules satisfied by the area under the normal curve. The upper α quantile of the standard normal distribution is defined as follows: let X ~ N(0, 1); for any given α (0 < α < 1), the point $Z_\alpha$ satisfying $P(X > Z_\alpha) = \alpha$ is called the upper α quantile of the standard normal distribution. For example, looking up $Z_\alpha = 1$ in the normal distribution table shown in Fig. 2 gives α = 0.158655.
The common quantiles of the normal distribution follow these rules:
68.268949% of the area under the curve lies within one standard deviation (σ) of the mean.
95.449974% of the area lies within two standard deviations (2σ) of the mean.
99.730020% of the area lies within three standard deviations (3σ) of the mean.
99.993666% of the area lies within four standard deviations (4σ) of the mean.
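The coverage figures above can be checked numerically. The sketch below uses the standard identity P(|X − μ| < kσ) = erf(k/√2) for a normally distributed X; the function name `coverage_within` is chosen here for illustration.

```python
import math


def coverage_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3, 4):
    print(f"within {k} sigma: {coverage_within(k):.6%}")
```

The same function also gives the coverage for fractional multiples such as k = 0.5, which is the case used later for the 3:4:3 division.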
The present invention applies this normal distribution rule to divide the predicted click-through rate into tiers.
The predicted click-through rate (eCTR) is obtained by building a probabilistic model from a large amount of historical exposure and click behavior and using the model to predict whether a future exposure will produce a click. The value it gives is the probability that a given product is clicked by a user after being exposed under a given keyword. It therefore lies between 0 and 1, and a larger value indicates a higher likelihood of being clicked.
The eCTR is predicted with the LR (logistic regression) model, the industry standard, which comprises two parts: feature extraction and model training. Calculating the predicted click-through rate of each feature pair includes: performing feature extraction on the feature pair; obtaining the feature weight corresponding to each feature from the trained model; and calculating the predicted click-through rate from the extracted features and their corresponding feature weights.
The extracted features include one or any combination of the following: the text information of the search keyword, the category information of the search keyword, the title of the product information, the attributes of the product information, and the relevance between the search keyword and the product information.
Then, once the feature weights have been obtained through model training, the predicted click-through rate eCTR of an advertisement pair (Query, offer) can be estimated, where Query is the search keyword and offer is the product information.
The LR model is a generalized linear model, obtained by transforming a linear model with the logistic function. Its expression is:
$$y = \frac{1}{1 + e^{-\sum_i w_i f_i}} \quad (3)$$
where $w_i$ is a feature weight, $f_i$ is a feature value and y is the resulting predicted click-through rate. The formula confines the result to the interval (0, 1), which matches a click probability.
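The scoring step of such a logistic regression model can be sketched as below. This is an illustration under assumptions: the weights and feature values are made-up numbers, the function name `predict_ectr` is hypothetical, and no bias term is included because the text mentions only feature weights and feature values.

```python
import math


def predict_ectr(weights, features):
    """Logistic function of the weighted feature sum, evaluated in a numerically stable way."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    z = sum(w * f for w, f in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow when z is very negative
    return e / (1.0 + e)


# Hypothetical weights for three extracted features of one (Query, offer) pair.
weights = [1.2, -0.7, 0.3]
features = [1.0, 1.0, 0.5]
print(predict_ectr(weights, features))
```

Training the weights themselves (for example by maximum likelihood on exposure and click logs) is outside this sketch.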
In theory, accurately predicted eCTR values should follow a Gaussian (normal) distribution. The eCTR of advertisement pairs is divided into tiers along the keyword dimension and the global dimension: the eCTR of each advertisement pair necessarily falls in some interval of the overall eCTR distribution, and that interval determines the predicted click-through rate tier of the pair. The tier division method provided by the present invention ensures that the advertised products of most customers are rated at an average level, while those of a minority of customers are rated at a better or worse level.
In this embodiment, based on practical business analysis and experience, the predicted click-through rate is divided into three tiers: good, medium and poor, with ratio coefficients 3:4:3. That is, advertised products in the good tier account for 30%, those in the medium tier for 40% and those in the poor tier for 30%, corresponding to scores of 5 stars, 4 stars and 3 stars respectively. See Fig. 3, a schematic diagram of the predicted click-through rate tier division, in which the abscissa is the predicted click-through rate value, the ordinate is the frequency, and the area under the curve corresponds to probability (i.e. proportion).
In a specific implementation, when the global or keyword-dimension eCTR distribution is divided in the ratio 3:4:3, the area under the curve within a certain range around the mean must be 0.4 and, by symmetry, the area on each side is 0.3. From the common quantile rules of the normal distribution it follows that:
$$Z_a \approx \mu \pm \frac{\sigma}{2} \quad (4)$$
where μ is the mean, σ is the standard deviation and $Z_a$ is the normal distribution quantile.
That is, once the ratio coefficient corresponding to each predicted click-through rate tier has been determined, the values of the normal distribution quantiles can be determined from the ratio coefficients.
Assume that Fig. 3 follows the standard normal distribution, i.e. X ~ N(0, 1). For any given α (0 < α < 1), the point $Z_\alpha$ satisfying $P(X > Z_\alpha) = \alpha$ is the upper α quantile of the standard normal distribution, and $Z_{1-\alpha}$ is the corresponding lower α quantile.
$Z_\alpha$ is a number: when X ~ N(0, 1), $P(X > Z_\alpha) = \alpha$. As an example of looking up $Z_\alpha$ for a given α in the normal distribution table: to find $Z_{0.025}$, look up the Z value corresponding to 1 − 0.025 = 0.975. In the normal distribution table shown in Fig. 2, the Z value corresponding to 0.9750 is 1.96, so $Z_{0.025} = 1.96$. Conversely, to find the α corresponding to $Z_\alpha = 1.96$, first look up 1.96, which corresponds to 0.975; then α = 1 − 0.975 = 0.025.
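The table lookups described above can be reproduced with Python's standard library instead of a printed table. This is a small sketch; `upper_quantile` and `upper_tail` are names chosen here, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def upper_quantile(alpha):
    """Return Z_alpha such that P(X > Z_alpha) = alpha for X ~ N(0, 1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return STD_NORMAL.inv_cdf(1.0 - alpha)


def upper_tail(z):
    """Return alpha = P(X > z) for X ~ N(0, 1), the reverse lookup."""
    return 1.0 - STD_NORMAL.cdf(z)


print(round(upper_quantile(0.025), 2))  # forward lookup of Z_0.025
print(round(upper_tail(1.96), 3))       # reverse lookup for Z = 1.96
```

The forward call corresponds to looking up 1 − α in the body of the table, and the reverse call to reading the table at a given Z value.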
As Fig. 3 shows, a1 and a2 correspond to two quantiles of the standard normal distribution. Using the proportions marked in Fig. 3, they can be mapped to $Z_{\alpha 1}$ and $Z_{\alpha 2}$, whose values can be obtained by the method above; under the standard normal distribution, $Z_{\alpha 1}$ corresponds to an upper α quantile and $Z_{\alpha 2}$ to a lower α quantile.
In a specific implementation, when the predicted click-through rate tiers are divided in the ratio 3:4:3, the area under the curve within a certain range on both sides of the mean is 0.4 and, by symmetry, the area on the left and on the right is 0.3 each. In the standard normal quantile diagram of Fig. 3, the area under the curve to the right of quantile a2 is therefore 0.3, so the value needed is $Z_{0.3}$, that is, the Z value corresponding to 1 − 0.3 = 0.7. The normal distribution table shown in Fig. 2 gives a Z value of 0.52 for 0.7, so $Z_{0.3} = 0.52$, i.e. a2 = 0.52; similarly, a1 = −0.52. a1 and a2 are the two quantiles for this ratio under the normal distribution. The values of the normal distribution quantiles $Z_{\alpha 1}$ and $Z_{\alpha 2}$ can also be calculated approximately with formula (4). Since Fig. 3 follows the standard normal distribution, X ~ N(0, 1), i.e. μ = 0 and σ = 1, so formula (4) gives $Z_a = \pm 0.5$, which in Fig. 3 means a1 = −0.5 and a2 = 0.5.
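The derivation of the cut points from the ratio coefficients can be sketched generically. The function name `tier_cut_points` and its parameters are assumptions made for illustration; it computes exact normal quantiles from the cumulative proportions, whereas the text also accepts the rounder approximation of half a standard deviation.

```python
from itertools import accumulate
from statistics import NormalDist


def tier_cut_points(ratios, mu=0.0, sigma=1.0):
    """Quantiles that split a normal distribution into tiers with the given proportions.

    `ratios` lists the proportions from the lowest tier to the highest,
    for example (3, 4, 3) for poor, medium and good.
    """
    if len(ratios) < 2 or any(r <= 0 for r in ratios):
        raise ValueError("need at least two positive ratio coefficients")
    total = sum(ratios)
    dist = NormalDist(mu, sigma)
    # Cumulative proportions at the inner boundaries (the last one, 1.0, is dropped).
    cumulative = list(accumulate(ratios))[:-1]
    return [dist.inv_cdf(c / total) for c in cumulative]


a1, a2 = tier_cut_points((3, 4, 3))
print(round(a1, 2), round(a2, 2))
```

For the 3:4:3 case on the standard normal distribution the two cut points should land close to the table values of −0.52 and 0.52 quoted above; other ratios only change the `ratios` argument.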
The predicted click-through rate values follow a general normal distribution. For a general normal distribution (μ not equal to 0 and σ not equal to 1), the corresponding quantiles can be approximated from the normal distribution quantile rules, so the quantiles for the ratio 3:4:3 are approximately μ − σ/2 and μ + σ/2, where μ is the mean and σ is the standard deviation. μ and σ can be calculated from actual data samples. Specifically, after the predicted click-through rate values have been obtained, the mean μ and the corresponding standard deviation σ of all predicted click-through rates are computed; the calculation can follow existing methods. The values of the general normal distribution quantiles are then obtained from μ and σ according to formula (4).
Once the values of the general normal distribution quantiles have been determined, the tier interval in which a predicted click-through rate falls can be determined by comparing it with the quantile values. For example, with the quantiles derived from the standard normal distribution table: when the predicted click-through rate falls in (0, μ − σ/2], its tier is poor; when it falls in (μ − σ/2, μ + σ/2), its tier is medium; when it falls in [μ + σ/2, 1), its tier is good.
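The interval test above can be sketched as follows. The function names are hypothetical, and using the population standard deviation (`statistics.pstdev`) over all predicted values is an assumption; the text only says that the mean and standard deviation are computed from actual data samples.

```python
from statistics import mean, pstdev


def ectr_tier(ectr, mu, sigma):
    """Map one predicted click-through rate to a tier using cut points mu -/+ sigma/2."""
    lower = mu - sigma / 2.0
    upper = mu + sigma / 2.0
    if ectr <= lower:
        return "poor"
    if ectr < upper:
        return "medium"
    return "good"


def assign_tiers(ectrs):
    """Estimate mu and sigma from the sample, then tier every value."""
    mu = mean(ectrs)
    sigma = pstdev(ectrs)  # assumption: population standard deviation
    return [ectr_tier(value, mu, sigma) for value in ectrs]


# Hypothetical predicted click-through rates for five advertisement pairs.
print(assign_tiers([0.01, 0.05, 0.05, 0.05, 0.09]))
```

Computing μ and σ per keyword instead of globally only changes which sample is passed to `assign_tiers`.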
It should be noted that the above description uses ratio coefficients of 3:4:3 as an example; when other ratio coefficients are chosen, the calculation can follow the same approach.
S104: determine a score for each feature pair from the relevance tier and the predicted click-through rate tier, the score characterizing the degree of match between the search keyword and the product information.
In a specific implementation, the score can be calculated in many ways, for example by a weighted average or by other means; the present invention does not limit this.
Table 2 shows one implementation of the star rating.
Table 2
Based on analysis of the actual business, advertisement pairs whose correlation is "excellent" may be divided by estimated click-through rate into good, medium, and poor at a ratio of 3:4:3, corresponding to 5 stars, 4 stars, and 3 stars respectively. Advertisement pairs whose correlation is "good" are divided at a ratio of 1:1, with the two gears corresponding to 2 stars and 1 star respectively. The division of excellent advertisement pairs is shown in Table 2. The division of good advertisement pairs is simpler because there are only two gears: the distribution mean is taken as the dividing point, so that among good advertisement pairs those above the mean receive 2 stars and those below the mean receive 1 star.
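The star assignment just described can be sketched as below. This is only an illustration under stated assumptions: the function name `star_rating` is hypothetical, the same μ and σ are assumed to describe the estimated click-through rates within a correlation gear, σ is treated as a positive standard deviation, a rate exactly equal to the mean is assumed to receive 1 star, and correlation gears other than "excellent" and "good" return no rating because the text does not specify them.

```python
from statistics import NormalDist


def star_rating(correlation_gear, ctr, mu, sigma):
    """Return a star rating (1 to 5), or None when the text gives no rule."""
    if correlation_gear == "excellent":
        # 3:4:3 split: cut points at the 30% and 70% quantiles.
        dist = NormalDist(mu, sigma)
        low, high = dist.inv_cdf(0.3), dist.inv_cdf(0.7)
        if ctr >= high:
            return 5  # "good" estimated click-through rate
        if ctr > low:
            return 4  # "medium"
        return 3      # "poor"
    if correlation_gear == "good":
        # 1:1 split: the distribution mean is the dividing point.
        return 2 if ctr > mu else 1
    return None  # assumption: other correlation gears are not rated here
```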
In the embodiments of the present invention, the correlation calculation and the estimated click-through rate are combined to compute the degree of matching between the search keyword and the advertised product. This not only informs the seller of the advertisement quality and matching degree, but also objectively reflects the probability that a buyer clicks the advertised product when searching for products on the website. The higher the star rating, the higher the ranking and the more likely the buyer is to click, which brings more exposure and feedback, so that the advertiser's return on investment is greater and the seller's product promotion becomes more effective. For buyers on the website, the advertiser's optimization of advertisements improves product quality. The direct result is that the user's experience on the website improves and the data interaction between the user's client and the server decreases, which reduces the data processing load on the server, improves the server's processing performance, and saves valuable Internet bandwidth resources.
Fig. 4 is a schematic diagram of a product information matching processing device provided by an embodiment of the present invention.
A product information matching processing device 400 comprises:
An acquiring unit 401, configured to obtain each search keyword and product information, and to combine the search keywords and product information pairwise into search keyword and product information feature pairs.
A correlation gear determination unit 402, configured to calculate the correlation of each search keyword and product information feature pair, and to determine the correlation gear of each pair according to the correlation calculation result.
An estimated click-through rate gear determination unit 403, configured to calculate the estimated click-through rate of each search keyword and product information feature pair, and to use quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each pair.
A matching determination unit 404, configured to determine the score of each search keyword and product information feature pair according to the correlation gear and the estimated click-through rate gear, the score characterizing the degree of matching between the search keyword and the product information.
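How the four units cooperate can be sketched as a small pipeline. This is a hypothetical illustration only: the function name `match_pairs` and the three callables passed to it are assumptions standing in for units 402, 403, and 404, whose internals the text describes separately.

```python
from itertools import product


def match_pairs(keywords, products, correlation_gear, ctr_gear, score):
    """Score every (search keyword, product information) pair.

    correlation_gear and ctr_gear take (keyword, item) and return a gear;
    score takes the two gears and returns the matching score.
    All three callables are hypothetical placeholders.
    """
    results = {}
    # Unit 401: combine keywords and product information pairwise.
    for keyword, item in product(keywords, products):
        relevance = correlation_gear(keyword, item)    # unit 402
        click = ctr_gear(keyword, item)                # unit 403
        results[(keyword, item)] = score(relevance, click)  # unit 404
    return results
```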
Further, the estimated click-through rate gear determination unit includes an estimated click-through rate calculation subunit and a gear determination subunit, wherein the estimated click-through rate calculation subunit includes:
A model building subunit, configured to perform feature extraction on the search keyword and product information feature pair, and to obtain the feature weight corresponding to each feature according to a trained model;
A calculation subunit, configured to calculate the estimated click-through rate using the extracted features and the feature weights corresponding to those features.
Further, the features extracted by the model building subunit include one or any combination of the following: the text information of the search keyword, the category information of the search keyword, the title of the product information, the attributes of the product information, and the correlation between the search keyword and the product information.
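One common way to turn extracted features and learned feature weights into an estimated click-through rate is a logistic model. The text does not name the model, so the sketch below is an assumption: the function name `estimate_ctr`, the `bias` parameter, and the feature names used in any call are all hypothetical.

```python
import math


def estimate_ctr(features, weights, bias=0.0):
    """Estimate a click-through rate from feature values and weights.

    Assumption: a logistic (sigmoid) link over a weighted sum; features
    without a learned weight contribute nothing.
    """
    z = bias + sum(
        weights.get(name, 0.0) * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-z))
```

For example, `estimate_ctr({"same_category": 1.0}, {"same_category": 0.8}, bias=-3.0)` yields a probability strictly between 0 and 1 that can then be placed into a gear by the quantile rule.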
Further, the estimated click-through rate gear determination unit includes an estimated click-through rate calculation subunit and a gear determination subunit, wherein the gear determination subunit includes:
A proportionality coefficient determination subunit, configured to determine the proportionality coefficient corresponding to each gear of the estimated click-through rate gear;
A quantile determination subunit, configured to determine the values of the quantiles according to the proportionality coefficient;
A gear interval determination subunit, configured to determine the gear interval in which the estimated click-through rate falls according to the estimated click-through rate of each search keyword and product information feature pair and the values of the quantiles.
The quantile is a normal distribution quantile.
Further, the correlation gear determination unit includes:
A feature matching subunit, configured to perform matching judgments of multiple features on the search keyword and product information feature pair;
A determination subunit, configured to determine the correlation gear of the search keyword and product information feature pair according to the results of the matching judgments of the multiple features.
Further, the matching judgments of multiple features performed by the feature matching subunit include at least one of a category feature matching judgment and a text feature matching judgment;
The category feature matching judgment judges whether the search keyword and the product information belong to the same category;
The text feature matching judgment judges whether the search keyword is associated with the text content of the product information.
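The two matching judgments can be sketched as follows. The claims of this document mention computing text vector similarity with the cosine formula; everything else here is an assumption made for illustration: the function names, the whitespace tokenization, the 0.5 threshold, and the rule that maps the two judgments to a gear ("poor" in particular is a hypothetical label).

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of bag-of-words vectors (whitespace tokens)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def correlation_gear(keyword, keyword_category, title, product_category,
                     threshold=0.5):
    """Combine category and text matching into a gear (assumed mapping)."""
    same_category = keyword_category == product_category
    text_related = cosine_similarity(keyword, title) >= threshold
    if same_category and text_related:
        return "excellent"
    if same_category or text_related:
        return "good"
    return "poor"
```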
The functions of the above units correspond to the processing steps of the method described in detail with reference to Fig. 1 and are not repeated here.
It should be noted that, because the method embodiments have been explained in detail, the description of the device embodiments is relatively brief; those skilled in the art will understand that the device embodiments of the present invention can be constructed with reference to the method embodiments. Other implementations obtained by those skilled in the art without creative effort all fall within the scope of protection of the present invention.
Those skilled in the art will appreciate that the method and device embodiments above have been described by way of example and are not intended to limit the present invention; other implementations obtained by those skilled in the art without creative effort all fall within the scope of protection of the present invention.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element. The present invention can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present invention can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
All embodiments in this specification are described in a progressive manner; the same or similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, since the device embodiments are substantially similar to the method embodiments, their description is relatively simple; for relevant details, refer to the description of the method embodiments. The device embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort. The above are only specific embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the present invention.
Claims (13)
1. An information matching processing method, characterized in that the method comprises:
obtaining each search keyword and product information, and combining the search keywords and product information pairwise into search keyword and product information feature pairs;
calculating the correlation of each search keyword and product information feature pair, and determining the correlation gear of each search keyword and product information feature pair according to the correlation calculation result;
calculating the estimated click-through rate of each search keyword and product information feature pair, and using quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each search keyword and product information feature pair;
determining the score of each search keyword and product information feature pair according to the correlation gear and the estimated click-through rate gear, the score characterizing the degree of matching between the search keyword and the product information;
wherein calculating the correlation of each search keyword and product information feature pair comprises: performing matching judgments of multiple features on the search keyword and product information feature pair, the matching judgments of multiple features comprising at least one of a category feature matching judgment and a text feature matching judgment; the text feature matching judgment comprises at least one of an exact matching judgment, a partial matching judgment, a center word matching judgment, a center word exact matching judgment, a hidden word matching judgment, and a reverse preposition matching judgment; the text feature matching judgment further comprises extracting text feature vectors and calculating the similarity of the text vectors using the cosine formula;
using quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each search keyword and product information feature pair comprises: determining the proportionality coefficient corresponding to each gear of the estimated click-through rate gear; determining the values of the quantiles according to the proportionality coefficient; and determining the gear interval in which the estimated click-through rate falls according to the estimated click-through rate of each search keyword and product information feature pair and the values of the quantiles.
2. The method according to claim 1, characterized in that calculating the estimated click-through rate of each search keyword and product information feature pair comprises:
performing feature extraction on the search keyword and product information feature pair, and obtaining the feature weight corresponding to each feature according to a trained model;
calculating the estimated click-through rate using the extracted features and the feature weights corresponding to those features.
3. The method according to claim 2, characterized in that the extracted features include one or any combination of the following: the text information of the search keyword, the category information of the search keyword, the title of the product information, the attributes of the product information, and the correlation between the search keyword and the product information.
4. The method according to claim 1, characterized in that the quantile is a normal distribution quantile.
5. The method according to claim 1, characterized in that determining the correlation gear of each search keyword and product information feature pair according to the correlation calculation result comprises:
determining the correlation gear of the search keyword and product information feature pair according to the results of the matching judgments of the multiple features.
6. The method according to claim 5, characterized in that:
the category feature matching judgment judges whether the search keyword and the product information belong to the same category;
the text feature matching judgment judges whether the search keyword is associated with the text content of the product information.
7. An information matching processing device, characterized in that the device comprises:
an acquiring unit, configured to obtain each search keyword and product information, and to combine the search keywords and product information pairwise into search keyword and product information feature pairs;
a correlation gear determination unit, configured to calculate the correlation of each search keyword and product information feature pair, and to determine the correlation gear of each search keyword and product information feature pair according to the correlation calculation result;
an estimated click-through rate gear determination unit, configured to calculate the estimated click-through rate of each search keyword and product information feature pair, and to use quantiles to determine the estimated click-through rate gear corresponding to the estimated click-through rate of each search keyword and product information feature pair;
a matching determination unit, configured to determine the score of each search keyword and product information feature pair according to the correlation gear and the estimated click-through rate gear, the score characterizing the degree of matching between the search keyword and the product information;
wherein the correlation gear determination unit comprises a feature matching subunit, configured to perform matching judgments of multiple features on the search keyword and product information feature pair, the matching judgments of multiple features comprising at least one of a category feature matching judgment and a text feature matching judgment; the text feature matching judgment comprises at least one of an exact matching judgment, a partial matching judgment, a center word matching judgment, a center word exact matching judgment, a hidden word matching judgment, and a reverse preposition matching judgment; the text feature matching judgment further comprises extracting text feature vectors and calculating the similarity of the text vectors using the cosine formula;
the estimated click-through rate gear determination unit comprises a gear determination subunit, the gear determination subunit comprising: a proportionality coefficient determination subunit, configured to determine the proportionality coefficient corresponding to each gear of the estimated click-through rate gear; a quantile determination subunit, configured to determine the values of the quantiles according to the proportionality coefficient; and a gear interval determination subunit, configured to determine the gear interval in which the estimated click-through rate falls according to the estimated click-through rate of each search keyword and product information feature pair and the values of the quantiles.
8. The device according to claim 7, characterized in that the estimated click-through rate gear determination unit comprises an estimated click-through rate calculation subunit and the gear determination subunit, wherein the estimated click-through rate calculation subunit comprises:
a model building subunit, configured to perform feature extraction on the search keyword and product information feature pair, and to obtain the feature weight corresponding to each feature according to a trained model;
a calculation subunit, configured to calculate the estimated click-through rate using the extracted features and the feature weights corresponding to those features.
9. The device according to claim 8, characterized in that the features extracted by the model building subunit include one or any combination of the following: the text information of the search keyword, the category information of the search keyword, the title of the product information, the attributes of the product information, and the correlation between the search keyword and the product information.
10. The device according to claim 7, characterized in that the estimated click-through rate gear determination unit further comprises an estimated click-through rate calculation subunit.
11. The device according to claim 10, characterized in that the quantile is a normal distribution quantile.
12. The device according to claim 7, characterized in that the correlation gear determination unit further comprises:
a determination subunit, configured to determine the correlation gear of the search keyword and product information feature pair according to the results of the matching judgments of the multiple features.
13. The device according to claim 12, characterized in that:
the category feature matching judgment judges whether the search keyword and the product information belong to the same category;
the text feature matching judgment judges whether the search keyword is associated with the text content of the product information.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410838112.4A CN105808541B (en) | 2014-12-29 | 2014-12-29 | A kind of information matches treating method and apparatus |
PCT/CN2015/098247 WO2016107455A1 (en) | 2014-12-29 | 2015-12-22 | Information matching processing method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410838112.4A CN105808541B (en) | 2014-12-29 | 2014-12-29 | A kind of information matches treating method and apparatus |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105808541A (en) | 2016-07-27 |
CN105808541B (en) | 2019-11-08 |
Family
ID=56284233
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410838112.4A Active CN105808541B (en) | 2014-12-29 | 2014-12-29 | A kind of information matches treating method and apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105808541B (en) |
WO (1) | WO2016107455A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649605B (en) * | 2016-11-28 | 2020-09-29 | 百度在线网络技术(北京)有限公司 | Method and device for triggering promotion keywords |
CN107767172A (en) * | 2017-10-12 | 2018-03-06 | 百度在线网络技术(北京)有限公司 | Information-pushing method, device, server and medium |
CN110516033A (en) * | 2018-05-04 | 2019-11-29 | 北京京东尚科信息技术有限公司 | A kind of method and apparatus calculating user preference |
CN110633398A (en) * | 2018-05-31 | 2019-12-31 | 阿里巴巴集团控股有限公司 | Method for confirming central word, searching method, device and storage medium |
CN111047009B (en) * | 2019-11-21 | 2023-05-23 | 腾讯科技(深圳)有限公司 | Event trigger probability prediction model training method and event trigger probability prediction method |
CN110909182B (en) * | 2019-11-29 | 2023-05-09 | 北京达佳互联信息技术有限公司 | Multimedia resource searching method, device, computer equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103514178A (en) * | 2012-06-18 | 2014-01-15 | 阿里巴巴集团控股有限公司 | Searching and sorting method and device based on click rate |
CN103678481A (en) * | 2003-09-30 | 2014-03-26 | 雅虎公司 | Method and apparatus for search scoring |
CN104077306A (en) * | 2013-03-28 | 2014-10-01 | 阿里巴巴集团控股有限公司 | Search engine result sequencing method and search engine result sequencing system |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6728706B2 (en) * | 2001-03-23 | 2004-04-27 | International Business Machines Corporation | Searching products catalogs |
CN103729365A (en) * | 2012-10-12 | 2014-04-16 | 阿里巴巴集团控股有限公司 | Searching method and system |
CN103778548B (en) * | 2012-10-19 | 2018-05-29 | 阿里巴巴集团控股有限公司 | Merchandise news and key word matching method, merchandise news put-on method and device |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103678481A (en) * | 2003-09-30 | 2014-03-26 | 雅虎公司 | Method and apparatus for search scoring |
CN103514178A (en) * | 2012-06-18 | 2014-01-15 | 阿里巴巴集团控股有限公司 | Searching and sorting method and device based on click rate |
CN104077306A (en) * | 2013-03-28 | 2014-10-01 | 阿里巴巴集团控股有限公司 | Search engine result sequencing method and search engine result sequencing system |
Also Published As
Publication number | Publication date |
---|---|
CN105808541A (en) | 2016-07-27 |
WO2016107455A1 (en) | 2016-07-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105808541B (en) | A kind of information matches treating method and apparatus | |
US10270791B1 (en) | Search entity transition matrix and applications of the transition matrix | |
CN106709040B (en) | Application search method and server | |
TWI609278B (en) | Method and system for recommending search words | |
CN103631929B (en) | A kind of method of intelligent prompt, module and system for search | |
CN103593425B (en) | Preference-based intelligent retrieval method and system | |
CN106339502A (en) | Modeling recommendation method based on user behavior data fragmentation cluster | |
CN110020128B (en) | Search result ordering method and device | |
CN105574216A (en) | Personalized recommendation method and system based on probability model and user behavior analysis | |
US20140012840A1 (en) | Generating search results | |
CN102663022B (en) | Classification recognition method based on URL (uniform resource locator) | |
CN105468649B (en) | Method and device for judging matching of objects to be displayed | |
WO2013163062A1 (en) | Recommending keywords | |
Zhong et al. | Time-aware service recommendation for mashup creation in an evolving service ecosystem | |
CN104994424B (en) | A kind of method and apparatus for building audio and video standard data set | |
CN108108380A (en) | Search ordering method, searching order device, searching method and searcher | |
CN103606097A (en) | Method and system based on credibility evaluation for product information recommendation | |
CN103593353A (en) | Information search method and display information sorting weight value determination method and device | |
CN104462327B (en) | Calculating, search processing method and the device of statement similarity | |
US10019513B1 (en) | Weighted answer terms for scoring answer passages | |
CN102289514B (en) | The method of Social Label automatic marking and Social Label automatic marking device | |
CN101820592A (en) | Method and device for mobile search | |
CN104699817B (en) | A kind of method for sequencing search engines and system based on improvement spectral clustering | |
CN103049470A (en) | Opinion retrieval method based on emotional relevancy | |
CN103778122A (en) | Searching method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20240226. Address after: # 01-21, Lai Zan Da Building 1, 51 Belarusian Road, Singapore. Patentee after: Alibaba Singapore Holdings Ltd. Country or region after: Singapore. Address before: Cayman Islands Grand Cayman capital building, a four storey No. 847 mailbox. Patentee before: ALIBABA GROUP HOLDING Ltd. Country or region before: Cayman Islands |