CN104361077A - Creation method and device for web page scoring model - Google Patents

Creation method and device for web page scoring model

Publication number
CN104361077A
Authority
CN
China
Prior art keywords
webpage
adjusted
feature
scoring model
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410638360.4A
Other languages
Chinese (zh)
Other versions
CN104361077B (en)
Inventor
杨燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201410638360.4A
Publication of CN104361077A
Application granted
Publication of CN104361077B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Abstract

The embodiments of the invention disclose a method and a device for creating a web page scoring model. The method comprises the following steps: acquiring a web page training sample set, wherein the web page training sample set comprises the feature vectors and labeled scores of a plurality of sample web pages under each of at least one preset query term; generating a target loss function according to the labeled score of each sample web page in the web page training sample set and at least one predetermined web page feature to be adjusted; and creating the web page scoring model according to the generated target loss function and the feature vector of each sample web page in the web page training sample set. The technical scheme provided by the embodiments of the invention can improve the accuracy of web page ranking results and the user's search experience.

Description

Creation method and device for a webpage scoring model
Technical field
The embodiments of the present invention relate to the field of computer technology, and in particular to a method and a device for creating a webpage scoring model.
Background
At present, after receiving a query word entered by a user, a search product first determines the related webpages to be returned based on that query word, then ranks these webpages, and finally assembles the link information of the ranked webpages into a list that is presented to the user as the search result. Whether the webpage ranking is accurate plays a vital role in the accuracy of the search results and in the user's satisfaction with the search. The more relevant a webpage is to the query word, the higher it should rank.
Most search products currently create a webpage scoring model in advance, such as a GBRank (Gradient Boosting Rank) model, use it to score all the webpages currently determined to be relevant to a given query word, and then rank those webpages by score. The webpage scoring model is a set of scoring rules obtained by applying machine learning to the multi-dimensional features of each webpage in a large training sample set.
However, because the training sample set has certain limitations, some webpage features may not be learned sufficiently during training, so the resulting webpage scoring model is not entirely reasonable and the accuracy of the ranking results drops considerably. For example, a traditional GBRank model may learn the omission feature of webpages insufficiently, so that this feature has too little effect in the model. When such a model ranks the webpages under a query word, it can easily place a webpage with a smaller omission feature value and poorer relevance ahead of a webpage with a larger omission feature value and better relevance, which seriously harms the user's search experience.
Summary of the invention
The embodiments of the present invention provide a method and a device for creating a webpage scoring model, which can improve the accuracy of webpage ranking results and the user's search experience.
In a first aspect, an embodiment of the present invention provides a method for creating a webpage scoring model, the method comprising:
Acquiring a webpage training sample set, wherein the webpage training sample set comprises the feature vectors and labeled scores of multiple sample webpages under each query word of at least one preset query word;
Generating a target loss function according to the labeled score of each sample webpage in the webpage training sample set and at least one predetermined webpage feature to be adjusted;
Creating a webpage scoring model according to the generated target loss function and the feature vector of each sample webpage in the webpage training sample set.
In a second aspect, an embodiment of the present invention further provides a device for creating a webpage scoring model, the device comprising:
A webpage training sample acquiring unit, configured to acquire a webpage training sample set, wherein the webpage training sample set comprises the feature vectors and labeled scores of multiple sample webpages under each query word of at least one preset query word;
A target loss function generation unit, configured to generate a target loss function according to the labeled score of each sample webpage in the webpage training sample set and at least one predetermined webpage feature to be adjusted;
A webpage scoring model creating unit, configured to create a webpage scoring model according to the generated target loss function and the feature vector of each sample webpage in the webpage training sample set.
In the technical scheme provided by the embodiments of the present invention, multiple webpage features to be adjusted can be determined in advance. A target loss function is then generated from these webpage features to be adjusted together with the labeled score of each sample webpage in the webpage training sample set, and a webpage scoring model is created according to this target loss function and the feature vector of each sample webpage in the webpage training sample set. The effect of the webpage features to be adjusted in the webpage scoring model can thereby be adjusted, which overcomes a drawback of traditional webpage scoring models: some webpage features are not learned reasonably, so their effect in the model is either insufficient or excessive. Therefore, when the webpage scoring model created in the embodiments of the present invention is used to score the multiple webpages under any input query word and the webpages are ranked by those scores, the accuracy of the webpage ranking results and the user's search experience can both be improved.
Brief description of the drawings
Figure 1A is a schematic flowchart of a method for creating a webpage scoring model provided by Embodiment one of the present invention;
Figure 1B shows an application scenario of webpage ranking in which the creation method provided by Embodiment one of the present invention is used;
Fig. 2 is a schematic flowchart of a method for creating a webpage scoring model provided by Embodiment two of the present invention;
Fig. 3 is a schematic flowchart of a method for creating a webpage scoring model provided by Embodiment three of the present invention;
Fig. 4 is a schematic flowchart of a method for creating a webpage scoring model provided by Embodiment four of the present invention;
Fig. 5 is a schematic structural diagram of a device for creating a webpage scoring model provided by Embodiment five of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Figure 1A is a schematic flowchart of a method for creating a webpage scoring model provided by Embodiment one of the present invention. The webpage scoring model created in this embodiment can score the multiple webpages under any input query word; a search product can rank those webpages according to the scores and then present the link information of the ranked webpages to the user. Figure 1B shows an application scenario of webpage ranking in which the creation method of Embodiment one is used. Referring to Figure 1B, the basic webpage ranking flow can be divided into two processes: offline training and online prediction. The input of offline training is the webpage training sample set, from which the learning system outputs a webpage scoring model; the learning system uses the creation method provided by this embodiment. The input of online prediction is the unranked set of webpages under any query word; after the webpage scoring model scores these webpages, an ordered webpage list is output according to the scores.
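The online prediction step can be sketched as follows. This is only a minimal illustration: `rank_pages`, `toy_model`, and the candidate data are hypothetical names and values, and `toy_model` is a fixed weighted sum standing in for a trained webpage scoring model.

```python
from typing import Callable, Dict, List, Sequence


def rank_pages(
    pages: Dict[str, Sequence[float]],
    score_fn: Callable[[Sequence[float]], float],
) -> List[str]:
    """Score every candidate page and return page ids sorted by descending score."""
    scored = [(score_fn(features), page_id) for page_id, features in pages.items()]
    # A higher score means a more relevant page, so it is listed earlier.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [page_id for _, page_id in scored]


def toy_model(features: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained scoring model: a fixed weighted sum."""
    weights = (0.7, 0.3)
    return sum(w * f for w, f in zip(weights, features))


# Unranked candidate pages under one query word (invented feature values).
candidates = {
    "page_a": (0.2, 0.9),
    "page_b": (0.8, 0.4),
    "page_c": (0.5, 0.5),
}
ranking = rank_pages(candidates, toy_model)
```

Keeping the scoring function as a parameter mirrors the split in Figure 1B: the model is produced offline, and the online side only scores and sorts.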
Referring to Figure 1A, the method provided by this embodiment can be performed by a device for creating a webpage scoring model, and specifically comprises the following operations:
Operation 110: acquire a webpage training sample set, wherein the webpage training sample set comprises the feature vectors and labeled scores of multiple sample webpages under each query word of at least one preset query word.
In this embodiment, a webpage training sample set must be acquired in order to create the webpage scoring model. The sample set contains the feature vectors and labeled scores of multiple sample webpages under different query words. The feature vector of a sample webpage consists of the values that the sample webpage takes on each of a set of configured webpage features. These webpage features can be defined from several angles, such as webpage content, webpage query words, and webpage links. For example, they can include any combination of the number of keywords in the webpage, the average word length, the term frequency-inverse document frequency feature, the omission feature, the webpage level feature, and the webpage trust index feature. In the embodiments of the present invention, the webpage omission feature is obtained from the proportional relationship between the query word entered by the user for this webpage and the target search word actually used to retrieve the webpage. The target search word is the word obtained by omitting some of the terms from the query word entered by the user. The smaller the webpage omission feature, the more terms were omitted, and the more likely it is that the meaning of the query word has shifted. The labeled score of each sample webpage can be a score assigned to the sample webpage in advance, manually or automatically, according to the content of the sample webpage and/or the relevant query word.
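As a rough illustration of the omission feature, the sketch below assumes that the "proportional relationship" is a plain term-count ratio between the target search word and the user's query. The text does not give the exact formula, so `omission_feature` and its definition are assumptions made for the example only.

```python
from typing import Sequence


def omission_feature(query_terms: Sequence[str], target_terms: Sequence[str]) -> float:
    """Share of the user's query terms that survive in the target search word.

    Assumption: the ratio is (terms kept) / (terms in the query). A smaller
    value means that more terms were omitted.
    """
    if not query_terms:
        raise ValueError("query_terms must not be empty")
    kept = [term for term in target_terms if term in query_terms]
    return len(kept) / len(query_terms)


query = ["cheap", "flights", "beijing"]
full = omission_feature(query, ["cheap", "flights", "beijing"])  # nothing omitted
partial = omission_feature(query, ["flights", "beijing"])        # one term omitted
```

Under this assumed definition, a page retrieved with the full query gets the largest value, and the value shrinks as more terms are dropped, which matches the direction described in the text.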
For example, the acquired webpage training sample set contains m sample webpages for each of n query words, and specifically comprises:
For Q_1, the 1st query word: the set formed by the feature vectors of the m sample webpages under the 1st query word, and the set formed by the labeled scores of those m sample webpages;
For Q_2, the 2nd query word: the set formed by the feature vectors of the m sample webpages under the 2nd query word, and the set formed by the labeled scores of those m sample webpages;
… …
For Q_n, the n-th query word: the set formed by the feature vectors of the m sample webpages under the n-th query word, and the set formed by the labeled scores of those m sample webpages.
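One possible in-memory layout for such a training sample set is sketched below. The names `SamplePage`, `TrainingSet`, and `build_training_set`, and all numeric values, are invented for illustration and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SamplePage:
    features: List[float]  # feature vector of the sample webpage
    label: float           # labeled score assigned in advance


# Maps each preset query word to the sample webpages under it.
TrainingSet = Dict[str, List[SamplePage]]


def build_training_set() -> TrainingSet:
    """Toy data with n = 2 query words and m = 2 sample webpages each."""
    return {
        "query_1": [SamplePage([0.9, 1.0], 3.0), SamplePage([0.4, 0.5], 1.0)],
        "query_2": [SamplePage([0.7, 0.6], 2.0), SamplePage([0.1, 1.0], 0.0)],
    }


training_set = build_training_set()
# Per-query feature-vector sets and labeled-score sets, as in the list above.
feature_sets = {q: [p.features for p in pages] for q, pages in training_set.items()}
label_sets = {q: [p.label for p in pages] for q, pages in training_set.items()}
```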
Operation 120: generate a target loss function according to the labeled score of each sample webpage in the webpage training sample set and at least one predetermined webpage feature to be adjusted.
Operation 130: create a webpage scoring model according to the generated target loss function and the feature vector of each sample webpage in the webpage training sample set.
In this embodiment, a primary loss function can first be generated from the labeled score of each sample webpage in the webpage training sample set. A machine learning algorithm then learns the features of each sample webpage in the sample set, proceeding in the direction that optimizes the loss function, and an original webpage scoring model is obtained.
Because the webpage training sample set has certain limitations, some features of the sample webpages may not be learned sufficiently during training, so the original webpage scoring model is not entirely reasonable. For example, some webpage features may have too strong an effect in the original webpage scoring model, while other webpage features have too little.
For this reason, the webpage features whose effect in the original webpage scoring model is too strong (webpage features to be weakened) and those whose effect is insufficient (webpage features to be strengthened) can be taken as the at least one webpage feature to be adjusted. A target loss function is then regenerated accordingly, and a new webpage scoring model is created with this target loss function using the machine learning algorithm mentioned above. In this way the effect of each webpage feature to be adjusted is put into the target loss function, and its effect in the webpage scoring model is adjusted by influencing the target loss function, which overcomes the drawback described above. A webpage feature to be adjusted can be any of the webpage features set above, or another webpage feature that is not included in the sample webpage feature vectors.
Once the webpage features to be adjusted have been determined, generating the target loss function from the labeled score of each sample webpage in the webpage training sample set and the webpage features to be adjusted can specifically comprise: adding the acting factor that characterizes a webpage feature to be adjusted to the decision factor of the primary loss function, thereby updating the primary loss function and obtaining a target loss function that includes the acting factor. The decision factor is the factor in the primary loss function that measures the degree of difference between different webpages under the same query word in the webpage training sample set. If the webpage feature to be adjusted is a webpage feature to be strengthened, its acting factor can be added directly to the decision factor, which strengthens the effect of this feature in the webpage scoring model. If the webpage feature to be adjusted is a webpage feature to be weakened, its acting factor can be subtracted directly from the decision factor, which weakens the effect of this feature in the webpage scoring model.
Of course, the target loss function can also be generated from the labeled scores and the webpage features to be adjusted in other ways, as long as the generated target loss function correspondingly strengthens the effect of the webpage features to be strengthened and weakens the effect of the webpage features to be weakened in the webpage scoring model.
It should be noted that this embodiment can also generate the original webpage scoring model in advance and directly obtain multiple preset webpage features to be adjusted, then generate the target loss function according to the labeled score of each sample webpage in the webpage training sample set and the obtained webpage features to be adjusted, and then create the webpage scoring model according to the target loss function and the feature vector of each sample webpage in the webpage training sample set. The target loss function should comprise: a factor that measures the degree of difference between different webpages under the same query word in the webpage training sample set, and an acting factor that characterizes each webpage feature to be adjusted.
In the technical scheme provided by this embodiment, multiple webpage features to be adjusted can be determined in advance. A target loss function is then generated from these webpage features to be adjusted together with the labeled score of each sample webpage in the webpage training sample set, and a webpage scoring model is created according to this target loss function and the feature vector of each sample webpage in the webpage training sample set. The effect of the webpage features to be adjusted in the webpage scoring model can thereby be adjusted, which overcomes a drawback of traditional webpage scoring models: some webpage features are not learned reasonably, so their effect in the model is either insufficient or excessive. Therefore, when the webpage scoring model created in this embodiment is used to score the multiple webpages under any input query word and the webpages are ranked by those scores, the accuracy of the webpage ranking results and the user's search experience can both be improved.
Embodiment two
Fig. 2 is a schematic flowchart of a method for creating a webpage scoring model provided by Embodiment two of the present invention. This embodiment refines operation 120 on the basis of Embodiment one. Referring to Fig. 2, the method provided by this embodiment specifically comprises the following operations:
Operation 210: acquire a webpage training sample set, wherein the webpage training sample set comprises the feature vectors and labeled scores of multiple sample webpages under each query word of at least one preset query word.
Operation 220: obtain the primary loss function derived from the labeled score of each sample webpage in the webpage training sample set.
In this embodiment, a primary loss function is first generated from the labeled score of each sample webpage in the webpage training sample set. A machine learning algorithm then learns the features of each sample webpage in the sample set, proceeding in the direction that optimizes the loss function, and an original webpage scoring model is obtained. The primary loss function can be any loss function used when creating a webpage scoring model, as long as it contains a decision factor that measures the degree of difference between different webpages under the same query word in the webpage training sample set. For example, the primary loss function can be a cross-entropy loss function or a hinge loss function.
Operation 230: determine the decision factor in the primary loss function that measures the degree of difference between different webpages under the same query word in the webpage training sample set.
Operation 240: for each webpage feature to be adjusted among the at least one predetermined webpage feature to be adjusted, add the acting factor that characterizes this webpage feature to be adjusted to the decision factor, so as to generate the target loss function.
In this embodiment, the original webpage scoring model is analyzed manually or automatically to determine the webpage features whose effect in the original webpage scoring model needs to be strengthened or weakened; these become the webpage features to be adjusted. A webpage feature to be adjusted can be any of the webpage features set above, or another webpage feature that is not included in the sample webpage feature vectors. Alternatively, the original webpage scoring model can be used to score the webpages under a number of input query words, and the scoring results can then be analyzed for the risks present in the original webpage scoring model, in order to determine the webpage features whose effect needs to be strengthened or weakened.
If the webpage feature to be adjusted is a webpage feature to be strengthened, the acting factor that characterizes it can be added directly to the decision factor of the primary loss function, which strengthens the effect of this feature in the webpage scoring model. If the webpage feature to be adjusted is a webpage feature to be weakened, the acting factor that characterizes it can be subtracted directly from the decision factor of the primary loss function, which weakens the effect of this feature in the webpage scoring model.
In a preferred implementation of this embodiment, adding the acting factor that characterizes the webpage feature to be adjusted to the decision factor comprises: multiplying the acting factor that characterizes this webpage feature to be adjusted by its corresponding function coefficient, and then adding the product to the decision factor of the primary loss function.
If the webpage feature to be adjusted that the acting factor characterizes is a webpage feature to be strengthened, the corresponding function coefficient is positive; if it is a webpage feature to be weakened, the corresponding function coefficient is negative. In this embodiment, the function coefficient can be preset to a fixed value based on experience. Alternatively, an initial value can first be set at random and the function coefficient then optimized iteratively; the specific implementation is described in the technical scheme of Embodiment three of the present invention.
By setting the function coefficient, this implementation can flexibly control the effect of the corresponding webpage feature to be adjusted in the webpage scoring model. Compared with direct addition or subtraction, it overcomes the drawback of excessive or insufficient webpage feature effect more effectively.
For example, suppose the decision factor in the primary loss function, which measures the degree of difference between different webpages under the same query word in the webpage training sample set, is:
$$H(x_{qi}, x_{qj}) = -h(x_{qi}) + h(x_{qj})$$
The decision factor after the adding operation can then be:
$$H_{new}(x_{qi}, x_{qj}) = -h(x_{qi}) + h(x_{qj}) + \sum_{t} \varepsilon_t \cdot (\text{reduce\_diff}_t \cdot \text{label\_diff})$$
where:
$q$ is an integer greater than or equal to 1 and less than or equal to $Q$, and $Q$ is the total number of query words in the at least one query word;
$h(x_{qi})$ and $h(x_{qj})$ are both predicted values of the webpage scoring model, and $x_{qi}$ and $x_{qj}$ are different webpages under the $q$-th query word of the at least one query word;
$t$ is an integer greater than or equal to 1 and less than or equal to the total number of webpage features to be adjusted;
$\text{reduce\_diff}_t \cdot \text{label\_diff}$ is the acting factor that characterizes the $t$-th webpage feature to be adjusted, and $\varepsilon_t$ is the function coefficient corresponding to that acting factor;
$\text{reduce\_diff}_t = \text{reduce}_{t,qi} - \text{reduce}_{t,qj}$ is the difference between the value $\text{reduce}_{t,qi}$ of the $t$-th webpage feature to be adjusted for $x_{qi}$ and the value $\text{reduce}_{t,qj}$ of the same feature for $x_{qj}$;
$\text{label\_diff} = \text{label}_{qi} - \text{label}_{qj}$ is the difference between the labeled score $\text{label}_{qi}$ of $x_{qi}$ and the labeled score $\text{label}_{qj}$ of $x_{qj}$.
For example, if the at least one webpage feature to be adjusted is a single webpage omission feature (a webpage feature to be strengthened), and the function coefficient corresponding to the acting factor that characterizes this omission feature is 1, then the decision factor after the adding operation is:
$$H_{new}(x_{qi}, x_{qj}) = -h(x_{qi}) + h(x_{qj}) + \text{reduce\_diff} \cdot \text{label\_diff}$$
where $\text{reduce\_diff}$ is the difference between the webpage omission feature value of $x_{qi}$ and the webpage omission feature value of $x_{qj}$.
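The two decision factors can be computed directly from their definitions. The sketch below is only an illustration: the function names and the numeric inputs are invented, and the model predictions `h_i` and `h_j` are passed in as plain numbers.

```python
from typing import Sequence


def decision_factor(h_i: float, h_j: float) -> float:
    """H(x_qi, x_qj) = -h(x_qi) + h(x_qj)."""
    return -h_i + h_j


def adjusted_decision_factor(
    h_i: float,
    h_j: float,
    label_i: float,
    label_j: float,
    reduce_i: Sequence[float],
    reduce_j: Sequence[float],
    eps: Sequence[float],
) -> float:
    """H_new = H + sum over t of eps_t * (reduce_diff_t * label_diff)."""
    label_diff = label_i - label_j
    extra = sum(
        e * (r_i - r_j) * label_diff
        for e, r_i, r_j in zip(eps, reduce_i, reduce_j)
    )
    return decision_factor(h_i, h_j) + extra


# One feature to be adjusted (the omission feature) with function coefficient 1.
h_old = decision_factor(0.5, 0.25)
h_new = adjusted_decision_factor(0.5, 0.25, 2.0, 1.0, [1.0], [0.5], [1.0])
```

With these inputs the acting factor is 0.5 * 1.0 = 0.5, so `h_new` exceeds `h_old` by exactly that amount. A negative coefficient in `eps` would shift the decision factor the other way, which is how a feature to be weakened is handled.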
Specifically, if the primary loss function is the cross-entropy loss function $L(h)$, its mathematical expression is:
$$L(h) = \sum_{q=1}^{Q} \sum_{x_{qi}, x_{qj}} \left[ -\bar{p}_{qi,qj} \log p_{qi,qj} - (1 - \bar{p}_{qi,qj}) \log (1 - p_{qi,qj}) \right]$$
$$p_{qi,qj} = p(x_{qi} > x_{qj}) = \frac{1}{1 + \exp\left(H(x_{qi}, x_{qj}) + \tau \cdot \text{label\_diff}\right)}$$
The generated target loss function $L_{new}(h)$ is then:
$$L_{new}(h) = \sum_{q=1}^{Q} \sum_{x_{qi}, x_{qj}} \left[ -\bar{p}_{qi,qj} \log p^{new}_{qi,qj} - (1 - \bar{p}_{qi,qj}) \log (1 - p^{new}_{qi,qj}) \right]$$
$$p^{new}_{qi,qj} = p_{new}(x_{qi} > x_{qj}) = \frac{1}{1 + \exp\left(H_{new}(x_{qi}, x_{qj}) + \tau \cdot \text{label\_diff}\right)}$$
where $p(x_{qi} > x_{qj})$ is the probability, in the primary loss function, that $x_{qi}$ has a higher labeled score than $x_{qj}$;
$\tau$ is a preset parameter value; $p_{new}(x_{qi} > x_{qj})$ is the probability, in the target loss function, that $x_{qi}$ has a higher labeled score than $x_{qj}$;
$$S_{qi,qj} = \begin{cases} 1, & \text{if } x_{qi} > x_{qj} \\ -1, & \text{if } x_{qi} < x_{qj} \\ 0, & \text{otherwise} \end{cases}$$
$$\bar{p}_{qi,qj} = \frac{1 + S_{qi,qj}}{2}$$
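A single summand of the cross-entropy loss can be evaluated as in the sketch below. It assumes that the order $x_{qi} > x_{qj}$ is decided by comparing labeled scores; the function names, the clamping constant `tiny`, and the numeric inputs are illustrative choices rather than part of the patent.

```python
import math


def pair_probability(h_value: float, label_diff: float, tau: float) -> float:
    """p = 1 / (1 + exp(H + tau * label_diff)); pass H or H_new as h_value."""
    return 1.0 / (1.0 + math.exp(h_value + tau * label_diff))


def target_probability(label_i: float, label_j: float) -> float:
    """p_bar = (1 + S) / 2, with S in {1, -1, 0} taken from the label order."""
    s = (label_i > label_j) - (label_i < label_j)
    return (1 + s) / 2


def pair_cross_entropy(p_bar: float, p: float) -> float:
    """One summand of L(h): -p_bar * log(p) - (1 - p_bar) * log(1 - p)."""
    tiny = 1e-12  # guard against log(0); an implementation detail of this sketch
    p = min(max(p, tiny), 1.0 - tiny)
    return -p_bar * math.log(p) - (1.0 - p_bar) * math.log(1.0 - p)


p_bar = target_probability(2.0, 1.0)
# Same pair evaluated with H = -0.25 and with H_new = 0.25, tau = 0.1.
loss_old = pair_cross_entropy(p_bar, pair_probability(-0.25, 1.0, 0.1))
loss_new = pair_cross_entropy(p_bar, pair_probability(0.25, 1.0, 0.1))
```

For this pair the acting factor raises the decision factor, so the same model predictions incur a larger loss under the target loss function than under the primary one.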
If the primary loss function is the hinge loss function $R(h, \tau)$, its mathematical expression is:
$$R(h, \tau) = \frac{1}{2} \sum_{q=1}^{Q} \sum_{qi, qj} \left( \max\left\{0,\ \tau \cdot \text{label\_diff} + H(x_{qi}, x_{qj})\right\} \right)^2 - \lambda \tau^2$$
The target loss function $R_{new}(h, \tau)$ is then:
$$R_{new}(h, \tau) = \frac{1}{2} \sum_{q=1}^{Q} \sum_{qi, qj} \left( \max\left\{0,\ \tau \cdot \text{label\_diff} + H_{new}(x_{qi}, x_{qj})\right\} \right)^2 - \lambda \tau^2$$
where $\tau$ is the predicted value of the webpage scoring model and $\lambda$ is a preset parameter.
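The hinge-style loss can be sketched in a few lines. The example reads the formula as a squared hinge term summed over pairs, with the $\lambda\tau^2$ term subtracted once outside the sum; that reading, the function name, and the numeric inputs are assumptions of this sketch.

```python
from typing import Iterable, Tuple


def squared_hinge_loss(
    pairs: Iterable[Tuple[float, float]], tau: float, lam: float
) -> float:
    """R = 1/2 * sum(max(0, tau * label_diff + H) ** 2) - lam * tau ** 2.

    Each pair is (decision factor, label_diff); pass H_new values to get R_new.
    """
    total = sum(
        max(0.0, tau * label_diff + h_value) ** 2
        for h_value, label_diff in pairs
    )
    return 0.5 * total - lam * tau ** 2


# Same pair evaluated with H = -0.25 and with H_new = 0.25.
r_old = squared_hinge_loss([(-0.25, 1.0)], tau=0.5, lam=0.0)
r_new = squared_hinge_loss([(0.25, 1.0)], tau=0.5, lam=0.0)
```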
Operation 250: create a webpage scoring model according to the generated target loss function and the feature vector of each sample webpage in the webpage training sample set.
In this embodiment, after the target loss function is generated, the same machine learning algorithm that produced the original webpage scoring model should be used, together with the generated target loss function and the feature vector of each sample webpage in the webpage training sample set, to create a webpage scoring model that differs from the original webpage scoring model.
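To make the training step concrete, the sketch below minimizes the cross-entropy target loss with plain gradient descent on a linear scorer h(x) = w · x. This deliberately swaps in a much simpler learner than GBRank, only uses pairs in which the first page has the higher labeled score, and handles a single feature to be adjusted; `train_linear_scorer` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (feature vector, labeled score, value of the feature to be adjusted)
Page = Tuple[Sequence[float], float, float]


def train_linear_scorer(
    queries: List[List[Page]],
    eps: float,
    tau: float,
    lr: float = 0.5,
    epochs: int = 200,
) -> List[float]:
    """Fit h(x) = w . x by gradient descent on the pairwise target loss."""
    dim = len(queries[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for pages in queries:
            for xi, label_i, red_i in pages:
                for xj, label_j, red_j in pages:
                    if label_i <= label_j:
                        continue  # keep pairs where x_i is labeled higher (p_bar = 1)
                    label_diff = label_i - label_j
                    h_i = sum(a * b for a, b in zip(w, xi))
                    h_j = sum(a * b for a, b in zip(w, xj))
                    # z = H_new + tau * label_diff
                    z = -h_i + h_j + eps * (red_i - red_j) * label_diff + tau * label_diff
                    p = 1.0 / (1.0 + math.exp(z))
                    # For p_bar = 1: d(loss)/dz = 1 - p, and dz/dw = x_j - x_i.
                    for k in range(dim):
                        grad[k] += (1.0 - p) * (xj[k] - xi[k])
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


# One query word with two pages; the first has the higher labeled score.
toy = [[([1.0, 0.0], 2.0, 1.0), ([0.0, 1.0], 1.0, 0.5)]]
weights = train_linear_scorer(toy, eps=1.0, tau=0.1)
```

After training, the scorer assigns the higher-labeled page the higher score. Changing `eps` changes the margin that each pair must reach, which is the lever the target loss function adds.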
In the technical scheme provided by this embodiment, the primary loss function and the corresponding original webpage scoring model can be generated in advance, and the configured analysis method is used to determine the webpage features whose effect in the original webpage scoring model is insufficient or excessive; these become the webpage features to be adjusted. The acting factor that characterizes each webpage feature to be adjusted is then added to the decision factor of the primary loss function, which adjusts the primary loss function into the target loss function. A webpage scoring model is then created anew from this target loss function and the feature vector of each sample webpage in the webpage training sample set, using the same training algorithm as for the original webpage scoring model. The effect of the webpage features to be adjusted in the webpage scoring model can thereby be adjusted, which overcomes a drawback of traditional webpage scoring models: some webpage features are not learned reasonably, so their effect in the model is either insufficient or excessive. Therefore, when the webpage scoring model created in this embodiment is used to score and rank the multiple webpages under any input query word, the accuracy of the webpage ranking results and the user's search experience can both be improved.
Embodiment three
Fig. 3 is a schematic flowchart of a method for creating a webpage scoring model provided by Embodiment three of the present invention. On the basis of Embodiment one and Embodiment two, this embodiment adds the operations of updating the function coefficients in the target loss function and creating a new webpage scoring model. Referring to Fig. 3, the method provided by this embodiment specifically comprises the following operations:
Operation 310: acquire a webpage training sample set, wherein the webpage training sample set comprises the feature vectors and labeled scores of multiple sample webpages under each query word of at least one preset query word.
Operation 320: obtain the primary loss function derived from the labeled score of each sample webpage in the webpage training sample set.
Operation 330: determine the decision factor in the primary loss function that measures the degree of difference between different webpages under the same query word in the webpage training sample set.
Operation 340: for each webpage feature to be adjusted among the at least one predetermined webpage feature to be adjusted, multiply the acting factor that characterizes this webpage feature to be adjusted by its corresponding function coefficient and add the product to the decision factor, so as to generate the target loss function.
In this embodiment, if the webpage feature to be adjusted that the acting factor characterizes is a webpage feature to be strengthened, the corresponding function coefficient is positive and can be set at random to a value greater than 0; if it is a webpage feature to be weakened, the corresponding function coefficient is negative and can be set at random to a value less than 0.
Operation 350: create a webpage scoring model according to the generated target loss function and the feature vector of each sample webpage in the webpage training sample set.
Operation 360: update the function coefficients in the target loss function according to a setting rule.
In this embodiment, after operation 350 has produced the webpage scoring model, the model is analyzed manually or automatically and is used to score the webpages under a number of input query words. The active state, in this webpage scoring model, of each webpage feature to be adjusted is then determined from the analysis result and/or the scoring result.
The determination result is then obtained, and the function coefficients in the target loss function are updated according to the set rule. The set rule comprises: if any webpage feature to be adjusted is found to be in the insufficient-effect state, increase the function coefficient corresponding to the acting factor that characterizes that feature; if any webpage feature to be adjusted is found to be in the excessive-effect state, decrease the function coefficient corresponding to the acting factor that characterizes that feature.
Operation 370: create a new webpage scoring model according to the updated target loss function and the feature vectors of the sample webpages in the webpage training sample set.
In this embodiment, once operation 370 has created the new webpage scoring model, the active state of each webpage feature to be adjusted in the new model can be obtained again. Based on this result, the function coefficients in the target loss function are updated again according to the set rule, and yet another new webpage scoring model is created from the updated target loss function and the feature vectors of the sample webpages in the webpage training sample set. This loop is iterated repeatedly.
Specifically, if a webpage feature to be strengthened is found to be still in the insufficient-effect state, the function coefficient corresponding to the acting factor that characterizes this feature can be increased further; conversely, if it is found to be in the excessive-effect state, that function coefficient can be decreased.
Correspondingly, if a webpage feature to be weakened is found to be in the insufficient-effect state, the function coefficient corresponding to the acting factor that characterizes this feature to be weakened can be increased; if it is found to be still in the excessive-effect state, that function coefficient can be decreased further.
The step size by which a coefficient is increased or decreased in each iteration can be preset to a fixed value, or adjusted in real time as the number of iterations grows (for example, the larger the iteration count, the smaller the step); this embodiment places no limit on this.
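The set rule and the shrinking step size described above can be illustrated with a minimal Python sketch. The state names, the feature names `freshness` and `keyword_stuffing`, and the schedule `base_step / (1 + iteration)` are assumptions chosen for illustration; the embodiment only requires that the coefficient rise in the insufficient-effect state and fall in the excessive-effect state.

```python
from enum import Enum


class EffectState(Enum):
    INSUFFICIENT = "insufficient"  # the feature acts too weakly in the model
    EXCESSIVE = "excessive"        # the feature acts too strongly in the model
    IDEAL = "ideal"                # no adjustment needed


def step_size(iteration, base_step=0.5):
    """Hypothetical schedule: the step shrinks as the iteration count grows."""
    return base_step / (1 + iteration)


def update_coefficients(coefficients, states, iteration):
    """Apply the set rule once and return the updated function coefficients."""
    step = step_size(iteration)
    updated = {}
    for feature, eps in coefficients.items():
        state = states.get(feature, EffectState.IDEAL)
        if state is EffectState.INSUFFICIENT:
            eps += step  # increase the coefficient
        elif state is EffectState.EXCESSIVE:
            eps -= step  # decrease the coefficient
        updated[feature] = eps
    return updated


# A feature to be strengthened starts positive, one to be weakened starts negative.
coeffs = {"freshness": 0.8, "keyword_stuffing": -0.3}
states = {
    "freshness": EffectState.INSUFFICIENT,
    "keyword_stuffing": EffectState.EXCESSIVE,
}
new_coeffs = update_coefficients(coeffs, states, iteration=1)
```

The same rule is applied to both kinds of feature: only the observed active state, not the sign of the coefficient, decides the direction of the update.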
In the technical solution of this embodiment, the function coefficient corresponding to each acting factor that characterizes a webpage feature to be adjusted is preset and then optimized iteratively, so that the active state of each such feature in the finally created webpage scoring model approaches the intended ideal state. Scoring the webpages under any query word with this model can greatly improve scoring accuracy, which makes the search engine's ranking of those webpages more reasonable and improves the user's search experience.
On the basis of the above technical solution, in one implementation of this embodiment of the present invention, the original webpage scoring model is a GBRank model, and creating the webpage scoring model according to the generated target loss function and the feature vectors of the sample webpages in the webpage training sample set may specifically comprise:
creating at least one decision tree with a gradient boosting ranking algorithm, according to the generated target loss function and the feature vectors of the sample webpages in the webpage training sample set; and creating the webpage scoring model from the at least one decision tree created.
Embodiment four
Fig. 4 is a schematic flowchart of a method for creating a webpage scoring model according to Embodiment 4 of the present invention. This embodiment builds on the above embodiments and provides a preferred implementation. Referring to Fig. 4, the method provided by this embodiment comprises the following operations:
Operation 400: obtain a webpage training sample set, where the webpage training sample set comprises the feature vectors and labeled scores of multiple sample webpages under each of at least one preset query word.
Operation 410: obtain the primary loss function from the labeled scores of the sample webpages in the webpage training sample set and, based on this primary loss function, train on the feature vectors of the sample webpages with the GBRank algorithm to create the original GBRank model.
Here, the primary loss function is a cross-entropy loss function.
The gradient of the cross-entropy loss function with respect to $h(x_{qi})$ is
$$\frac{\partial L}{\partial h(x_{qi})} = \sum_{x_{qj}} \left( \frac{1}{2}\left(1 - S_{qi,qj}\right) - \frac{1}{1 + \exp\left(h(x_{qi}) - h(x_{qj}) - \tau \cdot \mathrm{label\_diff}\right)} \right),$$
and $\frac{\partial L}{\partial h(x_{qi})} = -\frac{\partial L}{\partial h(x_{qj})}$ holds.
When $S_{qi,qj} = 1$, the above gradient can be expressed as:
$$\frac{\partial L}{\partial h(x_{qi})} = \sum_{x_{qj}} \frac{-1}{1 + \exp\left(h(x_{qi}) - h(x_{qj}) - \tau \cdot \mathrm{label\_diff}\right)}.$$
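As an illustration of the gradient formulas above, the following Python sketch evaluates one summand for a pair of webpages and then sums over the other webpages under the same query word. The function names and the default `tau=1.0` are assumptions; the check of the antisymmetry relation additionally assumes that swapping the two webpages flips the sign of both S and label_diff, which this section does not state explicitly.

```python
import math


def pair_gradient_term(h_i, h_j, s_ij, label_diff, tau=1.0):
    """One summand of dL/dh(x_qi) for the pair (x_qi, x_qj)."""
    exponent = h_i - h_j - tau * label_diff
    return 0.5 * (1 - s_ij) - 1.0 / (1.0 + math.exp(exponent))


def gradient(h_i, others, tau=1.0):
    """Sum the pair terms over the other webpages under the same query word.

    `others` is an iterable of (h_j, s_ij, label_diff) tuples.
    """
    return sum(pair_gradient_term(h_i, h_j, s, d, tau) for h_j, s, d in others)


# With S = 1 the first term vanishes, leaving -1 / (1 + exp(...)).
term = pair_gradient_term(h_i=1.0, h_j=0.0, s_ij=1, label_diff=1.0, tau=1.0)

# Swapping the pair (and the signs of S and label_diff) negates the term.
forward = pair_gradient_term(0.3, 0.9, 1, 2.0, tau=0.5)
backward = pair_gradient_term(0.9, 0.3, -1, -2.0, tau=0.5)
```

Here `term` has a zero exponent, so it evaluates to -1/2, and `forward + backward` is zero up to rounding.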
In this embodiment, training on the feature vectors of the sample webpages in the webpage training sample set with the GBRank algorithm to create the GBRank model comprises:
1. Initialize the predicted values $\{h_0(x_{qw})\}$ of the webpage scoring model, where q is an integer greater than or equal to 1 and less than or equal to Q, Q is the total number of query words in the at least one query word, and $x_{qw}$ is the w-th webpage under the q-th query word of the at least one query word;
2. For k = 1, 2, ..., K in turn, perform the following steps:
Compute the negative gradient $r_{k,qw} = -\frac{\partial L}{\partial h_{k-1}(x_{qw})}$;
Create a decision tree $g_k$ from $\{r_{k,qw}\}$;
Update $h_k(x_{qw}) = h_{k-1}(x_{qw}) + \eta \, g_k(x_{qw})$.
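The two steps above can be sketched in pure Python as follows. This is a simplified illustration under several assumptions that are not part of the embodiment: a single query word, one scalar feature per webpage, depth-1 regression trees (stumps) standing in for the decision trees, only pairs with different labeled scores contributing (treated as S = 1 for the higher-labeled webpage), and the default values of `K`, `eta` and `tau`.

```python
import math


def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree on one feature by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lm, rm)
    if best is None:  # all feature values identical: predict the mean
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def train(xs, labels, K=10, eta=0.5, tau=1.0):
    """Boost K stumps on the negative gradient of the pairwise loss."""
    n = len(xs)
    h = [0.0] * n                      # step 1: initialize h_0
    trees = []
    for _ in range(K):                 # step 2: k = 1..K
        grad = [0.0] * n
        for i in range(n):
            for j in range(n):
                if labels[i] > labels[j]:          # pair with S = 1
                    diff = labels[i] - labels[j]
                    g = -1.0 / (1.0 + math.exp(h[i] - h[j] - tau * diff))
                    grad[i] += g
                    grad[j] -= g                   # antisymmetry of the gradient
        residuals = [-g for g in grad]             # r_k = -dL/dh_{k-1}
        tree = fit_stump(xs, residuals)            # decision tree g_k
        trees.append(tree)
        h = [hv + eta * tree(x) for hv, x in zip(h, xs)]  # h_k = h_{k-1} + eta * g_k
    return trees, h


trees, scores = train([0.1, 0.4, 0.7, 0.9], [0, 1, 2, 3])
```

In this toy run the webpage with the highest labeled score ends with a higher predicted value than the one with the lowest labeled score; a production GBRank implementation would use full feature vectors and deeper trees.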
Operation 420: determine at least one webpage feature to be adjusted according to the created original GBRank model.
Operation 430: judge, according to the at least one webpage feature to be adjusted thus determined, whether to modify the primary loss function.
If it is determined that no webpage feature to be adjusted exists under the created original GBRank model, it is judged that the primary loss function need not be modified; otherwise, it is judged that the primary loss function needs to be modified.
If modification is needed, perform operation 450; otherwise, perform operation 440.
Operation 440: output the original GBRank model directly as the final webpage scoring model, and end.
Operation 450: determine whether this is the first iteration.
In this embodiment, the iteration count is preset to an initial value of 0. If the iteration count obtained is 0, this is judged to be the first iteration; otherwise, it is not the first iteration.
If so, perform operation 460; otherwise, perform operation 470.
Operation 460: generate a target loss function that includes the acting factors characterizing the webpage features to be adjusted and their corresponding function coefficients. If the webpage feature to be adjusted that an acting factor characterizes is a feature to be strengthened, the corresponding function coefficient is a randomly set positive coefficient; if it is a feature to be weakened, the corresponding function coefficient is a randomly set negative coefficient. Then perform operation 490.
In this embodiment, operation 460 specifically comprises:
obtaining the primary loss function and determining, in the primary loss function, the decision factor that measures the degree of difference between different webpages under the same query word in the webpage training sample set;
for each webpage feature to be adjusted among the at least one webpage feature to be adjusted, multiplying the acting factor that characterizes this feature by its corresponding function coefficient and adding the product to the decision factor, so as to generate the target loss function.
For a detailed description of operation 460, see the description of operations 220-240 in Embodiment 2, which is not repeated here.
Operation 470: obtain the active state of the at least one webpage feature to be adjusted in the currently created webpage scoring model.
Operation 480: if the active state of any webpage feature to be adjusted is the insufficient-effect state or the excessive-effect state, update the function coefficients in the target loss function according to the set rule.
The set rule comprises: increasing the function coefficient corresponding to the acting factor that characterizes any webpage feature to be adjusted that is in the insufficient-effect state; and decreasing the function coefficient corresponding to the acting factor that characterizes any webpage feature to be adjusted that is in the excessive-effect state.
Operation 490: create a webpage scoring model according to the generated target loss function and the feature vectors of the sample webpages in the webpage training sample set, and increase the iteration count by 1. Then return to operation 450.
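The control flow of operations 400 to 490 can be outlined with the following Python sketch. The three callables (`train`, `features_to_adjust`, `get_states`), the string labels, the fixed `step`, the range of the random initial coefficients, and the stopping conditions (all features in an acceptable state, or `max_iterations` reached) are hypothetical; the embodiment itself does not specify when the loop ends.

```python
import random


def create_scoring_model(train, features_to_adjust, get_states,
                         step=0.1, max_iterations=10, seed=0):
    """train(coefficients) -> model; coefficients=None means the primary loss.
    features_to_adjust(model) -> {feature: "strengthen" | "weaken"}.
    get_states(model) -> {feature: "insufficient" | "excessive" | "ok"}.
    """
    rng = random.Random(seed)
    model = train(None)                          # operations 400-410
    targets = features_to_adjust(model)          # operation 420
    if not targets:                              # operations 430-440
        return model, {}
    coefficients = {}
    for iteration in range(max_iterations):      # operation 450
        if iteration == 0:                       # operation 460
            for name, direction in targets.items():
                magnitude = rng.uniform(0.1, 1.0)
                coefficients[name] = magnitude if direction == "strengthen" else -magnitude
        else:                                    # operations 470-480
            states = get_states(model)
            if all(states.get(name, "ok") == "ok" for name in targets):
                break                            # assumed stopping condition
            for name in targets:
                if states.get(name) == "insufficient":
                    coefficients[name] += step
                elif states.get(name) == "excessive":
                    coefficients[name] -= step
        model = train(dict(coefficients))        # operation 490
    return model, coefficients


# Toy stand-ins: the "model" just remembers the coefficients it was trained with.
def toy_train(coefficients):
    return {"coefficients": coefficients}


def toy_features(model):
    return {"freshness": "strengthen"}


def toy_states(model):
    eps = model["coefficients"]["freshness"]
    return {"freshness": "insufficient" if eps < 1.5 else "ok"}


model, coeffs = create_scoring_model(toy_train, toy_features, toy_states,
                                     max_iterations=50)
```

In the toy run the coefficient for `freshness` starts at a random positive value and is raised by `step` on each pass until the stand-in state check reports that the feature is no longer in the insufficient-effect state.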
Scoring the webpages under any query word with the webpage scoring model finally created in this embodiment can remedy the insufficient effect of webpage features to be adjusted, such as an omission feature, thereby greatly improving scoring accuracy, making the search engine's ranking of those webpages more reasonable, and improving the user's search experience.
Embodiment five
Fig. 5 is a schematic structural diagram of an apparatus for creating a webpage scoring model according to Embodiment 5 of the present invention. Referring to Fig. 5, the apparatus can be implemented in software and/or hardware, and its structure is as follows:
a webpage training sample acquiring unit 510, configured to obtain a webpage training sample set, where the webpage training sample set comprises the feature vectors and labeled scores of multiple sample webpages under each of at least one preset query word;
a target loss function generation unit 520, configured to generate a target loss function according to the labeled scores of the sample webpages in the webpage training sample set and at least one predetermined webpage feature to be adjusted;
a webpage scoring model creating unit 530, configured to create a webpage scoring model according to the generated target loss function and the feature vectors of the sample webpages in the webpage training sample set.
Further, the target loss function generation unit 520 comprises:
a primary loss function obtaining subunit 5201, configured to obtain the primary loss function derived from the labeled scores of the sample webpages in the webpage training sample set;
a decision factor determination subunit 5202, configured to determine, in the primary loss function, the decision factor that measures the degree of difference between different webpages under the same query word in the webpage training sample set;
an acting factor adding subunit 5203, configured to, for each webpage feature to be adjusted among the at least one predetermined webpage feature to be adjusted, add the acting factor that characterizes this feature to the decision factor, so as to generate the target loss function.
Further, the acting factor adding subunit 5203 is specifically configured to:
for each webpage feature to be adjusted among the at least one predetermined webpage feature to be adjusted, multiply the acting factor that characterizes this feature by its corresponding function coefficient and add the product to the decision factor;
where, if the webpage feature to be adjusted that an acting factor characterizes is a feature to be strengthened, the corresponding function coefficient is a positive coefficient; if it is a feature to be weakened, the corresponding function coefficient is a negative coefficient.
Further, the apparatus also comprises:
a function coefficient updating unit 540, configured to update the function coefficients in the target loss function according to a set rule after the webpage scoring model creating unit 530 has created the webpage scoring model according to the generated target loss function and the feature vectors of the sample webpages in the webpage training sample set;
where the set rule comprises: if any webpage feature to be adjusted among the at least one webpage feature to be adjusted is found to be in the insufficient-effect state, increase the function coefficient corresponding to the acting factor that characterizes that feature; if any webpage feature to be adjusted among the at least one webpage feature to be adjusted is found to be in the excessive-effect state, decrease the function coefficient corresponding to the acting factor that characterizes that feature;
a new webpage scoring model creating unit 550, configured to create a new webpage scoring model according to the updated target loss function and the feature vectors of the sample webpages in the webpage training sample set.
Further, the decision factor is $H(x_{qi}, x_{qj}) = -h(x_{qi}) + h(x_{qj})$;
the decision factor after the adding operation is
$$H_{new}(x_{qi}, x_{qj}) = -h(x_{qi}) + h(x_{qj}) + \sum_{t} \varepsilon_t \cdot \left(\mathrm{reduce\_diff}_t \cdot \mathrm{label\_diff}\right)$$
where q is an integer greater than or equal to 1 and less than or equal to Q, and Q is the total number of query words in the at least one query word;
$h(x_{qi})$ and $h(x_{qj})$ are both predicted values of the webpage scoring model, and $x_{qi}$ and $x_{qj}$ are different webpages under the q-th query word of the at least one query word;
t is an integer greater than or equal to 1 and less than or equal to the total number of the at least one webpage feature to be adjusted;
$\mathrm{reduce\_diff}_t \cdot \mathrm{label\_diff}$ is the acting factor that characterizes the t-th webpage feature to be adjusted among the at least one webpage feature to be adjusted, and $\varepsilon_t$ is the function coefficient corresponding to that acting factor;
$\mathrm{reduce\_diff}_t = \mathrm{reduce}_{t,qi} - \mathrm{reduce}_{t,qj}$ is the difference between the value $\mathrm{reduce}_{t,qi}$ of the t-th webpage feature to be adjusted for $x_{qi}$ and the value $\mathrm{reduce}_{t,qj}$ of the t-th webpage feature to be adjusted for $x_{qj}$;
$\mathrm{label\_diff} = \mathrm{label}_{qi} - \mathrm{label}_{qj}$ is the difference between the labeled score $\mathrm{label}_{qi}$ of $x_{qi}$ and the labeled score $\mathrm{label}_{qj}$ of $x_{qj}$.
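The adjusted decision factor defined above can be computed as in the following Python sketch. The function name, the dictionary-based representation of feature values and coefficients, and the feature name `freshness` are assumptions made for illustration.

```python
def decision_factor_new(h_i, h_j, features_i, features_j,
                        label_i, label_j, coefficients):
    """H_new = -h(x_qi) + h(x_qj) + sum_t eps_t * (reduce_diff_t * label_diff)."""
    label_diff = label_i - label_j
    adjustment = 0.0
    for t, eps in coefficients.items():
        reduce_diff = features_i[t] - features_j[t]   # reduce_{t,qi} - reduce_{t,qj}
        adjustment += eps * (reduce_diff * label_diff)
    return -h_i + h_j + adjustment


# Two webpages under the same query word, one feature to be adjusted.
H_new = decision_factor_new(
    h_i=2.0, h_j=1.5,
    features_i={"freshness": 1.0}, features_j={"freshness": 0.5},
    label_i=3, label_j=1,
    coefficients={"freshness": 0.5},
)
```

With these numbers the original decision factor is -2.0 + 1.5 = -0.5 and the added term is 0.5 * (0.5 * 2) = 0.5, so `H_new` is 0.0. With an empty `coefficients` mapping the function returns the original decision factor unchanged.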
On the basis of the above technical solution, the primary loss function is a cross-entropy loss function or a hinge loss function.
On the basis of the above technical solution, the webpage scoring model creating unit 530 is specifically configured to: create at least one decision tree with a gradient boosting ranking algorithm, according to the generated target loss function and the feature vectors of the sample webpages in the webpage training sample set; and create the webpage scoring model from the at least one decision tree created.
The above product can perform the method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the method performed.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them; without departing from the inventive concept, it may include other equivalent embodiments, and its scope is determined by the appended claims.

Claims (14)

1. A method for creating a webpage scoring model, characterized by comprising:
obtaining a webpage training sample set, where the webpage training sample set comprises the feature vectors and labeled scores of multiple sample webpages under each of at least one preset query word;
generating a target loss function according to the labeled scores of the sample webpages in the webpage training sample set and at least one predetermined webpage feature to be adjusted;
creating a webpage scoring model according to the generated target loss function and the feature vectors of the sample webpages in the webpage training sample set.
2. The method for creating a webpage scoring model according to claim 1, characterized in that generating the target loss function according to the labeled scores of the sample webpages in the webpage training sample set and the at least one predetermined webpage feature to be adjusted comprises:
obtaining the primary loss function derived from the labeled scores of the sample webpages in the webpage training sample set;
determining, in the primary loss function, the decision factor that measures the degree of difference between different webpages under the same query word in the webpage training sample set;
for each webpage feature to be adjusted among the at least one predetermined webpage feature to be adjusted, adding the acting factor that characterizes this feature to the decision factor, so as to generate the target loss function.
3. The method for creating a webpage scoring model according to claim 2, characterized in that adding the acting factor that characterizes this webpage feature to be adjusted to the decision factor comprises:
multiplying the acting factor that characterizes this webpage feature to be adjusted by its corresponding function coefficient and adding the product to the decision factor;
where, if the webpage feature to be adjusted that an acting factor characterizes is a feature to be strengthened, the corresponding function coefficient is a positive coefficient; if it is a feature to be weakened, the corresponding function coefficient is a negative coefficient.
4. The method for creating a webpage scoring model according to claim 3, characterized in that, after creating the webpage scoring model according to the generated target loss function and the feature vectors of the sample webpages in the webpage training sample set, the method further comprises:
updating the function coefficients in the target loss function according to a set rule;
creating a new webpage scoring model according to the updated target loss function and the feature vectors of the sample webpages in the webpage training sample set.
5. The method for creating a webpage scoring model according to claim 3, characterized in that the decision factor is $H(x_{qi}, x_{qj}) = -h(x_{qi}) + h(x_{qj})$;
the decision factor after the adding operation is
$$H_{new}(x_{qi}, x_{qj}) = -h(x_{qi}) + h(x_{qj}) + \sum_{t} \varepsilon_t \cdot \left(\mathrm{reduce\_diff}_t \cdot \mathrm{label\_diff}\right)$$
where q is an integer greater than or equal to 1 and less than or equal to Q, and Q is the total number of query words in the at least one query word;
$h(x_{qi})$ and $h(x_{qj})$ are both predicted values of the webpage scoring model, and $x_{qi}$ and $x_{qj}$ are different webpages under the q-th query word of the at least one query word;
t is an integer greater than or equal to 1 and less than or equal to the total number of the at least one webpage feature to be adjusted;
$\mathrm{reduce\_diff}_t \cdot \mathrm{label\_diff}$ is the acting factor that characterizes the t-th webpage feature to be adjusted among the at least one webpage feature to be adjusted, and $\varepsilon_t$ is the function coefficient corresponding to that acting factor;
$\mathrm{reduce\_diff}_t = \mathrm{reduce}_{t,qi} - \mathrm{reduce}_{t,qj}$ is the difference between the value $\mathrm{reduce}_{t,qi}$ of the t-th webpage feature to be adjusted for $x_{qi}$ and the value $\mathrm{reduce}_{t,qj}$ of the t-th webpage feature to be adjusted for $x_{qj}$;
$\mathrm{label\_diff} = \mathrm{label}_{qi} - \mathrm{label}_{qj}$ is the difference between the labeled score $\mathrm{label}_{qi}$ of $x_{qi}$ and the labeled score $\mathrm{label}_{qj}$ of $x_{qj}$.
6. The method for creating a webpage scoring model according to any one of claims 2-5, characterized in that the primary loss function is a cross-entropy loss function or a hinge loss function.
7. The method for creating a webpage scoring model according to any one of claims 1-5, characterized in that
creating the webpage scoring model according to the generated target loss function and the feature vectors of the sample webpages in the webpage training sample set comprises:
creating at least one decision tree with a gradient boosting ranking algorithm, according to the generated target loss function and the feature vectors of the sample webpages in the webpage training sample set; and creating the webpage scoring model from the at least one decision tree created.
8. An apparatus for creating a webpage scoring model, characterized by comprising:
a webpage training sample acquiring unit, configured to obtain a webpage training sample set, where the webpage training sample set comprises the feature vectors and labeled scores of multiple sample webpages under each of at least one preset query word;
a target loss function generation unit, configured to generate a target loss function according to the labeled scores of the sample webpages in the webpage training sample set and at least one predetermined webpage feature to be adjusted;
a webpage scoring model creating unit, configured to create a webpage scoring model according to the generated target loss function and the feature vectors of the sample webpages in the webpage training sample set.
9. The apparatus for creating a webpage scoring model according to claim 8, characterized in that the target loss function generation unit comprises:
a primary loss function obtaining subunit, configured to obtain the primary loss function derived from the labeled scores of the sample webpages in the webpage training sample set;
a decision factor determination subunit, configured to determine, in the primary loss function, the decision factor that measures the degree of difference between different webpages under the same query word in the webpage training sample set;
an acting factor adding subunit, configured to, for each webpage feature to be adjusted among the at least one predetermined webpage feature to be adjusted, add the acting factor that characterizes this feature to the decision factor, so as to generate the target loss function.
10. The apparatus for creating a webpage scoring model according to claim 9, characterized in that the acting factor adding subunit is specifically configured to:
for each webpage feature to be adjusted among the at least one predetermined webpage feature to be adjusted, multiply the acting factor that characterizes this feature by its corresponding function coefficient and add the product to the decision factor;
where, if the webpage feature to be adjusted that an acting factor characterizes is a feature to be strengthened, the corresponding function coefficient is a positive coefficient; if it is a feature to be weakened, the corresponding function coefficient is a negative coefficient.
11. The apparatus for creating a webpage scoring model according to claim 10, characterized by further comprising:
a function coefficient updating unit, configured to update the function coefficients in the target loss function according to a set rule after the webpage scoring model creating unit has created the webpage scoring model according to the generated target loss function and the feature vectors of the sample webpages in the webpage training sample set;
a new webpage scoring model creating unit, configured to create a new webpage scoring model according to the updated target loss function and the feature vectors of the sample webpages in the webpage training sample set.
12. The apparatus for creating a webpage scoring model according to claim 10, characterized in that the decision factor is $H(x_{qi}, x_{qj}) = -h(x_{qi}) + h(x_{qj})$;
the decision factor after the adding operation is
$$H_{new}(x_{qi}, x_{qj}) = -h(x_{qi}) + h(x_{qj}) + \sum_{t} \varepsilon_t \cdot \left(\mathrm{reduce\_diff}_t \cdot \mathrm{label\_diff}\right)$$
where q is an integer greater than or equal to 1 and less than or equal to Q, and Q is the total number of query words in the at least one query word;
$h(x_{qi})$ and $h(x_{qj})$ are both predicted values of the webpage scoring model, and $x_{qi}$ and $x_{qj}$ are different webpages under the q-th query word of the at least one query word;
t is an integer greater than or equal to 1 and less than or equal to the total number of the at least one webpage feature to be adjusted;
$\mathrm{reduce\_diff}_t \cdot \mathrm{label\_diff}$ is the acting factor that characterizes the t-th webpage feature to be adjusted among the at least one webpage feature to be adjusted, and $\varepsilon_t$ is the function coefficient corresponding to that acting factor;
$\mathrm{reduce\_diff}_t = \mathrm{reduce}_{t,qi} - \mathrm{reduce}_{t,qj}$ is the difference between the value $\mathrm{reduce}_{t,qi}$ of the t-th webpage feature to be adjusted for $x_{qi}$ and the value $\mathrm{reduce}_{t,qj}$ of the t-th webpage feature to be adjusted for $x_{qj}$;
$\mathrm{label\_diff} = \mathrm{label}_{qi} - \mathrm{label}_{qj}$ is the difference between the labeled score $\mathrm{label}_{qi}$ of $x_{qi}$ and the labeled score $\mathrm{label}_{qj}$ of $x_{qj}$.
13. The apparatus for creating a webpage scoring model according to any one of claims 9-12, characterized in that the primary loss function is a cross-entropy loss function or a hinge loss function.
14. The apparatus for creating a webpage scoring model according to any one of claims 8-12, characterized in that the webpage scoring model creating unit is specifically configured to: create at least one decision tree with a gradient boosting ranking algorithm, according to the generated target loss function and the feature vectors of the sample webpages in the webpage training sample set; and create the webpage scoring model from the at least one decision tree created.
CN201410638360.4A 2014-11-06 2014-11-06 The creation method and device of webpage scoring model Active CN104361077B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410638360.4A CN104361077B (en) 2014-11-06 2014-11-06 The creation method and device of webpage scoring model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410638360.4A CN104361077B (en) 2014-11-06 2014-11-06 The creation method and device of webpage scoring model

Publications (2)

Publication Number Publication Date
CN104361077A true CN104361077A (en) 2015-02-18
CN104361077B CN104361077B (en) 2017-11-03

Family

ID=52528338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410638360.4A Active CN104361077B (en) 2014-11-06 2014-11-06 The creation method and device of webpage scoring model

Country Status (1)

Country Link
CN (1) CN104361077B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070255689A1 (en) * 2006-04-28 2007-11-01 Gordon Sun System and method for indexing web content using click-through features
CN102043776A (en) * 2009-10-14 2011-05-04 南开大学 Inquiry-related multi-ranking-model integration algorithm
CN103984733A (en) * 2014-05-20 2014-08-13 国家电网公司 Direct optimizing performance index sequencing method capable of embodying query difference

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104715063A (en) * 2015-03-31 2015-06-17 百度在线网络技术(北京)有限公司 Search ranking method and search ranking device
CN104715063B (en) * 2015-03-31 2018-11-02 百度在线网络技术(北京)有限公司 search ordering method and device
CN107622056A (en) * 2016-07-13 2018-01-23 百度在线网络技术(北京)有限公司 The generation method and device of training sample
CN107622056B (en) * 2016-07-13 2021-03-02 百度在线网络技术(北京)有限公司 Training sample generation method and device

Also Published As

Publication number Publication date
CN104361077B (en) 2017-11-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant