CN104794250A

CN104794250A - Project selection method based on self-adaptive active learning

Info

Publication number: CN104794250A
Application number: CN201510255684.4A
Authority: CN
Inventors: 吴健 (Wu Jian); 李承超 (Li Chengchao); 张宇 (Zhang Yu)
Original assignee: SUZHOU RONGXI INFORMATION TECHNOLOGY Co Ltd
Current assignee: Suzhou Feiyu Mutual Entertainment Information Technology Co., Ltd.
Priority date: 2015-05-19
Filing date: 2015-05-19
Publication date: 2015-07-22
Anticipated expiration: 2035-05-19
Also published as: CN104794250B

Abstract

The invention discloses a project selection method based on self-adaptive active learning. The method comprises the steps of calculating the uncertainty of a candidate project; calculating the representativeness of the candidate project; and self-adaptively and dynamically selecting the project with highest information content according to the uncertainty and representativeness. By adopting the project selection method, the uncertainty and representativeness of the project can be comprehensively considered, and the project with the highest information content can be selected.

Description

A project (item) selection method based on self-adaptive active learning

Technical field

The present invention relates to the technical field of recommender systems, and in particular to an item selection method based on adaptive active learning.

Background art

In a collaborative filtering recommender system, the key to solving the new-user cold-start problem is building an interest-preference model for the new user quickly. When a user first starts using the system, active-learning-based rating elicitation can obtain the user's personalized preferences efficiently: the system actively selects some items and asks the user to rate them. Two considerations govern which items to select. (1) The more items the user rates, the more rating data the system obtains, and more rating information makes the recommender system more effective. (2) Not all rating information is equally valuable. Some ratings reveal the user's personal tastes and others do not, so different active-learning elicitation methods produce different results. For example, asking the user to rate popular items yields many ratings but tells the system little about the user's personal preferences, because most users like popular items anyway. The key open problem is therefore an effective active-learning item selection strategy: one that elicits as few ratings as possible, each as informative as possible, so that the ratings represent the user's preferences well.

Summary of the invention

The invention provides an item selection method based on adaptive active learning. The method considers both the uncertainty and the representativeness of items and selects the item with the highest information content.

The invention provides an item selection method based on adaptive active learning, comprising:

calculating the uncertainty of the candidate items;

calculating the representativeness of the candidate items;

selecting the item with the highest information content according to the uncertainty and the representativeness.

Preferably, calculating the uncertainty of a candidate item comprises:

calculating the uncertainty of candidate item x according to the formula

uncertainty(x) = \frac{\sum_{c \in U_{x}(sim)} (R_{cx} - \bar{R}_{x})^{2}}{|U_{x}(sim)|}

where R_{cx} denotes user c's rating of item x, \bar{R}_{x} denotes the mean rating of x over these users, and U_{x}(sim) denotes the set of users who are similar to the current new user and have rated item x.

Preferably, calculating the representativeness of a candidate item comprises:

on the training set T_{c}, using the prediction model θ to calculate c's predicted rating \hat{y}_{cx}(θ) of x; estimating the probability p(U = c, R_{cx} = r) that c rates x as r; and taking r as the changed value of \hat{y}_{cx}(θ), i.e. \hat{y}_{cx}(θ) = r;

updating the rating training set T_{c} by adding the changed predicted rating to c's list of rated items, which gives the new rating training set T_{c,r} = T_{c} ∪ {(x, r)};

on the rating training sets T_{c} and T_{c,r}, using the prediction model θ to predict c's ratings of the other unrated items x_{i} in the unrated item set X_{c}^{(u \setminus x)}, which gives the predicted ratings \hat{y}_{cx_{i}}^{T_{c}}(θ) and \hat{y}_{cx_{i}}^{T_{c,r}}(θ) on the respective training sets;

under the probability p(U = c, R_{cx} = r) that the rating is r, estimating how a change in the rating of the current candidate item x affects the predicted ratings of the other items. The rating change is expressed as the square of the difference between \hat{y}_{cx_{i}}^{T_{c}}(θ) and \hat{y}_{cx_{i}}^{T_{c,r}}(θ), and the representativeness rep(x) of the current candidate item x is calculated according to the formula

rep(x) = \sum_{x_{i} \in X_{c}^{(u \setminus x)}} \sum_{r \in R} p(U = c, R_{cx} = r) \left(\hat{y}_{cx_{i}}^{T_{c}}(\theta) - \hat{y}_{cx_{i}}^{T_{c,r}}(\theta)\right)^{2}

where c denotes the current new user; x denotes the current candidate item; X_{c}^{(u)} denotes the set of items c has not rated; X_{c}^{(r)} denotes the set of items c has rated; and X_{c}^{(u \setminus x)} denotes c's remaining unrated items after removing x, i.e.

X_{c}^{(u \setminus x)} = X_{c}^{(u)} \setminus \{x\},

in which each item is denoted x_{i};

T_{c} = \bigcup_{x \in X_{c}^{(r)}} \{(x, R_{cx})\}

is the training data set of c; and R_{cx} denotes c's rating of x.

Preferably, selecting the item with the highest information content according to the uncertainty and the representativeness comprises:

selecting the most informative item according to the formula

x^{*} = \arg\max_{x \in X_{c}^{(u)}} \{uncertainty(x) \times rep(x)\}

where uncertainty(x) is the uncertainty, rep(x) is the representativeness, c denotes the current new user, x denotes the current candidate item, and X_{c}^{(u)} denotes the set of items c has not rated.

Preferably, after calculating the representativeness of the candidate items, the method further comprises:

specifying a weight set W = {w_{1}, w_{2}, ..., w_{n-1}, w_{n}} in advance, of size |W| = n;

initializing the candidate item set I as empty;

for each weight w_{i} ∈ W, selecting the top L candidate items to form the item set I_{i};

updating the candidate item set I = I ∪ I_{i};

training a prediction model θ on user c's existing rating set T_{c}, using θ to calculate c's predicted rating of item x, and updating the training set T_{c};

calculating the prediction error ε(x) corresponding to each item;

selecting the most informative item x^{*} from the candidate item set I.

Preferably, selecting the top L candidate items for the current weight w_{i} ∈ W comprises:

combining the uncertainty uncertainty(x) and the representativeness rep(x) into the item information content info(x) according to the formula

info(x) = uncertainty(x)^{w} \times rep(x)^{1-w}

ranking the items according to the formula

x^{*} = \arg\max_{x \in X_{c}^{(u)}} info(x)

and taking the L items with the largest info(x) as the candidate items.

Preferably, training the prediction model θ on user c's existing rating set T_{c}, calculating c's predicted rating of item x with θ, and updating the training set T_{c} comprises:

updating the training set according to the formula

T_{c} = T_{c} \cup \{\langle x, \hat{y}_{c,x}^{\theta} \rangle\}

Preferably, calculating the prediction error ε(x) corresponding to each item comprises:

training a new prediction model \tilde{θ}_{x} on the updated T_{c}, using \tilde{θ}_{x} to predict c's rating of each item t (t ∈ T_{c}) in the training set of rated items, and estimating the deviation ε(x) between the true and predicted ratings according to the formula

\epsilon(x) = \sum_{t \in T_{c}} \left(\hat{y}_{c,t}^{\tilde{\theta}_{x}} - R_{ct}\right)^{2}

where \hat{y}_{c,t}^{\tilde{θ}_{x}} denotes c's rating of item t as predicted by the updated collaborative filtering model \tilde{θ}_{x}.

Preferably, selecting the most informative item x^{*} from the candidate item set I comprises:

selecting the most informative item x^{*} according to the formula

x^{*} = \arg\min_{x \in I} \epsilon(x)

The item selection method based on adaptive active learning provided by the invention calculates the uncertainty and the representativeness of the candidate items. It then weighs both when choosing the most informative item for the user to rate. This overcomes the shortcomings of selection strategies based on uncertainty alone and picks out the item with the highest information content.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention. A person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of an item selection method based on adaptive active learning disclosed in an embodiment of the invention;

Fig. 2 is a flowchart of an item selection method based on adaptive active learning disclosed in another embodiment of the invention;

Fig. 3 is a schematic diagram of uncertain items;

Fig. 4 is a schematic diagram of the shortcoming of uncertainty sampling;

Fig. 5 is a schematic diagram of representative item selection;

Fig. 6 is a schematic diagram of the effect of a change in the rating of item x_{i};

Fig. 7 is a schematic diagram of the effect of a change in the rating of item x_{j}.

Detailed description of the embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the invention.

As shown in Fig. 1, an item selection method based on adaptive active learning disclosed in an embodiment of the invention comprises:

S101: calculate the uncertainty of the candidate items.

In active-learning classification, uncertainty sampling works as follows: each time a sample is selected from the unlabeled data set, the selected (or constructed) sample should be the one the current learning model is most uncertain about. In a collaborative filtering recommender system, an uncertain item is one whose appeal to the user the system cannot judge. If the system can accurately predict from the user's rating history and other users' ratings that the user will like (or dislike) an item, the item is certain; otherwise it is uncertain. A user's ratings of items express the user's preferences, so most studies measure item uncertainty from rating information: the more inconsistent the users' ratings of an item, the higher its uncertainty. Fig. 3 shows the rating matrix of 3 users (User1, User2, User3) over 3 items (Item1, Item2, Item3). The users' ratings of Item1 are the least consistent, so Item1 is more uncertain than Item2 and Item3.

The higher an item's uncertainty, the more the users who rated it disagree, and the less able the recommender system is to determine the target user's interest in it. Asking the target user to rate a highly uncertain item reveals that user's actual opinion of it and so gives a better picture of the user's preferences. For an item x, the uncertainty uncertainty(x) is computed as:

uncertainty(x) = variance(x) = \frac{\sum_{c \in U_{x}} (R_{cx} - \bar{R}_{x})^{2}}{|U_{x}|}

where U_{x} denotes the set of users who have rated item x, |U_{x}| denotes the number of such users, R_{cx} denotes user c's rating of item x, and \bar{R}_{x} denotes the mean rating of x over these users.
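A minimal Python sketch of the global variance measure above. It assumes a hypothetical nested-dict layout `ratings[user][item]`. The 3 x 3 matrix is made up in the spirit of Fig. 3 and is not the figure's data.

```python
from statistics import fmean

# Hypothetical layout: ratings[user][item] -> observed rating.
Ratings = dict[str, dict[str, float]]


def item_uncertainty(ratings: Ratings, item: str) -> float:
    """Population variance of the ratings that the users in U_x gave to `item`."""
    scores = [row[item] for row in ratings.values() if item in row]  # users in U_x
    if not scores:
        return 0.0  # assumption: an item nobody rated gets zero uncertainty
    mean = fmean(scores)  # \bar{R}_x, the mean rating of the item
    return sum((s - mean) ** 2 for s in scores) / len(scores)


# Made-up matrix in the spirit of Fig. 3 (not the figure's actual values).
ratings: Ratings = {
    "User1": {"Item1": 1, "Item2": 4, "Item3": 5},
    "User2": {"Item1": 5, "Item2": 4, "Item3": 5},
    "User3": {"Item1": 3, "Item2": 4, "Item3": 4},
}
most_uncertain = max(["Item1", "Item2", "Item3"], key=lambda i: item_uncertainty(ratings, i))
print(most_uncertain)  # Item1
```

Item1 has variance 8/3 here, against 0 for Item2 and 2/9 for Item3, so it is the item an uncertainty-only strategy would present first.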

Existing methods compute the variance of a candidate item from the ratings of all users in the system. Consider an item whose variance over all users is low but whose variance over the users similar to the target new user is high. A global variance measure will never select such an item. By the basic principle of collaborative filtering, however, the new user's similar users disagree about this item, so its uncertainty remains high even after rating prediction. To solve this problem, the variance of an item is computed only from the rating distribution of the target user's similar users. The improved item uncertainty measure is:

uncertainty(x) = \frac{\sum_{c \in U_{x}(sim)} (R_{cx} - \bar{R}_{x})^{2}}{|U_{x}(sim)|}

where U_{x}(sim) denotes the set of users who are similar to the current new user and have rated item x.
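The similarity-restricted variant can be sketched as follows. The text does not say how U_{x}(sim) is formed. This sketch assumes U_{x}(sim) is the set of the k users with the highest positive Pearson correlation to the new user, restricted to those who rated x. `k` and the helper names are hypothetical.

```python
from math import sqrt
from statistics import fmean

Ratings = dict[str, dict[str, float]]


def pearson(a: dict[str, float], b: dict[str, float]) -> float:
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    ma = fmean(a[i] for i in common)
    mb = fmean(b[i] for i in common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = sqrt(sum((a[i] - ma) ** 2 for i in common)) * sqrt(sum((b[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0


def local_uncertainty(ratings: Ratings, new_user: str, item: str, k: int = 20) -> float:
    """Variance of `item` over U_x(sim): similar users of `new_user` who rated it.

    Assumption: "similar" means among the k users with the highest positive
    Pearson correlation to `new_user`; the patent does not fix this rule.
    """
    sims = sorted(
        ((pearson(ratings[new_user], row), u) for u, row in ratings.items() if u != new_user),
        reverse=True,
    )
    neighbours = [u for s, u in sims[:k] if s > 0]
    scores = [ratings[u][item] for u in neighbours if item in ratings[u]]
    if not scores:
        return 0.0
    mean = fmean(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)


ratings: Ratings = {
    "c": {"a": 5, "b": 1},                 # the new user
    "n1": {"a": 5, "b": 1, "x": 1},        # agrees with c, dislikes x
    "n2": {"a": 4, "b": 2, "x": 5},        # agrees with c, likes x
    "o1": {"a": 1, "b": 5, "x": 3},
    "o2": {"a": 2, "b": 4, "x": 3},
    "o3": {"a": 1, "b": 5, "x": 3},
}
print(local_uncertainty(ratings, "c", "x", k=2))  # 4.0
```

On this toy data the two users most similar to c disagree sharply about x, so the local uncertainty is 4.0. The variance over all five raters of x is only 1.6, so the global measure would understate how uncertain x is for c.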

S102: calculate the representativeness of the candidate items.

In active-learning classification, uncertainty reflects only how an unlabeled candidate sample affects the current classifier. It ignores the sample's information content within the large set of unlabeled samples. In many cases the most uncertain sample is an isolated point or a noise point. In Fig. 4, triangles and diamonds represent labeled samples and the remaining circles represent unlabeled samples. x_{a} is closest to the class boundary and has the greatest influence on the classifier, so uncertainty sampling will inevitably pick x_{a} for a human expert to label. Because x_{a} is an isolated point, labeling it may shift the class boundary, for example from the original solid line to the dashed line in the figure. That shift can cause many errors when the remaining samples are classified. Sample x_{b} actually carries more information because it better represents the overall distribution of the samples, so x_{b} should be selected for manual labeling. To avoid isolated points and noise points, the representativeness of a sample within the unlabeled set must also be considered.

Selecting uncertain items in collaborative filtering recommendation has a similar problem. An item chosen purely by the uncertainty criterion only reduces the uncertainty of that one item. The user's rating reveals the user's preference for that item but says nothing about other items. Uncertainty reduction thus ignores the relation between the selected item and the many unrated items and cannot reduce the uncertainty of other items globally. Fig. 5 illustrates this. Solid circles represent rated items and hollow circles represent unrated items. Users rate items that lie close together, in the same category, similarly, so rating one item in a category reduces the uncertainty of the other items in that category. Consider two items a and d in the system. If users' ratings of d are more inconsistent, the uncertainty of d exceeds that of a, i.e. uncertainty(d) > uncertainty(a), and the uncertainty-reduction strategy presents d to the user. Item a, however, is clearly more representative of the remaining unrated items. Having the user rate a reveals the user's preferences for many of the remaining unrated items, so selecting a lets the system show the user better recommendations.

The representativeness of the selected item must therefore also be measured. This accounts for the item's effect on the other unrated items and avoids picking a least-certain item that is only a boundary point.

By the basic principle of collaborative filtering, once the system obtains the target new user's rating of a candidate item, that rating affects the predicted ratings of the other unrated items. This effect is used to measure the representativeness of the candidate item, and the measure takes the new user's existing preference information into account. The more a candidate item's rating changes the predicted ratings of the other unrated items, the more representative the item is. Figs. 6 and 7 show the effect on the other unrated items when the ratings of candidate items x_{i} and x_{j}, respectively, change. A change in the rating of x_{i} has the larger effect on the ratings of the other unrated items. x_{i} is therefore considered more representative than x_{j}, i.e. rep(x_{i}) > rep(x_{j}).

In a recommender system, users typically rate items on a small set of discrete values. Examples are 0 and 1 for dislike and like, or the 5-point scale (1 to 5) common in movie recommendation. A rating the user can give is denoted r, and the set of possible ratings is denoted R, r ∈ R. The expected-error-reduction strategy considers every possible label of an unlabeled sample; in the same way, all ratings the user might give are considered here. From the user's existing ratings, a collaborative filtering method computes the user's predicted rating and the probability distribution over the rating set R. The predicted rating is treated as the user's true rating of the candidate item, and each value r is a changed value of that predicted rating. Under the different probabilities, the effect of the rating change on the predicted ratings of the other unrated items is estimated. The goal is to find the item with the largest effect on the other items' ratings, i.e. the most representative item. The resulting method for measuring item representativeness from the effect of rating changes is described in detail as follows:

First, the notation. c denotes the current new user and x denotes the current candidate item. X_{c}^{(u)} denotes the set of items c has not rated, and X_{c}^{(r)} denotes the set of items c has rated. X_{c}^{(u \setminus x)} = X_{c}^{(u)} \setminus \{x\} denotes c's remaining unrated items after removing x, each of which is denoted x_{i}. T_{c} = \bigcup_{x \in X_{c}^{(r)}} \{(x, R_{cx})\} is the training data set of c, and R_{cx} denotes c's rating of x.

On the training set T_{c}, use the prediction model θ to calculate c's predicted rating \hat{y}_{cx}(θ) of x. Estimate the probability p(U = c, R_{cx} = r) that c rates x as r, and take r as the changed value of \hat{y}_{cx}(θ):

{\hat{y}}_{cx} (θ) = r

Update the rating training set T_{c} by adding the changed predicted rating to c's list of rated items. This gives the new rating training set:

T_{c,r} = T_{c} \cup \{(x, r)\}

On the rating training sets T_{c} and T_{c,r}, use the prediction model θ to predict c's ratings of the other unrated items x_{i} in X_{c}^{(u \setminus x)}. The predicted ratings on the two training sets are \hat{y}_{cx_{i}}^{T_{c}}(θ) and \hat{y}_{cx_{i}}^{T_{c,r}}(θ), respectively. For each rating r, weighted by its probability p(U = c, R_{cx} = r), the effect of the rating change of candidate item x on the other items' predicted ratings is estimated. This effect is measured as the squared difference between \hat{y}_{cx_{i}}^{T_{c}}(θ) and \hat{y}_{cx_{i}}^{T_{c,r}}(θ). The representativeness rep(x) of the current candidate item x is therefore measured as:

rep(x) = \sum_{x_{i} \in X_{c}^{(u \setminus x)}} \sum_{r \in R} p(U = c, R_{cx} = r) \left(\hat{y}_{cx_{i}}^{T_{c}}(\theta) - \hat{y}_{cx_{i}}^{T_{c,r}}(\theta)\right)^{2}.
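The representativeness measure can be sketched as follows. The sketch assumes a hypothetical `predict(train, item)` callable standing in for the model θ. It also assumes a caller-supplied distribution `rating_probs` for p(U = c, R_{cx} = r), because the text does not fix how that distribution is estimated. The toy mean predictor exists only so the sketch runs; it is not the collaborative filtering model.

```python
from typing import Callable, Iterable, Mapping

# Hypothetical interface: predict(train, item) returns θ's predicted rating of
# `item` for user c when c's known ratings are `train`.
Predictor = Callable[[Mapping[str, float], str], float]


def representativeness(
    predict: Predictor,
    train: Mapping[str, float],           # T_c: item -> rating already given by c
    unrated: Iterable[str],               # X_c^(u)
    x: str,                               # current candidate item
    rating_probs: Mapping[float, float],  # r -> p(U=c, R_cx=r), supplied by caller
) -> float:
    """rep(x): expected squared shift of c's other predictions if c rated x as r."""
    others = [i for i in unrated if i != x]         # X_c^(u \ x)
    base = {i: predict(train, i) for i in others}   # predictions on T_c
    score = 0.0
    for r, p in rating_probs.items():
        train_r = {**train, x: r}                   # T_{c,r} = T_c ∪ {(x, r)}
        for i in others:
            diff = base[i] - predict(train_r, i)    # prediction on T_c minus prediction on T_{c,r}
            score += p * diff * diff
    return score


def mean_predictor(train: Mapping[str, float], item: str) -> float:
    """Toy stand-in for θ (hypothetical): predict every item as c's mean rating."""
    return sum(train.values()) / len(train)


rep_x = representativeness(mean_predictor, {"a": 4.0, "b": 2.0}, ["x", "y", "z"], "x", {1.0: 0.5, 5.0: 0.5})
print(round(rep_x, 4))  # 0.8889
```

Swapping in a real collaborative filtering predictor only changes `predict`. The double sum over the other unrated items and the possible ratings stays the same.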

S103: select the item with the highest information content according to the uncertainty and the representativeness.

This representativeness measure, based on the effect of rating changes, captures item representativeness and makes full use of each user's existing ratings. Considering representativeness within the unrated item set prevents uncertainty-based selection from picking outliers or isolated points. Such items are unrepresentative, so their ratings would not help predict more of the user's preferences. Taking both uncertainty and representativeness into account, the most informative item x^{*} is chosen with the commonly used combination:

x^{*} = \arg\max_{x \in X_{c}^{(u)}} \{uncertainty(x) \times rep(x)\}.
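As a sketch, the fixed combination is a single argmax. The scores below are hypothetical numbers chosen to mirror the a-versus-d discussion of Fig. 5.

```python
from typing import Iterable, Mapping


def select_fixed(unrated: Iterable[str], uncertainty: Mapping[str, float], rep: Mapping[str, float]) -> str:
    """Fixed combination: argmax over X_c^(u) of uncertainty(x) * rep(x)."""
    return max(unrated, key=lambda x: uncertainty[x] * rep[x])


# Hypothetical scores: d is more uncertain, but a is far more representative.
uncertainty = {"a": 0.5, "d": 2.0}
rep = {"a": 3.0, "d": 0.5}
print(select_fixed(["a", "d"], uncertainty, rep))  # a
```

Uncertainty alone would pick d. The product (1.5 for a against 1.0 for d) picks a, the behavior argued for in the Fig. 5 discussion.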

The fixed combination disclosed in the above embodiment considers both uncertainty and representativeness: the larger the product of the two, the more informative the item. This partly overcomes the shortcomings of selecting items by the uncertainty criterion alone. However, every iteration must measure both quantities for every item in the unrated item set. When the unrated set is large or the representativeness measure is complex, the computation becomes very expensive. The aim is to show the new user as few items as possible, each as informative as possible, so items with low information content should be avoided. If the system can already determine the new user's preference for an item from the existing ratings, there is no need to ask the user to rate that item. A serial combination can therefore be used. First, the uncertainty of each unrated item is computed with the uncertainty-reduction criterion and the items are sorted in descending order of uncertainty. The items the system is least certain about form the Most Uncertain Item Set (MUIS). Because the least certain items may be isolated points or outliers, the representativeness of each item in the MUIS is then computed with the representativeness criterion. The MUIS items are sorted by representativeness, and the most representative items are shown to the user. The items presented for rating thus have both high uncertainty and high representativeness.
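A sketch of the serial combination follows. It assumes the MUIS size is a free parameter `muis_size`, since the text does not specify it. The call log shows that rep(x) is evaluated only for MUIS members; all numbers are hypothetical.

```python
from typing import Callable, Mapping, Sequence


def select_serial(
    unrated: Sequence[str],
    uncertainty: Mapping[str, float],
    rep_fn: Callable[[str], float],
    muis_size: int,
) -> str:
    """Serial combination: build the MUIS, then pick its most representative item.

    rep_fn is evaluated only on MUIS members, which is where the saving over the
    fixed combination comes from when rep(x) is expensive.
    """
    muis = sorted(unrated, key=lambda x: uncertainty[x], reverse=True)[:muis_size]
    return max(muis, key=rep_fn)


calls: list[str] = []
rep_table = {"a": 10.0, "b": 1.0, "c": 3.0, "d": 2.0}


def counting_rep(x: str) -> float:
    calls.append(x)  # record each call to the (hypothetically costly) rep(x)
    return rep_table[x]


uncertainty = {"a": 0.1, "b": 0.9, "c": 0.8, "d": 0.7}
print(select_serial(["a", "b", "c", "d"], uncertainty, counting_rep, muis_size=2))  # c
print(calls)  # ['b', 'c']
```

Item a has the highest representativeness but low uncertainty, so it never enters the MUIS. This is the weakness of the serial combination: low-uncertainty, high-representativeness items are always excluded.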

The serial combination avoids selecting items the system is already fairly certain about. Compared with the fixed combination, it improves selection efficiency and avoids asking the user to rate low-information items. It also has a drawback: an item with low uncertainty but high representativeness is always excluded from the MUIS, so the method partly sacrifices item representativeness. Moreover, in practice it is hard to decide whether to emphasize uncertainty or representativeness. The fixed combination weights the two equally and suffers from a similar problem. To address this, the invention discloses, on the basis of the above embodiment, another item selection method based on adaptive active learning.

As shown in Fig. 2, an item selection method based on adaptive active learning disclosed in another embodiment of the invention comprises:

S201: calculate the uncertainty of the candidate items;

S202: calculate the representativeness of the candidate items;

S203: specify a weight set W = {w_{1}, w_{2}, ..., w_{n-1}, w_{n}} in advance, of size |W| = n;

S204: initialize the candidate item set I as empty;

S205: for each weight w_{i} ∈ W, select the top L candidate items to form the item set I_{i};

S206: update the candidate item set I = I ∪ I_{i};

S207: train a prediction model θ on user c's existing rating set T_{c}, use θ to calculate c's predicted rating of item x, and update the training set T_{c};

S208: calculate the prediction error ε(x) corresponding to each item;

S209: select the most informative item x^{*} from the candidate item set I.

Specifically, the above embodiment works as follows. Neither the fixed nor the serial combination can dynamically adjust the relative weights of item uncertainty and representativeness. Given the uncertainty and representativeness measures, the goal is a combination framework that integrates the strengths of both. The selected candidate item should be uncertain and, relative to the current system, highly representative of the unrated item set. Once that item is added to the rated item set, the updated collaborative filtering model predicts the user's preferences better and gives the new user more accurate recommendations. A common combination framework in the literature uses a product. Let the uncertainty of the current candidate item x be uncertainty(x) and its representativeness be rep(x). The combined information content info(x) of the item is then:

info(x) = uncertainty(x)^{w} \times rep(x)^{1-w};

The most informative item x^{*} is:

x^{*} = \arg\max_{x \in X_{c}^{(u)}} info(x);

where w (0 ≤ w ≤ 1) is a weighting factor that controls the relative importance of item uncertainty and representativeness. When w > 0.5, uncertainty outweighs representativeness in item selection; when w < 0.5, representativeness takes priority. In the extreme cases, w = 1 reduces the combination to pure uncertainty-based selection and w = 0 to pure representativeness-based selection. This framework has an unavoidable problem: the value of w is hard to determine. Different situations call for prioritizing uncertainty or representativeness, and it is hard to know in advance which applies. During active selection, the relative importance of the two criteria should also change over time. An adaptive combination strategy is therefore proposed. It adjusts w dynamically during selection and presents the currently most valuable item to the user for rating, as follows:

(1) Specify a weight set W = {w_{1}, w_{2}, ..., w_{n-1}, w_{n}} in advance, of size |W| = n;

(2) In each selection round, initialize the candidate item set I as empty;

(3) Compute the uncertainty uncertainty(x) and the representativeness rep(x) of each item x;

(4) For each value w_{i} ∈ W, select the top L candidate items by info(x) = uncertainty(x)^{w_{i}} × rep(x)^{1-w_{i}}, giving the current candidate item set I_{i};

(5) The final candidate item set is I = I_{1} ∪ I_{2} ∪ ... ∪ I_{n-1} ∪ I_{n};

(6) Selecting the optimal w is then equivalent to selecting the most informative item from the candidate item set I.
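Steps (1) to (5) can be sketched in Python as below. `top_l` stands for L and `weights` for W, and the scores are hypothetical. Each weight favors a different item, so the union I can hold items that any single fixed w would miss.

```python
from typing import Mapping, Sequence


def info(u: float, r: float, w: float) -> float:
    """info(x) = uncertainty(x)^w * rep(x)^(1 - w), with 0 <= w <= 1."""
    return (u ** w) * (r ** (1.0 - w))


def candidate_set(
    unrated: Sequence[str],
    uncertainty: Mapping[str, float],
    rep: Mapping[str, float],
    weights: Sequence[float],  # W = {w_1, ..., w_n}
    top_l: int,                # L
) -> set[str]:
    """Steps (2)-(5): I = I_1 ∪ ... ∪ I_n, where I_i holds the top L items under w_i."""
    candidates: set[str] = set()  # step (2): I starts empty
    for w in weights:
        ranked = sorted(unrated, key=lambda x: info(uncertainty[x], rep[x], w), reverse=True)
        candidates |= set(ranked[:top_l])  # I = I ∪ I_i
    return candidates


uncertainty = {"a": 4.0, "b": 1.0, "c": 3.0}
rep = {"a": 1.0, "b": 4.0, "c": 3.0}
print(sorted(candidate_set(["a", "b", "c"], uncertainty, rep, [0.0, 0.5, 1.0], top_l=1)))
# ['a', 'b', 'c']
```

w = 1 selects a (pure uncertainty), w = 0 selects b (pure representativeness), and w = 0.5 selects the balanced item c. Step (6) then picks among them with a separate criterion.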

In collaborative filtering recommendation, active-learning item selection aims to pick items whose ratings are highly informative, so as to predict the user's preferences better, i.e. to maximize user satisfaction. Following the idea of expected error reduction, an item selection method that maximizes user satisfaction chooses the optimal weight w adaptively. Its basic idea is as follows. For each item x in the candidate item set I, the current collaborative filtering prediction model estimates the user's rating of x. Each x, with its predicted rating, is added to the rating training set one at a time as a simulation, and the model is retrained. The new model then estimates the user's ratings of the already rated items. The item that minimizes the deviation between the user's true and predicted ratings is presented to the user for rating. User-satisfaction maximization follows the basic principle of item-based collaborative filtering: a user who is interested in an item can be inferred to like other, similar items as well. Selecting the item that minimizes the deviation between true and predicted ratings therefore selects the item whose rating best reflects the user's preferences. The user-satisfaction maximization method is described as follows:

Train a prediction model θ on user c's existing rating set T_{c}, use θ to calculate c's predicted rating \hat{y}_{c,x}^{θ} of item x, and update the training set T_{c}:

T_{c} = T_{c} \cup \{\langle x, \hat{y}_{c,x}^{\theta} \rangle\}

Train a new prediction model \tilde{θ}_{x} on the updated T_{c}. Use \tilde{θ}_{x} to predict c's rating of each item t (t ∈ T_{c}) in the training set of rated items, and estimate the deviation ε(x) between the true and predicted ratings as follows:

\epsilon(x) = \sum_{t \in T_{c}} \left(\hat{y}_{c,t}^{\tilde{\theta}_{x}} - R_{ct}\right)^{2}

where \hat{y}_{c,t}^{\tilde{θ}_{x}} denotes c's rating of item t as predicted by the updated collaborative filtering model \tilde{θ}_{x}. The item that minimizes ε(x) best matches and most satisfies the user's preferences, i.e. it is the item with the highest information content. Under the user-satisfaction maximization strategy above, choosing the optimal weight w amounts to choosing the most informative item in the final candidate item set I for the user to rate. The most informative item x^{*} is therefore selected by:

x^{*} = \arg\min_{x \in I} \epsilon(x) = \arg\min_{x \in I} \sum_{t \in T_{c}} \left(\hat{y}_{c,t}^{\tilde{\theta}_{x}} - R_{ct}\right)^{2}
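The user-satisfaction criterion can be sketched as follows. The sketch assumes a hypothetical `fit(train)` that returns a model mapping items to predicted ratings. The shrinkage model is a made-up stand-in for the collaborative filtering model, chosen so that retraining visibly changes the predictions. Only items with a true rating R_{ct} enter the error sum.

```python
from typing import Callable, Iterable, Mapping

# Hypothetical interface: fit(train) returns a model, i.e. item -> predicted rating.
Model = Callable[[str], float]
Fit = Callable[[Mapping[str, float]], Model]


def prediction_error(fit: Fit, train: Mapping[str, float], x: str) -> float:
    """ε(x): error on c's truly rated items after simulating a rating of x."""
    theta = fit(train)                  # θ trained on T_c
    updated = {**train, x: theta(x)}    # T_c ∪ <x, predicted rating of x>
    theta_x = fit(updated)              # θ̃_x retrained on the updated set
    # Sum only over items that carry a true rating R_ct.
    return sum((theta_x(t) - r_true) ** 2 for t, r_true in train.items())


def select_min_error(fit: Fit, train: Mapping[str, float], candidates: Iterable[str]) -> str:
    """x* = argmin over the candidate set I of ε(x)."""
    return min(candidates, key=lambda x: prediction_error(fit, train, x))


def make_shrinkage_fit(item_prior: Mapping[str, float], alpha: float = 0.5) -> Fit:
    """Toy stand-in for the CF model (hypothetical): blend each item's rating,
    or its prior if unrated, with the user's mean rating. Assumes `train` is non-empty."""
    def fit(train: Mapping[str, float]) -> Model:
        mean = sum(train.values()) / len(train)
        return lambda item: alpha * train.get(item, item_prior.get(item, mean)) + (1 - alpha) * mean
    return fit


fit = make_shrinkage_fit({"x1": 5.0, "x2": 3.0})
train = {"a": 4.0, "b": 2.0}
print(select_min_error(fit, train, ["x1", "x2"]))  # x2
```

Here ε(x1) = 5/9 and ε(x2) = 0.5. Adding x2 leaves the user's mean unchanged and so disturbs the fit to the known ratings less.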

The method first weighs the uncertainty and representativeness of the items in the unrated item set. The user-satisfaction maximization criterion then picks the item that minimizes the prediction error on the new user's rated item set, i.e. the most informative item, and the user is asked to rate it. After the user's rating is obtained, the rated and unrated item sets and the collaborative filtering prediction model are updated. The process repeats until a stopping criterion is met, for example when the new user has given a certain number of ratings.
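The outer iteration can be sketched as a generic loop. `select_next` and `ask_user` are hypothetical callbacks. The first stands for one selection round and is assumed to retrain the model on the current ratings itself. The second stands for showing the item to the user. `max_ratings` is a hypothetical parameter for the "certain number of ratings" stopping rule.

```python
from typing import Callable, Iterable, Mapping


def elicit(
    unrated: Iterable[str],
    rated: Mapping[str, float],
    select_next: Callable[[Mapping[str, float], set[str]], str],
    ask_user: Callable[[str], float],
    max_ratings: int,
) -> dict[str, float]:
    """Select, ask, update, and repeat until the stopping criterion is met."""
    rated = dict(rated)   # copy so the caller's ratings are not mutated
    pool = set(unrated)
    while pool and len(rated) < max_ratings:
        x = select_next(rated, pool)  # one selection round (e.g. the adaptive combination)
        rated[x] = ask_user(x)        # the user's actual rating of x
        pool.discard(x)               # move x from the unrated to the rated set
    return rated


true_ratings = {"i1": 5.0, "i2": 2.0, "i3": 4.0}
result = elicit(true_ratings, {}, lambda rated, pool: min(pool), true_ratings.__getitem__, max_ratings=2)
print(result)  # {'i1': 5.0, 'i2': 2.0}
```

The trivial `min(pool)` selector only keeps the example deterministic. In practice it would be replaced by the candidate-set construction and minimum-error choice described above.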

The uncertainty and representativeness strategies of the invention are measured directly from users' ratings. The prediction model θ therefore uses user-based collaborative filtering. Similarity between users is measured with the Pearson correlation coefficient, and ratings are predicted with a weighted average that accounts for differences in how users use the rating scale.
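A minimal sketch of such a user-based predictor follows. It assumes a mean-centred weighted average as the scale correction and keeps only the k most similar users with positive correlation. `k` and these choices are assumptions, not details from the text.

```python
from math import sqrt
from statistics import fmean

Ratings = dict[str, dict[str, float]]


def pearson(a: dict[str, float], b: dict[str, float]) -> float:
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    ma = fmean(a[i] for i in common)
    mb = fmean(b[i] for i in common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = sqrt(sum((a[i] - ma) ** 2 for i in common)) * sqrt(sum((b[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings: Ratings, user: str, item: str, k: int = 20) -> float:
    """Mean-centred, similarity-weighted average over neighbours who rated `item`.

    Subtracting each neighbour's own mean rating is one common way to handle users
    who use the rating scale differently (an assumption about the text's method).
    """
    mu = fmean(ratings[user].values())
    ranked = sorted(
        ((pearson(ratings[user], row), v) for v, row in ratings.items() if v != user and item in row),
        reverse=True,
    )
    neighbours = [(s, v) for s, v in ranked[:k] if s > 0]
    den = sum(s for s, _ in neighbours)
    if den == 0:
        return mu  # no usable neighbour: fall back to the user's own mean
    num = sum(s * (ratings[v][item] - fmean(ratings[v].values())) for s, v in neighbours)
    return mu + num / den


ratings: Ratings = {
    "c": {"a": 5, "b": 1},
    "n1": {"a": 5, "b": 1, "x": 1},
    "n2": {"a": 4, "b": 2, "x": 4},
    "o1": {"a": 1, "b": 5, "x": 3},  # negatively correlated with c, so ignored
}
print(round(predict(ratings, "c", "x"), 3))  # 2.667
```

n1 rates x 4/3 below its own mean and n2 rates it 2/3 above, so c's prediction is c's mean 3 shifted by their average deviation, -1/3.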

In summary, the invention considers both the uncertainty and the representativeness of items when selecting the most informative item for the user to rate. This overcomes the shortcomings of uncertainty-only item selection strategies. The adaptive combination framework tunes how uncertainty and representativeness are combined. The selected candidate item is therefore uncertain and, relative to the current recommender system, highly representative of the unrated item set. Once that item is added to the user's rated item set, the updated collaborative filtering model predicts the user's preferences better and provides more accurate recommendations.

If the functions of the method of this embodiment are implemented as software functional units and sold or used as an independent product, they may be stored in a storage medium readable by a computing device. On this understanding, the part of the embodiments of the present invention that contributes over the prior art, or part of the technical solution, may be embodied as a software product. This product is stored in a storage medium and contains instructions that cause a computing device (a personal computer, server, mobile computing device, network device, or the like) to perform all or some of the steps of the methods described in the embodiments. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a portable hard drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

The embodiments in this specification are described progressively. Each embodiment emphasizes its differences from the others, and the embodiments may be cross-referenced for the parts they share.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. The present invention is therefore not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

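1. An item selection method based on adaptive active learning, characterized by comprising:

calculating the uncertainty of candidate items;

calculating the representativeness of candidate items; and

adaptively and dynamically selecting the item with the highest information content according to the uncertainty and the representativeness.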

2. The method according to claim 1, characterized in that calculating the uncertainty of a candidate item comprises:

3. The method according to claim 2, characterized in that calculating the representativeness of a candidate item comprises:

rep(x) = \sum_{x_i \in X_c^{(u \setminus x)}} \sum_{r \in R} p(U = c, R = r) \left( y_{c x_i}^{T_c}(\theta) - y_{c x_i}^{T_{c,r}}(\theta) \right)^2

calculating the representativeness rep(x) of the current candidate item x, wherein c denotes the current new user; x denotes the current candidate item; X_c^{(u)} denotes the unrated item set of c, and the rated item set of c is denoted analogously; X_c^{(u \setminus x)} denotes the unrated items of c that remain after x is removed, with x_i denoting each item in that set; T_c denotes the training dataset of c; and R_{cx} denotes the rating of x by c.

4. The method according to claim 3, characterized in that selecting the item with the highest information content according to the uncertainty and the representativeness comprises:

5. The method according to claim 1, characterized by further comprising, after calculating the representativeness of the candidate items:

initializing the candidate item set I to the empty set;

updating the candidate item set I = I ∪ I_i; and

calculating the prediction error ε(x) corresponding to each item.

6. The method according to claim 5, characterized in that selecting the top L candidate items for the current weight w_i, w_i ∈ W, comprises:

according to the uncertainty uncertainty(x) and the representativeness rep(x), calculating the combined information content info(x) of each item by the formula

info(x) = uncertainty(x)^{w} \times rep(x)^{1-w}

7. The method according to claim 6, characterized in that training a prediction model θ on the existing rating set T_c of user c, calculating the predicted rating of item x by c according to θ, and updating the training set T_c comprise:

updating the training set T_c according to the formula.

8. The method according to claim 7, characterized in that calculating the prediction error ε(x) corresponding to each item comprises:

\epsilon(x) = \sum_{t \in T_c} \left( y_{c,t}^{\tilde{\theta}_x} - R_{ct} \right)^2

9. The method according to claim 8, characterized in that selecting the most informative item x^* from the candidate item set I comprises:

according to the formula

x^{*} = \arg\min_{x \in I} \epsilon(x) = \arg\min_{x \in I} \sum_{t \in T_c} \left( y_{c,t}^{\tilde{\theta}_x} - R_{ct} \right)^2

selecting the most informative item x^*.
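The dependent claims can be read as one selection pipeline. The Python sketch below is illustrative and rests on several assumptions, and all function and parameter names in it are hypothetical:

- `representativeness` implements the formula of claim 3. It takes T_{c,r} to be T_c plus a hypothetical rating r for x, and reads p(U = c, R = r) from a supplied table `rating_prob`.
- Uncertainty scores are passed in precomputed, because the uncertainty formula of claim 2 is not given in the text.
- `build_candidates` follows claims 5 and 6.
- `select_most_informative` follows claim 9, with ε(x) supplied as a callable.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Mapping, Set

Item = Hashable
Predictor = Callable[[Item], float]
Fit = Callable[[Dict[Item, float]], Predictor]   # hypothetical trainer for user c


def representativeness(x: Item, unrated: Iterable[Item], T_c: Dict[Item, float],
                       fit: Fit, rating_prob: Mapping[float, float]) -> float:
    """rep(x) from claim 3; summing over r first does not change the value.

    Assumptions: T_{c,r} is T_c plus the hypothetical rating (x, r), and
    rating_prob[r] supplies p(U = c, R = r) for every rating value r in R.
    """
    remaining = [xi for xi in unrated if xi != x]    # X_c^{(u \ x)}
    base = fit(T_c)                                  # y^{T_c}(theta)
    total = 0.0
    for r, p in rating_prob.items():
        T_cr = dict(T_c)
        T_cr[x] = r
        model_r = fit(T_cr)                          # y^{T_{c,r}}(theta)
        total += p * sum((base(xi) - model_r(xi)) ** 2 for xi in remaining)
    return total


def info(unc: float, rep: float, w: float) -> float:
    """info(x) = uncertainty(x)^w * rep(x)^(1 - w), for a weight 0 <= w <= 1."""
    return (unc ** w) * (rep ** (1.0 - w))


def build_candidates(items: List[Item], uncertainty: Mapping[Item, float],
                     rep: Mapping[Item, float], weights: Iterable[float],
                     L: int) -> Set[Item]:
    """Claims 5-6: union of the top-L items by info(x) for every weight w_i in W."""
    I: Set[Item] = set()                             # candidate set I starts empty
    for w in weights:
        ranked = sorted(items, key=lambda x: info(uncertainty[x], rep[x], w),
                        reverse=True)
        I |= set(ranked[:L])                         # I = I ∪ I_i
    return I


def select_most_informative(I: Iterable[Item],
                            epsilon: Callable[[Item], float]) -> Item:
    """Claim 9: x* = argmin over x in I of the prediction error epsilon(x)."""
    return min(I, key=epsilon)
```

Running several weights w_i ∈ W and taking the union of the top-L lists keeps candidates that score well under different trade-offs between uncertainty and representativeness. The final argmin over ε(x) then decides among them.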