CN104794250B - An item selection method based on adaptive active learning

Publication number
CN104794250B
Authority
CN
China
Prior art keywords
item
rating
user
prediction
uncertainty
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510255684.4A
Other languages
Chinese (zh)
Other versions
CN104794250A (en)
Inventor
吴健
李承超
张宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Feiyu Mutual Entertainment Information Technology Co., Ltd.
Original Assignee
SUZHOU RONGXI INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SUZHOU RONGXI INFORMATION TECHNOLOGY Co Ltd
Priority to CN201510255684.4A
Publication of CN104794250A
Application granted
Publication of CN104794250B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases
    • G06F16/285: Clustering or classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an item selection method based on adaptive active learning, comprising: computing the uncertainty of candidate items; computing the representativeness of candidate items; and, according to the uncertainty and the representativeness, adaptively and dynamically selecting the item with the highest information content. By jointly considering the uncertainty and the representativeness of items, the invention picks out the item with the highest information content.

Description

An item selection method based on adaptive active learning
Technical field
The present invention relates to the technical field of recommender systems, and in particular to an item selection method based on adaptive active learning.
Background
In a collaborative filtering recommender system, the key to solving the new-user cold-start problem is to build a preference model for the new user quickly. When a user first uses the system, an active-learning rating elicitation method actively selects some items for the user to rate, which is an effective way to obtain the user's personalized preference information. Two points must be considered when selecting items for the user to rate: (1) each rating the user gives adds to the user's rating data, and the more rating information there is, the more effective the recommender system becomes; (2) not all ratings are equally informative: some ratings reflect the user's individual preferences and some do not, so different active-learning rating elicitation methods produce different results. For example, always selecting popular items for the user to rate yields more rating data, but helps the system little in learning the user's personalized preferences, because most users like popular items. Therefore, how to design an effective active-learning item selection strategy that selects as few items as possible, each with high information content, so that their ratings better represent the user's preferences, is a key problem that urgently needs to be solved.
Summary of the invention
The present invention provides an item selection method based on adaptive active learning that jointly considers the uncertainty and the representativeness of items in order to pick out the item with the highest information content.
The present invention provides an item selection method based on adaptive active learning, comprising:
computing the uncertainty of candidate items;
computing the representativeness of candidate items;
selecting the item with the highest information content according to the uncertainty and the representativeness.
Preferably, computing the uncertainty of candidate items comprises:
computing the uncertainty of a candidate item x according to the formula
$$\mathrm{uncertainty}(x) = \frac{1}{|U_x(\mathrm{sim})|} \sum_{c \in U_x(\mathrm{sim})} \left(R_{cx} - \bar{R}_x\right)^2,$$
where $R_{cx}$ denotes the rating given by user c to item x, $\bar{R}_x$ denotes the average of those ratings, and $U_x(\mathrm{sim})$ denotes the set of users who are similar to the current new user and have rated item x.
Preferably, computing the representativeness of candidate items comprises:
on the training set $T_c$, computing the predicted rating $\hat{y}_{cx}(\theta)$ of c on x according to the prediction model θ, estimating the probability $p(U=c, R_{cx}=r)$ that c rates x as r, and taking r as the changed value of $\hat{y}_{cx}(\theta)$;
updating the rating training set $T_c$ by adding the changed predicted rating to the rated item list of c, which gives the new rating training set $T_{c,r} = T_c \cup (x, r)$;
on the rating training sets $T_c$ and $T_{c,r}$, predicting according to the prediction model θ the ratings of c on the other unrated items $x_i$ in the unrated item set, which gives the predicted ratings $\hat{y}_{cx_i}(\theta_{T_c})$ and $\hat{y}_{cx_i}(\theta_{T_{c,r}})$ on the respective training sets;
under the probability $p(U=c, R_{cx}=r)$ that the rating is r, estimating the influence of a change in the rating of the current candidate item x on the predicted ratings of the other items, with the squared difference representing the rating change, and computing the representativeness rep(x) of the current candidate item x according to the formula
$$\mathrm{rep}(x) = \sum_{r \in R} p(U=c, R_{cx}=r) \sum_{x_i \in I_c^{u} \setminus x} \left(\hat{y}_{cx_i}(\theta_{T_{c,r}}) - \hat{y}_{cx_i}(\theta_{T_c})\right)^2,$$
where c denotes the current new user, x denotes the current candidate item, $I_c^{u}$ denotes the unrated item set of c, $I_c^{r}$ denotes the rated item set of c, $I_c^{u} \setminus x$ denotes the remaining unrated item set of c after x is removed, each of whose items is denoted $x_i$, $T_c$ is the training data set corresponding to c, and $R_{cx}$ denotes the rating of c on x.
Preferably, selecting the item with the highest information content according to the uncertainty and the representativeness comprises:
computing the item with the highest information content according to the formula
$$x^{*} = \arg\max_{x \in I_c^{u}} \; \mathrm{uncertainty}(x) \times \mathrm{rep}(x),$$
where uncertainty(x) is the uncertainty, rep(x) is the representativeness, c denotes the current new user, x denotes the current candidate item, and $I_c^{u}$ denotes the unrated item set of c.
Preferably, after computing the representativeness of candidate items, the method further comprises:
pre-specifying a weight set $W = \{w_1, w_2, \ldots, w_{n-1}, w_n\}$ of size $|W| = n$;
initializing the candidate item set I to empty, $I = \emptyset$;
for the current weight $w_i \in W$, selecting the top L candidate items to form the item set $I_i$;
updating the candidate item set, $I = I \cup I_i$;
training on the existing rating set $T_c$ of user c to obtain the prediction model θ, computing the predicted rating $\hat{y}_{cx}(\theta)$ of c on item x according to θ, and updating the training set $T_c$;
computing the prediction deviation ε(x) corresponding to each item;
selecting the item $x^{*}$ with the highest information content from the candidate item set I.
Preferably, selecting the top L candidate items for the current weight $w_i \in W$ comprises:
computing, from the uncertainty uncertainty(x) and the representativeness rep(x), the combined information content info(x) of each item according to the formula $\mathrm{info}(x) = \mathrm{uncertainty}(x)^{w} \times \mathrm{rep}(x)^{(1-w)}$;
ranking the items according to the formula $x^{*} = \arg\max_{x} \mathrm{info}(x)$ and selecting the top L candidate items.
Preferably, training on the existing rating set $T_c$ of user c to obtain the prediction model θ, computing the predicted rating $\hat{y}_{cx}(\theta)$ of c on item x according to θ, and updating the training set $T_c$ comprises:
updating the training set according to the formula $T_c = T_c \cup (x, \hat{y}_{cx}(\theta))$.
Preferably, computing the prediction deviation ε(x) corresponding to each item comprises:
training on the updated $T_c$ to obtain a new prediction model $\tilde{\theta}$, predicting with $\tilde{\theta}$ the rating $\hat{y}_{ct}(\tilde{\theta})$ of c on each item t ($t \in T_c$) in the rated training set, and estimating the deviation ε(x) between the true ratings and the predicted ratings according to the formula
$$\varepsilon(x) = \sum_{t \in T_c} \left(R_{ct} - \hat{y}_{ct}(\tilde{\theta})\right)^2,$$
where $\hat{y}_{ct}(\tilde{\theta})$ denotes the rating of c on item t predicted by the updated collaborative filtering model $\tilde{\theta}$.
Preferably, selecting the item $x^{*}$ with the highest information content from the candidate item set I comprises:
selecting the item according to the formula $x^{*} = \arg\min_{x \in I} \varepsilon(x)$.
As can be seen from the above scheme, the item selection method based on adaptive active learning provided by the present invention computes both the uncertainty and the representativeness of candidate items and considers them jointly. When selecting the item with the highest information content for the user to rate, it overcomes the deficiency of item selection strategies based on uncertainty alone and picks out the item with the highest information content.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an item selection method based on adaptive active learning disclosed in an embodiment of the present invention;
Fig. 2 is a flowchart of an item selection method based on adaptive active learning disclosed in another embodiment of the present invention;
Fig. 3 is a schematic diagram of uncertain items;
Fig. 4 is a schematic diagram of the defect of uncertainty sampling;
Fig. 5 is a schematic diagram of representative item selection;
Fig. 6 is a schematic diagram of the influence of a rating change of item $x_i$;
Fig. 7 is a schematic diagram of the influence of a rating change of item $x_j$.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, an item selection method based on adaptive active learning disclosed in an embodiment of the present invention comprises:
S101, computing the uncertainty of candidate items;
In active learning for classification, the main principle of uncertainty sampling is that each time a data sample is selected from the unlabeled data set, the selected or constructed unlabeled sample should be the one about which the current learning model is least certain. In a collaborative filtering recommender system, an uncertain item is one for which the system cannot judge whether the user likes it. If, from the user's rating history and the rating information of other users, the system can accurately predict that the user likes (or dislikes) an item, the item is certain; otherwise it is uncertain. Because a user's ratings of items express the user's preferences, most research measures the uncertainty of an item from rating information. The more inconsistent the users' ratings of an item are, the higher its uncertainty is said to be. As shown in Fig. 3, in the rating matrix of 3 users (User1, User2, User3) over 3 items (Item1, Item2, Item3), the 3 users' ratings of Item1 are the least consistent, so the uncertainty of Item1 is higher than that of Item2 and Item3.
The higher the uncertainty of an item, the greater the disagreement about it among the users who have rated it, and the less able the recommender system is to determine the interest of the user to be recommended in that item. Selecting a highly uncertain item for the user to be recommended to rate obtains the user's specific rating of that item and thus a better understanding of the user's preferences. For an item x, the uncertainty uncertainty(x) is computed as follows:
$$\mathrm{uncertainty}(x) = \frac{1}{|U_x|} \sum_{c \in U_x} \left(R_{cx} - \bar{R}_x\right)^2$$
where $U_x$ denotes the set of users who have rated item x, $|U_x|$ denotes the number of such users, $R_{cx}$ denotes the rating given by user c to item x, and $\bar{R}_x$ denotes the average of those ratings.
When computing the variance of a candidate item, existing methods all use the rating data of every user in the system. For a given item, the variance computed over all users may be low while the variance computed over the users similar to the new user to be recommended is high; a method that measures variance globally over all users will certainly not select such an item for the user. By the basic principle of collaborative filtering, if the similar users are uncertain about an item, the item's uncertainty remains high after rating prediction. To solve this problem, the variance of an item is computed only from the rating distribution of the users similar to the user to be recommended. The improved item uncertainty measure is as follows:
$$\mathrm{uncertainty}(x) = \frac{1}{|U_x(\mathrm{sim})|} \sum_{c \in U_x(\mathrm{sim})} \left(R_{cx} - \bar{R}_x\right)^2$$
where $U_x(\mathrm{sim})$ denotes the set of users who are similar to the current new user and have rated item x, and $\bar{R}_x$ is the average rating of x over that set.
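The improved uncertainty measure can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the nested-dictionary layout of `ratings`, the function name `uncertainty`, and the choice to return 0.0 when no similar user has rated the item are all assumptions made for the example.

```python
from typing import Dict, Iterable

# Hypothetical data layout: user id -> {item id -> rating}.
Ratings = Dict[str, Dict[str, float]]


def uncertainty(item: str, ratings: Ratings, similar_users: Iterable[str]) -> float:
    """Variance of the ratings that the similar users gave to `item`."""
    values = [ratings[u][item] for u in similar_users if item in ratings.get(u, {})]
    if not values:
        return 0.0  # assumption: an item no similar user has rated gets zero uncertainty
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


ratings: Ratings = {
    "u1": {"Item1": 1.0, "Item2": 4.0},
    "u2": {"Item1": 5.0, "Item2": 4.0},
    "u3": {"Item1": 3.0, "Item2": 5.0},
}
similar = ["u1", "u2", "u3"]
print(uncertainty("Item1", ratings, similar))  # ratings 1, 5, 3 disagree strongly
print(uncertainty("Item2", ratings, similar))  # ratings 4, 4, 5 mostly agree
```

Restricting `similar_users` to the neighbours of the new user is what distinguishes this measure from the global variance over all users.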
S102, computing the representativeness of candidate items;
In active learning for classification, uncertainty only reflects the influence of a candidate unlabeled sample on the current classifier; it does not account for the sample's information content within the large set of unlabeled samples. In many cases the most uncertain sample may be an isolated point or a noise point. As shown in Fig. 4, the triangles and diamonds represent samples in the labeled sample set, and the remaining circles represent samples in the unlabeled sample set. Because $x_A$ is nearest to the classification boundary, it has the greatest influence on the classifier, so uncertainty sampling will inevitably select $x_A$ for a human expert to label. Since $x_A$ is an isolated point, labeling it is likely to shift the classification boundary wrongly, for example from the original solid line to the dashed line in the figure, which may cause many of the remaining samples to be misclassified. In fact, sample $x_B$ has higher information content because it better represents the overall distribution of the samples, so $x_B$ should be selected for manual labeling. To solve the isolated-point and noise-point problem, the representativeness of the current sample within the unlabeled sample set must be considered.
In collaborative filtering recommendation, the selection of uncertain items has a similar problem. An uncertain item selected by the uncertainty criterion only reduces the uncertainty of that item: the user's rating reveals the user's preference for the selected item but not for other items. In other words, the uncertainty reduction method does not consider the relationship between the selected item and the large number of unrated items, and cannot reduce the uncertainty of other items globally. Fig. 5 illustrates this. In the figure, solid circles represent rated items and hollow circles represent unrated items; items that are close to each other in the figure, that is, items in the same category set, receive similar ratings from users. After one item in a category is rated, the uncertainty of the other items in the same category is reduced. For two items a and d in the system, if the users who have rated them are more inconsistent about d, the uncertainty of d is greater than that of a, i.e. uncertainty(d) > uncertainty(a), and the uncertainty reduction strategy will select d for the user to rate. However, a is clearly more representative of the remaining unrated items: having the user rate a reveals the user's preference for a large number of the remaining unrated items, so selecting a lets the system present better recommendations to the user.
Based on the above analysis, to account for the influence of the selected item on the set of other unrated items, and to overcome the problem that the most uncertain item may be a boundary point, the representativeness of the selected item must also be measured.
By the basic principle of collaborative filtering recommendation algorithms, when the system obtains the new user's rating of a candidate item, that rating affects the rating predictions for the other unrated items. This influence is used to measure the representativeness of the candidate item, and the measure takes the new user's existing preference information into account. The greater the influence of the candidate item's rating on changes in the predicted ratings of the other unrated items, the higher the representativeness of the item. Fig. 6 and Fig. 7 show the influence on the other unrated items when the ratings of candidate items $x_i$ and $x_j$ change. As the figures show, a change in the rating of $x_i$ has a greater influence on the ratings of the other unrated items, so $x_i$ is considered more representative than $x_j$, i.e. $\mathrm{rep}(x_i) > \mathrm{rep}(x_j)$.
In a recommender system, the item ratings that users give are generally limited to a few values, such as 0 and 1 for dislike and like, or the 5-value (1-5) rating common in movie recommendation. A rating value that the user can give is denoted r, and the corresponding set of possible rating values is denoted R, with r ∈ R. Just as the expected error reduction strategy considers every possible class of an unlabeled sample, every rating value that the user may give is considered here. Based on the user's existing ratings, collaborative filtering is used to compute the user's predicted rating, and the probability distribution over the rating value set R is estimated. The predicted rating is regarded as the user's true rating of the candidate item, and each value r is a changed value of the predicted rating. The influence of the rating change on the predicted ratings of the other unrated items is estimated under each probability, with the goal of finding the items that most affect the ratings of other items, i.e. the more representative items. From this analysis, a representativeness measure based on the influence of rating changes is obtained. It is described in detail as follows:
First, the following notation is used: c denotes the current new user, x denotes the current candidate item, $I_c^{u}$ denotes the unrated item set of c, $I_c^{r}$ denotes the rated item set of c, $I_c^{u} \setminus x$ denotes the remaining unrated item set of c after x is removed, each of whose items is denoted $x_i$, $T_c$ is the training data set corresponding to c, and $R_{cx}$ denotes the rating of c on x.
On the training set $T_c$, the predicted rating $\hat{y}_{cx}(\theta)$ of c on x is computed according to the prediction model θ, the probability $p(U=c, R_{cx}=r)$ that c rates x as r is estimated, and r is taken as the changed value of $\hat{y}_{cx}(\theta)$, that is, the simulated rating of x is set to r.
The rating training set $T_c$ is updated by adding the changed predicted rating to the rated item list of c, which gives the new rating training set:
$$T_{c,r} = T_c \cup (x, r)$$
On the rating training sets $T_c$ and $T_{c,r}$, the ratings of c on the other unrated items $x_i$ in the unrated item set $I_c^{u}$ are predicted according to the prediction model θ, which gives the predicted ratings $\hat{y}_{cx_i}(\theta_{T_c})$ and $\hat{y}_{cx_i}(\theta_{T_{c,r}})$ on the respective training sets. Under the probability $p(U=c, R_{cx}=r)$ that the rating is r, the influence of a change in the rating of the current candidate item x on the predicted ratings of the other items is estimated, with the squared difference between $\hat{y}_{cx_i}(\theta_{T_{c,r}})$ and $\hat{y}_{cx_i}(\theta_{T_c})$ representing the rating change. The representativeness rep(x) of the current candidate item x is therefore measured as follows:
$$\mathrm{rep}(x) = \sum_{r \in R} p(U=c, R_{cx}=r) \sum_{x_i \in I_c^{u} \setminus x} \left(\hat{y}_{cx_i}(\theta_{T_{c,r}}) - \hat{y}_{cx_i}(\theta_{T_c})\right)^2$$
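The representativeness computation can be sketched as follows. This is only an illustration under stated assumptions: the prediction model θ is replaced by a toy similarity-weighted average (`toy_predict`), the similarity table `SIM` and item names are invented, and the rating probabilities are passed in rather than estimated by collaborative filtering.

```python
from typing import Callable, Dict, Iterable

Train = Dict[str, float]                   # the new user's ratings: item -> rating
Predictor = Callable[[Train, str], float]  # (training set, item) -> predicted rating

# Hypothetical symmetric item-item similarities used by the toy predictor.
SIM = {
    frozenset(("k", "b")): 0.5, frozenset(("k", "c")): 0.5,
    frozenset(("a", "b")): 0.9, frozenset(("a", "c")): 0.8,
    frozenset(("d", "b")): 0.1, frozenset(("d", "c")): 0.1,
}


def toy_predict(train: Train, item: str) -> float:
    """Similarity-weighted average of the known ratings (a stand-in for theta)."""
    num = sum(SIM.get(frozenset((j, item)), 0.0) * r for j, r in train.items())
    den = sum(SIM.get(frozenset((j, item)), 0.0) for j in train)
    return num / den if den else 3.0  # assumption: fall back to the scale midpoint


def representativeness(x: str, train: Train, unrated: Iterable[str],
                       rating_probs: Dict[float, float], predict: Predictor) -> float:
    """Expected squared change in the other predictions when x is rated r."""
    others = [i for i in unrated if i != x]
    base = {i: predict(train, i) for i in others}      # predictions on T_c
    rep = 0.0
    for r, p in rating_probs.items():
        simulated = {**train, x: r}                    # T_{c,r} = T_c U (x, r)
        rep += p * sum((predict(simulated, i) - base[i]) ** 2 for i in others)
    return rep


train = {"k": 3.0}
unrated = ["a", "b", "c", "d"]
uniform = {r: 0.2 for r in (1.0, 2.0, 3.0, 4.0, 5.0)}  # assumption: uniform p(R_cx = r)
rep_a = representativeness("a", train, unrated, uniform, toy_predict)
rep_d = representativeness("d", train, unrated, uniform, toy_predict)
print(rep_a > rep_d)  # item a is strongly tied to b and c, so its rating moves them more
```

Passing the predictor as a parameter keeps the measure independent of the particular collaborative filtering model, at the cost of one retraining (here, one call) per candidate rating value.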
S103, selecting the item with the highest information content according to the uncertainty and the representativeness.
The representativeness measure based on the influence of rating changes considers the representativeness of an item while making full use of each user's existing rating information. Considering the representativeness of an item within the unrated item set overcomes the problem that uncertainty-based selection may pick an outlier or isolated point, in which case the item presented to the user for rating is not representative and little additional preference information can be predicted. Considering both the uncertainty and the representativeness of items, the item $x^{*}$ with the highest information content is selected. A common combination is:
$$x^{*} = \arg\max_{x \in I_c^{u}} \; \mathrm{uncertainty}(x) \times \mathrm{rep}(x)$$
where $I_c^{u}$ denotes the unrated item set of the current new user c.
The fixed combination method disclosed in the above embodiment considers both the uncertainty and the representativeness of items: items with a larger product of the two are the items with higher information content. To some extent this overcomes the deficiency of selecting items by the uncertainty criterion alone. However, every iteration measures both the uncertainty and the representativeness of every item in the unrated item set, so when the unrated item set is large or the representativeness measure is complex, the computational cost is high. Since the aim is to present the new user with as few items as possible, each with high information content, so that the ratings better express the user's preferences, selecting items with low information content should be avoided. If the system can already determine the new user's liking for an item from the existing rating information, there is no need to ask the user to rate it. A serial combination method can therefore be used. First, the uncertainty of the unrated items is measured with the uncertainty reduction criterion, the items are sorted by uncertainty from high to low, and the items the system is most uncertain about are selected, giving the Most Uncertain Item Set (MUIS). Then, to overcome the problem that the most uncertain items may be isolated points or outliers, the representativeness of the items in the MUIS is computed with the representativeness criterion, the items in the MUIS are sorted by representativeness, and the highly representative items are presented to the user. This ensures that the items given to the user to rate have both high uncertainty and high representativeness.
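The difference between the fixed and the serial combination can be seen in a small sketch. The function names, the `muis_size` parameter, and the score values are illustrative assumptions; the uncertainty and representativeness scores are taken as already computed.

```python
from typing import Dict

Scores = Dict[str, float]  # item id -> score


def select_fixed(unc: Scores, rep: Scores) -> str:
    """Fixed combination: maximize uncertainty(x) * rep(x) over all unrated items."""
    return max(unc, key=lambda x: unc[x] * rep[x])


def select_serial(unc: Scores, rep: Scores, muis_size: int) -> str:
    """Serial combination: keep the most uncertain items (MUIS), then take the
    most representative item inside that set."""
    muis = sorted(unc, key=lambda x: unc[x], reverse=True)[:muis_size]
    return max(muis, key=lambda x: rep[x])


# Illustrative scores only.
unc = {"a": 0.6, "b": 0.9, "d": 1.0}
rep = {"a": 0.9, "b": 0.5, "d": 0.1}
print(select_fixed(unc, rep))      # 'a': largest product
print(select_serial(unc, rep, 2))  # 'b': 'a' never enters the MUIS
```

With a MUIS of size 2, the serial method only evaluates representativeness for two items, which is where its efficiency comes from, but item `a` is excluded before its high representativeness can be considered.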
The serial combination method avoids selecting items about which the system is already fairly certain. Compared with the fixed combination method, it effectively improves the efficiency of item selection and avoids presenting items with low information content to the user. It also has a drawback: an item with relatively low uncertainty but relatively high representativeness is certainly excluded from the MUIS, so the method sacrifices some of the representativeness of items. In practice, however, it is difficult to determine whether the uncertainty or the representativeness of an item should be given priority. The fixed combination method treats uncertainty and representativeness equally and has a similar problem. In view of this, the present invention discloses, on the basis of the above embodiment, another item selection method based on adaptive active learning.
As shown in Fig. 2, an item selection method based on adaptive active learning disclosed in another embodiment of the present invention comprises:
S201, computing the uncertainty of candidate items;
S202, computing the representativeness of candidate items;
S203, pre-specifying a weight set $W = \{w_1, w_2, \ldots, w_{n-1}, w_n\}$ of size $|W| = n$;
S204, initializing the candidate item set I to empty, $I = \emptyset$;
S205, for the current weight $w_i \in W$, selecting the top L candidate items to form the item set $I_i$;
S206, updating the candidate item set, $I = I \cup I_i$;
S207, training on the existing rating set $T_c$ of user c to obtain the prediction model θ, computing the predicted rating $\hat{y}_{cx}(\theta)$ of c on item x according to θ, and updating the training set $T_c$;
S208, computing the prediction deviation ε(x) corresponding to each item;
S209, selecting the item $x^{*}$ with the highest information content from the candidate item set I.
Specifically, the working principle of the above embodiment is as follows. Neither the fixed combination method nor the serial combination method can dynamically adjust the weights given to item uncertainty and representativeness. Given the uncertainty and representativeness measures of an item, the goal is a combination framework that integrates the advantages of both, so that the selected candidate item is uncertain with respect to the current system and highly representative within the unrated item set. Then, once the candidate item is added to the rated item set, the updated collaborative filtering model can better predict the user's preferences and provide more accurate recommendations for the new user. A combination framework commonly used in research takes the form of a product. Suppose the uncertainty of the current candidate item x is uncertainty(x) and its representativeness is rep(x); the combined information content info(x) of the item is then:
$$\mathrm{info}(x) = \mathrm{uncertainty}(x)^{w} \times \mathrm{rep}(x)^{(1-w)}$$
The item $x^{*}$ with the highest information content is:
$$x^{*} = \arg\max_{x} \; \mathrm{info}(x)$$
Here w (0 ≤ w ≤ 1) is a weighting factor that controls the relative importance of item uncertainty and representativeness. When w > 0.5, uncertainty carries more weight than representativeness in item selection; when w < 0.5, representativeness is given priority. In the extreme cases, w = 1 reduces the combination to selection by uncertainty alone, and w = 0 reduces it to selection by representativeness alone. This combination framework has an unavoidable problem: the value of the weighting factor w is difficult to determine. In different situations it is hard to decide whether the uncertainty or the representativeness of an item should be given priority, and during active item selection the importance of the two criteria should also be adjusted dynamically. To adjust w dynamically during item selection and select the currently most valuable item for the user to rate, an adaptive combination strategy is proposed, described in detail below:
(1) Pre-specify a weight set $W = \{w_1, w_2, \ldots, w_{n-1}, w_n\}$ of size $|W| = n$;
(2) In each item selection round, initialize the candidate item set I to empty;
(3) Compute the uncertainty uncertainty(x) and the representativeness rep(x) of each item x;
(4) For each value $w_i$ ($w_i \in W$) in the weight set, rank the items according to the formula $x^{*} = \arg\max_{x} \mathrm{uncertainty}(x)^{w_i} \times \mathrm{rep}(x)^{(1-w_i)}$ and select the top L candidate items to obtain the candidate item set $I_i$ for that weight;
(5) The final candidate item set is then $I = I_1 \cup I_2 \cup \cdots \cup I_{n-1} \cup I_n$;
(6) Select the optimal value of w, that is, select the item with the highest information content from the candidate item set I.
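Steps (1) to (5) above can be sketched as follows. The function names and score values are assumptions for illustration, and the uncertainty and representativeness scores are taken as already computed; step (6) is left to the user satisfaction criterion described next.

```python
from typing import Dict, Iterable, Set

Scores = Dict[str, float]  # item id -> score


def info(unc: float, rep: float, w: float) -> float:
    """Combined information content: uncertainty^w * rep^(1 - w)."""
    return (unc ** w) * (rep ** (1.0 - w))


def candidate_set(unc: Scores, rep: Scores, weights: Iterable[float], top_l: int) -> Set[str]:
    """Union over all weights w_i of the top-L items ranked by info(x)."""
    candidates: Set[str] = set()                       # step (2): I starts empty
    for w in weights:                                  # step (4): one ranking per weight
        ranked = sorted(unc, key=lambda x: info(unc[x], rep[x], w), reverse=True)
        candidates.update(ranked[:top_l])              # step (5): I = I U I_i
    return candidates


# Illustrative scores only.
unc = {"a": 0.6, "b": 0.9, "d": 1.0}
rep = {"a": 0.9, "b": 0.5, "d": 0.1}
print(sorted(candidate_set(unc, rep, weights=[0.0, 0.5, 1.0], top_l=1)))
# ['a', 'd']: 'a' wins for w = 0 and w = 0.5, 'd' wins for w = 1
```

Because each weight contributes its own top-L list, the candidate set contains at most n × L items, which bounds the number of model retrainings needed in the final selection step.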
In collaborative filtering recommendation, the goal of using active learning for item selection is to pick out items with high information content whose ratings allow the user's preferences to be predicted better, that is, to maximize user satisfaction. Following the idea of the expected error reduction strategy, an item selection method is designed that adaptively selects the optimal weight w so as to maximize user satisfaction. The basic idea is as follows. For each item x in the candidate item set I, the current collaborative filtering prediction model estimates the user's predicted rating of x; x and its predicted rating are added, one candidate at a time, to the rated training set as a simulation; the model is retrained to obtain a new prediction model; the new model is used to estimate the user's ratings of the rated items; and the item that minimizes the deviation between the user's true ratings and the predicted ratings is given to the user to rate. The user satisfaction maximization method is consistent with the basic principle of item-based collaborative filtering: if a user is interested in an item, the user can be expected to like other items similar to it. Selecting the item that minimizes the deviation between the true ratings and the predicted ratings selects the item that best reflects the user's preferences. The user satisfaction maximization method is described in detail below:
Train on the existing rating set $T_c$ of user c to obtain the prediction model θ, compute the predicted rating $\hat{y}_{cx}(\theta)$ of c on item x according to θ, and update the training set: $T_c = T_c \cup (x, \hat{y}_{cx}(\theta))$.
Train on the updated $T_c$ to obtain a new prediction model $\tilde{\theta}$, use $\tilde{\theta}$ to predict the rating $\hat{y}_{ct}(\tilde{\theta})$ of c on each item t ($t \in T_c$) in the rated training set, and estimate the deviation ε(x) between the true ratings and the predicted ratings as follows:
$$\varepsilon(x) = \sum_{t \in T_c} \left(R_{ct} - \hat{y}_{ct}(\tilde{\theta})\right)^2$$
where $\hat{y}_{ct}(\tilde{\theta})$ denotes the rating of c on item t predicted by the updated collaborative filtering model $\tilde{\theta}$. The item that minimizes the deviation ε(x) best matches the user's preferences and is most likely to satisfy the user, i.e. it is the item with the highest information content. Under the user satisfaction maximization strategy given above, selecting the optimal weight w amounts to selecting, from the final candidate item set I, the item with the highest information content for the user to rate. The selection criterion for the item $x^{*}$ with the highest information content is therefore:
$$x^{*} = \arg\min_{x \in I} \; \varepsilon(x)$$
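The selection by minimum prediction deviation can be sketched as below. This is an illustration under several assumptions: the collaborative filtering model is replaced by a toy leave-one-out similarity-weighted average (`fit`), the similarity table `SIM` is invented, the deviation is taken as a sum of squared errors, and the sum runs over the items the user actually rated (the simulated rating of x has no true value to compare against).

```python
from typing import Callable, Dict, Iterable

Train = Dict[str, float]        # item -> rating of the new user c
Model = Callable[[str], float]  # item -> predicted rating

# Hypothetical symmetric item-item similarities for the toy model.
SIM = {
    frozenset(("k1", "k2")): 0.2,
    frozenset(("a", "k1")): 0.9, frozenset(("a", "k2")): 0.1,
    frozenset(("d", "k1")): 0.5, frozenset(("d", "k2")): 0.5,
}


def fit(train: Train) -> Model:
    """Toy stand-in for training theta: similarity-weighted average of the
    other known ratings (the item being predicted is left out)."""
    def predict(item: str) -> float:
        num = den = 0.0
        for j, r in train.items():
            if j == item:
                continue
            s = SIM.get(frozenset((item, j)), 0.0)
            num += s * r
            den += s
        return num / den if den else sum(train.values()) / len(train)
    return predict


def prediction_deviation(x: str, train: Train, fit_model: Callable[[Train], Model]) -> float:
    theta = fit_model(train)
    simulated = {**train, x: theta(x)}   # T_c U (x, predicted rating of x)
    theta_new = fit_model(simulated)     # retrained model
    return sum((r - theta_new(t)) ** 2 for t, r in train.items())


def select_item(candidates: Iterable[str], train: Train,
                fit_model: Callable[[Train], Model]) -> str:
    """x* = argmin over the candidate set of the prediction deviation."""
    return min(candidates, key=lambda x: prediction_deviation(x, train, fit_model))


train = {"k1": 5.0, "k2": 1.0}
print(select_item(["a", "d"], train, fit))
```

Each candidate costs one retraining, which is why the candidate set is first narrowed to the union of the top-L lists rather than evaluated over every unrated item.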
At this point, the uncertainty and representativeness of the items in the unrated item set have been measured, and the user-satisfaction maximization criterion is then used to pick the item that minimizes the prediction deviation on the new user's rated item set, i.e. the item with the highest information content, which is given to the user to rate. Once the user's rating has been obtained, the rated item set, the unrated item set and the collaborative filtering prediction model are updated, and the above process is iterated until a stopping criterion is met (for example, the number of ratings provided by the new user reaches a given quantity).
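The iteration can be sketched as a simple loop. This is a minimal sketch, assuming a rating budget as the stopping criterion; `active_learning_loop`, `select`, and `ask_user` are hypothetical names, and the model update is assumed to happen inside `select`, which receives the current rated set on every round.

```python
def active_learning_loop(unrated, rated, select, ask_user, budget):
    """Query the user for up to `budget` ratings, one selected item per round.

    select:   callable(unrated set, rated dict) -> item to ask about next
    ask_user: callable(item) -> the user's real rating for that item
    """
    unrated, rated = set(unrated), dict(rated)   # work on copies
    asked = []
    while unrated and len(asked) < budget:       # stopping criterion: rating budget
        x = select(unrated, rated)               # most informative item for the current state
        rated[x] = ask_user(x)                   # obtain the user's real rating
        unrated.discard(x)                       # move x from the unrated to the rated set
        asked.append(x)
    return rated, asked
```

For example, with `select=lambda u, r: min(u)` and a dictionary lookup as `ask_user`, the loop asks for items in alphabetical order until the budget is exhausted.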
Because the uncertainty and representativeness strategies studied in the present invention are measured directly from users' rating information, the prediction model θ uses a user-based collaborative filtering recommendation method. Similarity between users is measured with the Pearson correlation similarity, and a user's ratings are predicted with a weighted-average prediction method that accounts for differences in users' rating scales.
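A user-based predictor of this kind can be sketched as follows. This is an illustrative sketch rather than the exact model of the invention: it assumes the common mean-centred weighted average, in which each neighbour's rating is offset by that neighbour's own mean to compensate for different rating scales, and the function names and the nested-dictionary data layout are assumptions. Predictions are not clipped to the rating scale.

```python
from math import sqrt


def pearson(ra, rb):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = set(ra) & set(rb)
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = sqrt(sum((ra[i] - ma) ** 2 for i in common)) * \
          sqrt(sum((rb[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Mean-centred weighted average of the neighbours' ratings for `item`.

    ratings: dict user -> dict item -> rating
    """
    target = ratings[user]
    mean_u = sum(target.values()) / len(target) if target else 0.0
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue                          # only neighbours who rated the item
        s = pearson(target, r)
        mean_o = sum(r.values()) / len(r)
        num += s * (r[item] - mean_o)         # deviation from the neighbour's own mean
        den += abs(s)
    return mean_u + num / den if den else mean_u
```

When no neighbour with a defined similarity has rated the item, the sketch falls back to the target user's own mean rating.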
In conclusion the present invention when selecting the highest project of information content and scoring to user, overcomes based on uncertain The deficiency of property items selection strategy, has considered the uncertainty and representativeness of project.Adaptive group frame, can optimal tune Whole uncertain and representative combination, ensures the candidate items picked out, is uncertain relative to current commending system, and And it is concentrated with higher representativeness in non-scoring item.Therefore, when the scoring item set for candidate items being added to user Afterwards, obtained updated collaborative filtering model can preferably predict the preference information of user, more accurate to provide to the user True recommendation.
If the functions described in the method of this embodiment are implemented as software functional units and sold or used as an independent product, they may be stored in a storage medium readable by a computing device. On this understanding, the part of the embodiments of the present invention that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of instructions that cause a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar the embodiments may be referred to one another.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be realized in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. An item selection method based on adaptive active learning, characterized by comprising:
calculating the uncertainty of each candidate item according to the formula [formula omitted in the source text], where R_cx denotes the rating of user c for item x, the formula also uses the user's average rating, and U_x(sim) denotes the set of users who are similar to the current new user and have rated item x;
on the training set T_c, computing the predicted rating y_cx(θ) of c for x according to the prediction model θ, estimating the probability p(U=c, R_cx=r) that c rates x as r, and taking r as the changed value of y_cx(θ), where [range of r omitted in the source text];
updating the rated training set T_c by adding the changed predicted rating to the rated item list of c, obtaining a new rated training set T_{c,r} = T_c ∪ (x, r);
on the rated training sets T_c and T_{c,r}, predicting, according to the prediction model θ, the rating of c for each other unrated item x_i in the unrated item set, obtaining the predicted ratings on the two corresponding training sets;
under the probability p(U=c, R_cx=r) that the rating is r, estimating the influence of a change in the rating of the current candidate item x on the predicted ratings of the other items, the rating change being expressed as the square of the difference between the two predicted ratings, and calculating the representativeness rep(x) of the current candidate item x according to the formula [formula omitted in the source text], where c denotes the current new user, x denotes the current candidate item, the unrated item set and the rated item set of c are as above, the remaining unrated item set of c is the unrated item set with x removed and each of its items is denoted x_i, T_c is the training data set corresponding to c, and R_cx denotes the rating of c for x;
selecting the item with high information content according to the formula [formula omitted in the source text], where uncertainty(x) is the uncertainty, rep(x) is the representativeness, c denotes the current new user, x denotes the current candidate item, and the set in the formula is the unrated item set of c.
2. The method according to claim 1, characterized in that, after the representativeness of the candidate items is calculated, the method further comprises:
pre-specifying a weight set W = {w_1, w_2, …, w_{n-1}, w_n} of size |W| = n;
initializing the candidate item set I to the empty set;
for the current weight w_i ∈ W, selecting the top L candidate items to form the item set I_i;
updating the candidate item set I = I ∪ I_i;
training a prediction model θ on the existing rating set T_c of user c, computing the predicted rating of c for item x according to θ, and updating the training set T_c;
calculating the prediction deviation ε(x) corresponding to each item;
selecting the most informative item x* from the candidate item set I.
3. The method according to claim 2, characterized in that selecting the top L candidate items for the current weight w_i ∈ W comprises:
calculating, from the uncertainty uncertainty(x) and the representativeness rep(x), the combined information content info(x) of each item according to the formula info(x) = uncertainty(x)^w × rep(x)^(1-w);
calculating the most informative item x* according to the formula [formula omitted in the source text], and selecting the top L candidate items.
4. The method according to claim 3, characterized in that training a prediction model θ on the existing rating set T_c of user c, computing the predicted rating of c for item x according to θ, and updating the training set T_c comprises:
updating the training set T_c according to the formula [formula omitted in the source text].
5. The method according to claim 4, characterized in that calculating the prediction deviation ε(x) corresponding to each item comprises:
training a new prediction model θ_x on the updated T_c, predicting, based on θ_x, the rating of c for each item t in the rated training set (t ∈ T_c), and estimating the deviation ε(x) between the true ratings and the predicted ratings according to the formula [formula omitted in the source text], where the predicted rating of c for item t is the one given by the updated collaborative filtering model θ_x.
6. The method according to claim 5, characterized in that selecting the most informative item x* from the candidate item set I comprises:
selecting the most informative item x* according to the formula x* = argmin_{x ∈ I} ε(x).
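The construction of the candidate item set I in claims 2 and 3 can be sketched as follows. This is a minimal sketch, assuming that the uncertainty and representativeness scores have already been computed and are non-negative; `candidate_set` and its parameter names are hypothetical, and ties in the ranking are broken by dictionary order.

```python
def candidate_set(uncertainty, rep, weights, top_l):
    """Union, over every w in `weights`, of the top_l items ranked by
    info(x) = uncertainty(x)^w * rep(x)^(1-w).

    uncertainty, rep: dict item -> non-negative score (same keys in both)
    """
    candidates = set()                       # I starts empty
    for w in weights:
        info = {x: uncertainty[x] ** w * rep[x] ** (1 - w) for x in uncertainty}
        ranked = sorted(info, key=info.get, reverse=True)
        candidates.update(ranked[:top_l])    # I = I ∪ I_i
    return candidates
```

With w = 1 the ranking uses uncertainty alone and with w = 0 representativeness alone, so a weight set that spans [0, 1] lets the final selection step choose among items favoured by different trade-offs.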
CN201510255684.4A 2015-05-19 2015-05-19 A kind of project selection method based on adaptive Active Learning Active CN104794250B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510255684.4A CN104794250B (en) 2015-05-19 2015-05-19 A kind of project selection method based on adaptive Active Learning

Publications (2)

Publication Number Publication Date
CN104794250A CN104794250A (en) 2015-07-22
CN104794250B true CN104794250B (en) 2018-10-19

Family

ID=53559042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510255684.4A Active CN104794250B (en) 2015-05-19 2015-05-19 A kind of project selection method based on adaptive Active Learning

Country Status (1)

Country Link
CN (1) CN104794250B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101685458A (en) * 2008-09-27 2010-03-31 华为技术有限公司 Recommendation method and system based on collaborative filtering

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001096255A (en) * 1999-10-01 2001-04-10 Matsushita Electric Ind Co Ltd Method of recycling copper powder

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
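Research on a controllable active learning algorithm combining sample uncertainty and representativeness; Hu Zhengping et al.; Journal of Yanshan University; 2009-07-31; Vol. 33, No. 4; pp. 341-346 *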

Also Published As

Publication number Publication date
CN104794250A (en) 2015-07-22

Similar Documents

Publication Publication Date Title
JP6109037B2 (en) Time-series data prediction apparatus, time-series data prediction method, and program
CN109948066B (en) Interest point recommendation method based on heterogeneous information network
CN108256093A (en) A kind of Collaborative Filtering Recommendation Algorithm based on the more interest of user and interests change
CN110503531A (en) The dynamic social activity scene recommended method of timing perception
CN108460101A (en) Point of interest of the facing position social networks based on geographical location regularization recommends method
CN109635206B (en) Personalized recommendation method and system integrating implicit feedback and user social status
CN108038730A (en) Product similarity determination methods, device and server cluster
JPWO2010010653A1 (en) User model processing device
WO2010010654A1 (en) Usage estimation device
JP2010086150A (en) Regional information retrieving device, method for controlling regional information retrieving device, regional information retrieving system and method for controlling regional information retrieval system
CN103383702A (en) Method and system for recommending personalized news based on ranking of votes of users
CN108304935A (en) Machine learning model training method, device and computer equipment
CN108470075A (en) A kind of socialization recommendation method of sequencing-oriented prediction
JP2011118110A (en) Map display device, map display method, and map display program
Li et al. From reputation perspective: a hybrid matrix factorization for qos prediction in location‐aware mobile service recommendation system
CN110147514B (en) Resource display method, device and equipment thereof
KR101821790B1 (en) Apparatus and method for a clustering-based recommendation considering user preferences
CN105681089B (en) Networks congestion control clustering method, device and terminal
CN107909498B (en) Recommendation method based on area below maximized receiver operation characteristic curve
KR101028810B1 (en) Apparatus and method for analyzing advertisement target
CN109658187A (en) Recommend method, apparatus, storage medium and the electronic equipment of cloud service provider
CN104794250B (en) A kind of project selection method based on adaptive Active Learning
CN112954066A (en) Information pushing method and device, electronic equipment and readable storage medium
WO2014020299A1 (en) Location evaluation
CN113377532B (en) Edge computing server deployment method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20181212

Address after: 215021 Unit 16-B502, Creative Industry Park, 328 Xinghu Street, Suzhou Industrial Park, Jiangsu Province

Patentee after: Suzhou Feiyu Mutual Entertainment Information Technology Co., Ltd.

Address before: 215021 Room B302, 16th Building, International Science and Technology Park Phase 5 Creative Industry Park, 328 Xinghu Street, Suzhou Industrial Park, Jiangsu Province

Patentee before: SUZHOU RONGXI INFORMATION TECHNOLOGY CO., LTD.