Summary of the Invention
In view of this, the object of the present invention is to provide an active learning rating guidance method and system based on matrix factorization, so as to better predict users' preference information and improve recommendation accuracy.
To achieve the above object, the present invention provides the following technical solutions:
In one aspect, the present invention provides an active learning rating guidance method based on matrix factorization, including:
Step A: obtaining, respectively, the user features of a new user, the user features of other users, the item features of the new user's unrated items, and the item features of the other users' rated items; the user features of the new user, the user features of the other users, the item features of the new user's unrated items, and the item features of the other users' rated items are all obtained by factorization training on a user rating matrix;
Step B: obtaining similar users by calculating the cosine similarity between the user features of the new user and the user features of the other users; the similar users are those of the other users who are similar to the new user;
Step C: obtaining an optimal item using the popularity and the information content of the items rated by the similar users, and presenting the optimal item to the new user so that the new user rates the optimal item, thereby obtaining a rating result; the information content of a rated item is obtained by calculating the cosine similarity between the item features of the new user's unrated items and the item features of the similar users' rated items.
Preferably, the method further includes:
Step D: updating the user features of the new user according to the rating result, and executing step B again, until the number of iterations reaches a preset number.
Preferably, before step A, the method further includes:
obtaining the user rating matrix.
Preferably, step C includes:
Step C1: determining the popularity of the items rated by the similar users;
Step C2: calculating the product of the popularity and the information content;
Step C3: judging whether the obtained product of the popularity and the information content is the maximum; if so, executing step C4; if not, returning to step C2;
Step C4: when the obtained product of the popularity and the information content is the maximum, determining that the corresponding item rated by the similar users is the optimal item.
In another aspect, the present invention also provides an active learning rating guidance system based on matrix factorization, including:
a first acquisition module, configured to obtain, respectively, the user features of a new user, the user features of other users, the item features of the new user's unrated items, and the item features of the other users' rated items; the user features of the new user, the user features of the other users, the item features of the new user's unrated items, and the item features of the other users' rated items are all obtained by factorization training on a user rating matrix;
a second acquisition module, configured to obtain similar users by calculating the cosine similarity between the user features of the new user and the user features of the other users; the similar users are those of the other users who are similar to the new user;
a third acquisition module, configured to obtain an optimal item using the popularity and the information content of the items rated by the similar users, and to present the optimal item to the new user so that the new user rates the optimal item, thereby obtaining a rating result; the information content of a rated item is obtained by calculating the cosine similarity between the item features of the new user's unrated items and the item features of the similar users' rated items.
Preferably, the system further includes:
an update module, configured to update the user features of the new user according to the rating result, and to return to obtaining similar users by calculating the cosine similarity between the user features of the new user and the user features of the other users, until the number of iterations reaches a preset number.
Preferably, the system further includes:
an acquisition module, configured to obtain the user rating matrix.
Preferably, the third acquisition module includes:
an acquiring unit, configured to determine the popularity of the items rated by the similar users;
a computing unit, configured to calculate the product of the popularity and the information content;
a judging unit, configured to judge whether the obtained product of the popularity and the information content is the maximum; if so, to determine that the corresponding item rated by the similar users is the optimal item; if not, to return to calculating the product of the popularity and the information content.
Compared with the prior art, the advantages of the present invention are as follows:
The present invention provides an active learning rating guidance method and system based on matrix factorization. Factorization of the user rating matrix yields, respectively, the user features of the new user, the user features of the other users, the item features of the new user's unrated items, and the item features of the other users' rated items. Similar users are then obtained by calculating the cosine similarity between the user features of the new user and the user features of the other users. An optimal item is obtained using the popularity and the information content of the items rated by the similar users, and the optimal item is presented to the new user so that the new user rates it, thereby obtaining a rating result; the information content of a rated item is obtained by calculating the cosine similarity between the item features of the new user's unrated items and the item features of the similar users' rated items. Compared with the prior art, the present invention, by using the active learning rating guidance method and system based on matrix factorization, better predicts the user's preference information and thereby improves recommendation accuracy.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, fall within the protection scope of the present invention.
In the active selection set, some items have ratings and some do not. During active learning, if an item that has a rating is selected, the collaborative filtering recommendation model gains one opportunity to learn about the user; otherwise, the query is in effect refused by the user and a learning opportunity is wasted. The purpose of the proposed active learning method is to make the selected items as likely as possible to be answered by the new user, while making full use of the rating information provided by the current user and by the training users. Because the number of interactions with the user is small, active learning obtains little rating data. Relying only on the few ratings provided by the new user does not yield enough information to find suitable queries, especially when the number of items is large. Our solution is to combine the new user with the training users and to use the information available from the training users to provide effective queries.
The active learning method provided by the present invention is pool-based active learning. Actively selecting items for a new user to rate, in order to solve the user cold-start problem, is of the pool-based type. Pool-based active learning generally has a pool of unlabeled samples; in the collaborative filtering recommendation model, the set of items that the new user has not rated constitutes the unlabeled sample pool. In each iteration, the active learning method selects, according to a certain selection strategy, the item with the highest information content for the new user to rate, and then uses the rating data provided by the user to represent the user's interests and preferences, finally building a collaborative filtering recommendation model that can generate effective recommendations for the new user.
As shown in FIG. 1, a global collaborative filtering model is first learned from the existing user rating information in the collaborative filtering recommendation model, and the interests and preferences of the new user are learned from the initial ratings provided by the cold-start user. The collaborative filtering recommendation model then selects, according to a certain item selection strategy, the most informative item from the set of unrated items and asks the user to rate it. After the user provides the rating, the new rating is added to the rating data set and the collaborative filtering recommendation model is retrained for the next item selection. In the next iteration, the model selects another item for the user to rate; the termination condition is that the number of interactions with the new user reaches a certain threshold. After the active learning rating guidance process is completed, the collaborative filtering recommendation model can predict the user's preference information from the highly informative ratings it has collected and generate an item recommendation list for the new user. To obtain highly informative rating data with as few interactions as possible, the key problem is selecting the most valuable item from the set of unrated items.
Referring to FIG. 2, which shows a flowchart of an active learning rating guidance method based on matrix factorization provided by an embodiment of the present invention, the method includes:
Step A: obtaining, respectively, the user features of the new user, the user features of the other users, the item features of the new user's unrated items, and the item features of the other users' rated items.
The user features of the new user, the user features of the other users, the item features of the new user's unrated items, and the item features of the other users' rated items are all obtained by factorization training on the user rating matrix.
Factorization training is first performed on the user rating matrix S, mapping users and items into a K-dimensional factor feature space $\mathbb{R}^{K}$ and yielding the factor feature spaces U and V corresponding to the user features and the item features. In the factor feature space, each item x is represented by a vector $V_x$. Each element of $V_x$ indicates the degree to which the item possesses the corresponding factor; some factors are of high importance and some are of low importance. Similarly, a given user u can be represented by a vector $U_u$, each element of which indicates the user's affinity for the corresponding factor. The inner product $U_u V_x^{T}$ of the user vector $U_u$ and the item vector $V_x$ reflects the user's overall preference for the item's features, so the inner product can be used to estimate user u's predicted rating of item x, as follows:
$$\hat{R}_{ux} = U_u V_x^{T} \tag{1}$$
where $\hat{R}_{ux}$ is the predicted rating of item x by user u.
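As a minimal illustration of the inner-product prediction, the following Python sketch computes a predicted rating from two factor vectors. The function name `predict_rating` and the three-dimensional example vectors are assumptions introduced here for illustration only; they do not come from the embodiment.

```python
def predict_rating(user_vec, item_vec):
    """Predicted rating as the inner product of two K-dimensional factor vectors."""
    if len(user_vec) != len(item_vec):
        raise ValueError("user and item vectors must share the same dimension K")
    return sum(u_k * v_k for u_k, v_k in zip(user_vec, item_vec))


# Hypothetical K = 3 factor vectors, chosen only for illustration.
U_u = [1.0, 0.5, 2.0]
V_x = [2.0, 1.0, 0.5]
r_hat = predict_rating(U_u, V_x)  # 1.0*2.0 + 0.5*1.0 + 2.0*0.5 = 3.5
```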
The key issue here is computing the match between each item and each user. Once each match value has been computed, the user's rating can be predicted by the inner product of the two factor features (the user features and the item features). A method based on matrix factorization decides whether to generate a recommendation for the user according to the predicted rating value.
To compute the inner product of the two factor features, the existing rating data S is used to train the factor feature spaces U and V. The element values in U and V are obtained as follows: first, the elements of U and V are randomly initialized; then, for every rating in S, that is, for every $(u, x) \in S$, the prediction error $e_{u,x}$ of the collaborative filtering recommendation model is computed:
$$e_{u,x} = R_{ux} - \hat{R}_{ux} \tag{2}$$
where $R_{ux}$ is the true rating of item x by user u, generally in the range 1 to 5.
At the same time, because existing rating data is used to predict unknown ratings, overfitting on S should be avoided. For example, in a collaborative filtering recommendation model some users always tend to give high ratings, and some items are popular and always receive higher ratings than other items. To avoid overfitting on such data, penalty terms should be added to constrain the values of the model parameters, and the element values in the factor feature spaces U and V are then obtained by minimizing the sum of squared global prediction errors. The global prediction error after the penalty term is added is estimated as follows:
$$\mathrm{Opt}(S, U, V) = \sum_{(u,x) \in S} e_{u,x}^{2} + \lambda \left( \lVert U \rVert^{2} + \lVert V \rVert^{2} \right) \tag{3}$$
where Opt(S, U, V) is the prediction error and λ is the penalty term coefficient.
It should be noted that the prediction error is generally small, for example 0.0001.
Next, a locally optimal solution for the factor feature spaces U and V is obtained by stochastic gradient descent. With the prediction error taken as the true rating minus the predicted rating, the partial derivatives for a single rating $(u, x) \in S$ are calculated as follows:
$$\frac{\partial\, \mathrm{Opt}}{\partial U_{u,k}} = -2\, e_{u,x} V_{x,k} + 2 \lambda U_{u,k}, \qquad \frac{\partial\, \mathrm{Opt}}{\partial V_{x,k}} = -2\, e_{u,x} U_{u,k} + 2 \lambda V_{x,k} \tag{4}$$
where $V_{x,k}$ and $U_{u,k}$ are the k-th factor features of item x and of user u, respectively.
The factorization training process is iterated over the rating set S, and the feature values in the factor feature spaces U and V are updated at a certain learning rate in the direction opposite to the gradient, until the prediction error Opt(S, U, V) falls to a very small value or no longer changes, that is, until the parameters of the collaborative filtering recommendation model converge. With α denoting the learning rate and λ the penalty term coefficient, and with the constant factor 2 absorbed into α, the elements of U and V are updated as follows:
$$U_{u,k} \leftarrow U_{u,k} + \alpha \left( e_{u,x} V_{x,k} - \lambda U_{u,k} \right)$$
$$V_{x,k} \leftarrow V_{x,k} + \alpha \left( e_{u,x} U_{u,k} - \lambda V_{x,k} \right) \tag{5}$$
For every rating $(u, x) \in S$, the entire vectors $U_u$ and $V_x$ need to be updated, that is, k takes the values $k \in \{1, 2, \ldots, K\}$.
Step B: obtaining similar users by calculating the cosine similarity between the user features of the new user and the user features of the other users.
The similar users are those of the other users who are similar to the new user.
When matrix factorization training is complete, the user features and item features have been mapped into the corresponding factor feature spaces U and V, and users or items with similar rating behavior are mapped to the same region. We use the user features of the new user in the factor feature space to find the new user's set of similar users, and then select the optimal item from the set of items rated by the similar users to query the user for a rating. This both yields effective queries and makes full use of the rating information of the users similar to the new user. In selecting effective item queries, finding the new user's similar neighbors is a very important step: if the similar users are computed accurately, the preference information of the new user that is simulated from the rating behavior of the similar users will be comparatively accurate. Conversely, if the similar users are selected incorrectly, the effectiveness of the selected items will suffer. In matrix factorization, rating prediction is computed from the inner product of the user features and the item features, so we choose cosine similarity to compute the similarity between users. The user factor feature space U can be expressed as:
$$U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1K} \\ u_{21} & u_{22} & \cdots & u_{2K} \\ \vdots & \vdots & & \vdots \\ u_{M1} & u_{M2} & \cdots & u_{MK} \end{bmatrix} \tag{6}$$
where M denotes the number of users.
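For two users $u_i$ and $u_j$ in the factor feature space, the corresponding feature vectors can be expressed as:
$$u_i = [u_{i1}, u_{i2}, \ldots, u_{iK}]$$
$$u_j = [u_{j1}, u_{j2}, \ldots, u_{jK}] \tag{7}$$
The similarity $\mathrm{sim}(u_i, u_j)$ of users $u_i$ and $u_j$ is then calculated as follows:
$$\mathrm{sim}(u_i, u_j) = \frac{\sum_{k=1}^{K} u_{ik} u_{jk}}{\sqrt{\sum_{k=1}^{K} u_{ik}^{2}} \, \sqrt{\sum_{k=1}^{K} u_{jk}^{2}}} \tag{8}$$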
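A minimal Python sketch of the similar-user search follows. The text does not state how many neighbors are kept or whether a similarity threshold is used, so the `n_neighbors` cut-off, the function names, and the example vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two K-dimensional feature vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def similar_users(new_user_vec, other_user_vecs, n_neighbors):
    """Return the ids of the n_neighbors users most similar to the new user.

    other_user_vecs maps a user id to that user's feature vector.
    """
    ranked = sorted(
        other_user_vecs.items(),
        key=lambda pair: cosine_similarity(new_user_vec, pair[1]),
        reverse=True,
    )
    return [user_id for user_id, _ in ranked[:n_neighbors]]


# Hypothetical K = 2 user features.
others = {"a": [2.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = similar_users([1.0, 0.0], others, n_neighbors=2)
```

In this example user "a" points in the same direction as the new user and user "b" is orthogonal to it, so "a" and "c" are returned.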
Step C: obtaining an optimal item using the popularity and the information content of the items rated by the similar users, and presenting the optimal item to the new user so that the new user rates the optimal item, thereby obtaining a rating result.
The information content of a rated item is obtained by calculating the cosine similarity between the item features of the new user's unrated items and the item features of the similar users' rated items.
After the set of similar neighbors of the new user has been obtained, item selection must be considered from two aspects: the popularity of the item and its information content.
(1) Item popularity measure
Popular items are selected from the set of items rated by the similar users and presented to the user for rating. A new user tends to rate comparatively popular items, because many users whose behavior is similar to this user's have rated them; this is consistent with the basic premise of collaborative filtering. Considering popular items ensures that the queries are effective and makes full use of the rating information of the similar users.
For an item x, the popularity pop(x) is simple to measure: it is the number of users who have rated the item, calculated as follows:
$$\mathrm{pop}(x) = \left| \{\, c : (c, x) \in S \,\} \right|, \quad x \in X_u \tag{9}$$
where $X_u$ denotes the set of items not rated by the new user u, and c denotes the other users in the collaborative filtering recommendation model.
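The popularity count can be sketched in a few lines of Python. The function name `popularity`, the representation of ratings as (user, item, rating) triples, and the toy data are assumptions made for illustration.

```python
from collections import Counter


def popularity(ratings, candidate_items):
    """For each candidate item, count the distinct users who have rated it."""
    rated_pairs = {(c, x) for c, x, _ in ratings}   # de-duplicate (user, item)
    counts = Counter(x for _, x in rated_pairs)
    return {x: counts[x] for x in candidate_items}


# Hypothetical ratings by other users; items "a", "b", "c" are unrated
# by the new user.
ratings = [(0, "a", 5), (1, "a", 3), (1, "b", 2), (2, "a", 4)]
pop = popularity(ratings, ["a", "b", "c"])
```

An item that nobody has rated, such as "c" here, gets a popularity of 0.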
(2) Information content measure
Asking the user to rate popular items yields more user rating data, but it helps little in letting the collaborative filtering recommendation model learn the user's personalized preferences. We therefore also consider the information content of the selected item and select items with higher information content from the set of unrated items. By the basic principle of item-based collaborative filtering, if a user is interested in certain items, we can infer that the user will also like other items similar to them. Considering the similarity between the new user's unrated items and the similar users' rated items increases the likelihood that the new user provides a rating for the queried item, and also ensures that the user's rating of the selected item has higher information content, that is, that the queried item is one the user prefers.
The similarity between items is calculated on the following basis:
After matrix factorization training is complete, similar items have similar factor features, so we use the item features to measure the similarity between items. The item feature space V can be expressed as:
$$V = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1K} \\ x_{21} & x_{22} & \cdots & x_{2K} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NK} \end{bmatrix} \tag{10}$$
where N denotes the number of items.
For two items $x_i$ and $x_j$ in the item factor space, the corresponding feature vectors can be expressed as:
$$x_i = [x_{i1}, x_{i2}, \ldots, x_{iK}]$$
$$x_j = [x_{j1}, x_{j2}, \ldots, x_{jK}] \tag{11}$$
The similarity $\mathrm{sim}(x_i, I_u)$ between each item $x_i$ in the new user's set of unrated items and the items in the set $I_u$ of items rated by the similar users is calculated as follows:
From the foregoing analysis of the similarity measure between users, cosine similarity is a good standard for computing the similarity between features in the factor space, so the similarity $\mathrm{sim}(x_i, x_j)$ of items $x_i$ and $x_j$ is calculated in the same way as the similarity of user features.
Combining the two analyses above, each time an item is selected from the set of unrated items, we regard the item with the largest product of popularity and information content as the optimal item and present it to the new user for rating. Because popularity values are much larger than similarity values, popularity dominates the product. To balance the influence of the two attributes, we use $\log \mathrm{pop}(x)$ to represent the popularity of an item and $\mathrm{info}(x_i, I_u)$ to denote its information content; the optimal item $x^{*}$ is then selected by the following criterion:
$$x^{*} = \arg\max_{x_i \in X_u} \; \log \mathrm{pop}(x_i) \cdot \mathrm{info}(x_i, I_u) \tag{13}$$
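The selection rule can be illustrated with the following Python sketch. The text does not spell out how the pairwise cosine similarities are aggregated into the information content of an item, so this sketch assumes the mean similarity to the items rated by the similar users; that aggregation, the skipping of items with zero popularity (whose logarithm is undefined), and all names and data are assumptions rather than the embodiment's definition.

```python
import math


def cosine_similarity(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def information_content(candidate, rated_items, item_vecs):
    """ASSUMPTION: mean cosine similarity between the candidate item and
    the items rated by the similar users."""
    if not rated_items:
        return 0.0
    sims = [cosine_similarity(item_vecs[candidate], item_vecs[j])
            for j in rated_items]
    return sum(sims) / len(sims)


def select_optimal_item(candidates, rated_items, item_vecs, pop):
    """Return the candidate maximizing log(pop) * information content."""
    best_item, best_score = None, -math.inf
    for x in candidates:
        if pop.get(x, 0) < 1:
            continue  # log(0) is undefined; skip items nobody has rated
        score = math.log(pop[x]) * information_content(x, rated_items, item_vecs)
        if score > best_score:
            best_item, best_score = x, score
    return best_item


# Hypothetical K = 2 item features and popularity counts.
item_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [1.0, 0.1]}
pop = {"b": 100, "c": 10, "d": 3}
best = select_optimal_item(["b", "c", "d"], ["a"], item_vecs, pop)
```

Here item "b" is the most popular but has zero similarity to the rated item, and item "d" is the most similar but rarely rated, so the product favors item "c". This is the balancing effect that the logarithm of popularity is meant to provide.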
In this embodiment, factorization of the user rating matrix yields, respectively, the user features of the new user, the user features of the other users, the item features of the new user's unrated items, and the item features of the other users' rated items. Similar users are then obtained by calculating the cosine similarity between the user features of the new user and the user features of the other users. An optimal item is obtained using the popularity and the information content of the items rated by the similar users, and the optimal item is presented to the new user so that the new user rates it, thereby obtaining a rating result; the information content of a rated item is obtained by calculating the cosine similarity between the item features of the new user's unrated items and the item features of the similar users' rated items. Compared with the prior art, this embodiment better predicts the user's preference information and thereby improves recommendation accuracy.
Referring to FIG. 3, which shows another flowchart of the active learning rating guidance method based on matrix factorization provided by an embodiment of the present invention, the method may further include the following steps on the basis of FIG. 1:
Step A1: obtaining the user rating matrix.
It should be noted that the process by which the embodiment of the present invention obtains the user rating matrix is the same as existing methods of obtaining a user rating matrix, and is therefore not described again here.
Step D: updating the user features of the new user according to the rating result, and executing step B again, until the number of iterations reaches the preset number.
After a new user enters the collaborative filtering recommendation model and provides item ratings, the model must be updated so as to learn the user features of the new user. However, the collaborative filtering recommendation model already contains many users, and retraining the entire model takes a long time.
The time complexity of retraining the factorization-based collaborative filtering recommendation model is O(|S| × K × t), where t denotes the number of iterations, K denotes the dimension of the factor space, and |S| denotes the size of the set of existing ratings. Taking the Netflix data set as an example, with K = 40, t = 120, and |S| = 100,000,000, completing training requires 480,000,000,000 feature updates. The process of retraining the entire collaborative filtering recommendation model therefore needs to be optimized.
The present invention uses an online updating optimization for new users. Online updating means that, after the initial training on all users, subsequent updates are trained only on the ratings added by the new user. After the new user's ratings are obtained, the user features of that user are initialized to random values, and the collaborative filtering recommendation model is then trained on the ratings the new user has provided. When the new user provides a rating, this method trains all of the features of the new user only; the other features in the matrix remain unchanged. The reasoning is that, viewed globally, the collaborative filtering recommendation models trained on the sets S and $S \cup \{R_{ux}\}$ are almost identical. If the user is a new user, however, the user features of this user can change considerably when the rating $R_{ux}$ is added to the user rating matrix. Therefore, only the full set of features of the new user is trained, and the features of the other users in the matrix are kept unchanged.
Analysis shows that the time complexity of the online updating method is O(|C(u)| × K × t), where |C(u)| denotes the number of ratings given by the new user. Since the new user has rated very few items, the online updating method for new users greatly increases the speed at which the user features are updated.
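The difference in cost can be checked with a short calculation. The values of K, t, and |S| are the ones quoted for the Netflix data set; the figure of 20 ratings for the new user is a hypothetical value chosen only to make the comparison concrete.

```python
def feature_updates(n_ratings, k, t):
    """Number of feature updates for n_ratings ratings, K factors, t iterations."""
    return n_ratings * k * t


full_retrain = feature_updates(100_000_000, 40, 120)   # 480,000,000,000
online_update = feature_updates(20, 40, 120)           # 96,000 (20 is hypothetical)
speedup = full_retrain // online_update                # 5,000,000
```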
After the user features of the new user have been trained by the online updating method, the foregoing processes of similar-neighbor search and optimal item query are iterated until the preset number of queries is reached.
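The overall iteration can be outlined as a loop over four steps. In the Python sketch below, the four callables (`find_neighbors`, `select_item`, `ask_user`, `update_user`) are hypothetical hooks standing in for the neighbor search, item selection, user query, and online update; their names and signatures are assumptions, and the stub implementations exist only to make the example runnable.

```python
def rating_guidance_loop(n_queries, find_neighbors, select_item,
                         ask_user, update_user):
    """Iterate neighbor search, item selection, and online update.

    Stops after n_queries queries or when no candidate item is left.
    """
    collected = []      # (item, rating) pairs obtained from the new user
    asked = set()       # items already presented, rated or not
    for _ in range(n_queries):
        neighbors = find_neighbors()
        item = select_item(neighbors, asked)
        if item is None:
            break
        asked.add(item)
        rating = ask_user(item)
        if rating is not None:          # None models a refused query
            collected.append((item, rating))
            update_user(collected)      # retrain only the new user's features
    return collected


# Stub hooks for illustration only.
items = ["a", "b", "c"]
update_sizes = []
collected = rating_guidance_loop(
    n_queries=5,
    find_neighbors=lambda: ["n1"],
    select_item=lambda neighbors, asked: next(
        (x for x in items if x not in asked), None),
    ask_user=lambda item: None if item == "b" else 4,
    update_user=lambda ratings: update_sizes.append(len(ratings)),
)
```

In this run the user declines item "b", so two ratings are collected and the user features are updated twice before the candidates run out.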
Referring to FIG. 4, which shows a sub-flowchart of the active learning rating guidance method based on matrix factorization provided by an embodiment of the present invention, the method may include the following steps:
Step C1: determining the popularity of the items rated by the similar users.
Step C2: calculating the product of the popularity and the information content.
Step C3: judging whether the obtained product of the popularity and the information content is the maximum; if so, executing step C4; if not, returning to step C2.
Step C4: when the obtained product of the popularity and the information content is the maximum, determining that the corresponding item rated by the similar users is the optimal item.
Corresponding to the above method embodiment, an embodiment of the present invention also provides an active learning rating guidance system based on matrix factorization. As shown in the structural schematic diagram of FIG. 5, the system may include a first acquisition module 11, a second acquisition module 12, and a third acquisition module 13, wherein:
The first acquisition module 11 is configured to obtain, respectively, the user features of the new user, the user features of the other users, the item features of the new user's unrated items, and the item features of the other users' rated items.
The user features of the new user, the user features of the other users, the item features of the new user's unrated items, and the item features of the other users' rated items are all obtained by factorization training on the user rating matrix.
The second acquisition module 12 is configured to obtain similar users by calculating the cosine similarity between the user features of the new user and the user features of the other users.
The similar users are those of the other users who are similar to the new user.
The third acquisition module 13 is configured to obtain an optimal item using the popularity and the information content of the items rated by the similar users, and to present the optimal item to the new user so that the new user rates the optimal item, thereby obtaining a rating result.
The information content of a rated item is obtained by calculating the cosine similarity between the item features of the new user's unrated items and the item features of the similar users' rated items.
Preferably, the third acquisition module 13 may include an acquiring unit, a computing unit, and a judging unit, wherein:
the acquiring unit is configured to determine the popularity of the items rated by the similar users;
the computing unit is configured to calculate the product of the popularity and the information content;
the judging unit is configured to judge whether the obtained product of the popularity and the information content is the maximum; if so, to determine that the corresponding item rated by the similar users is the optimal item; if not, to return to calculating the product of the popularity and the information content.
In this embodiment, factorization of the user rating matrix yields, respectively, the user features of the new user, the user features of the other users, the item features of the new user's unrated items, and the item features of the other users' rated items. Similar users are then obtained by calculating the cosine similarity between the user features of the new user and the user features of the other users. An optimal item is obtained using the popularity and the information content of the items rated by the similar users, and the optimal item is presented to the new user so that the new user rates it, thereby obtaining a rating result; the information content of a rated item is obtained by calculating the cosine similarity between the item features of the new user's unrated items and the item features of the similar users' rated items. Compared with the prior art, this embodiment better predicts the user's preference information and thereby improves recommendation accuracy.
Referring to FIG. 6, which shows another structural schematic diagram of the active learning rating guidance system based on matrix factorization provided by an embodiment of the present invention, the system may further include, on the basis of FIG. 5, an acquisition module 10 and an update module 14, wherein:
The update module 14 is configured to update the user features of the new user according to the rating result, and to return to obtaining similar users by calculating the cosine similarity between the user features of the new user and the user features of the other users, until the number of iterations reaches the preset number.
It should be noted that the preset number is specifically a preset number of interactions with the new user.
The acquisition module 10 is configured to obtain the user rating matrix.
Finally, it should also be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.