Background
With the spread of the Internet and the mobile Internet, the amount of information has grown explosively, and people face a conflict between obtaining useful, interesting information quickly and obtaining it accurately. A personalized content (news) recommendation system is therefore needed so that people can obtain the information they need every day in a targeted way. Personalized content recommendation systems currently fall into two main classes: content-based recommendation (also called information filtering) and collaborative recommendation (also called collaborative filtering). Collaborative recommendation is further divided into memory-based and model-based methods.
Content-based recommendation: this is essentially information filtering. Using the user's reading and non-reading history as a corpus, a text classifier is trained. The trained classifier then predicts how much the user will like a new document (news article), and the decision whether to recommend that document to the user is made accordingly.
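The source does not name a specific classifier, so the following is only a minimal sketch of the idea, assuming a nearest-centroid (Rocchio-style) classifier over bag-of-words vectors; `CentroidClassifier`, `tf_vector` and the whitespace tokenization are illustrative assumptions (real Chinese text would first need word segmentation).

```python
import math
from collections import Counter

def tf_vector(text):
    # Term frequencies; whitespace tokenization is an assumption of this sketch.
    return Counter(text.split())

def centroid(docs):
    # Average term-frequency vector of a set of documents.
    total = Counter()
    for d in docs:
        total.update(tf_vector(d))
    n = max(len(docs), 1)
    return {t: c / n for t, c in total.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidClassifier:
    """Hypothetical per-user text classifier trained on read / unread history."""

    def fit(self, read_docs, unread_docs):
        self.pos = centroid(read_docs)    # profile of what the user read
        self.neg = centroid(unread_docs)  # profile of what the user skipped
        return self

    def score(self, doc):
        # Positive when the document is closer to the "read" profile.
        v = tf_vector(doc)
        return cosine(v, self.pos) - cosine(v, self.neg)

    def recommend(self, doc):
        return self.score(doc) > 0
```

For example, a classifier fitted on sports articles the user read and finance articles the user skipped would recommend a new football report and reject a new banking report.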
Memory-based recommendation: the user's preference information (for example, that a certain user has read a certain news article) is recorded, and each user is characterized by these preferences. A user is represented as a vector over the news articles, u = [w1, w2, ..., wi, ..., wn], where i is the document index and wi is the user's preference value for document i (1 if the user has read it, 0 otherwise). For a new document d that has not yet been recommended to the active user, a nearest-neighbour method predicts whether d should be recommended: the other users whose vectors are closest to the active user's vector, and who have a preference for d, are used to predict the active user's preference for d.
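A minimal sketch of this nearest-neighbour prediction, assuming cosine similarity between the 0/1 user vectors and a similarity-weighted average over the k closest users; the similarity measure, `k`, and the function name `predict_preference` are assumptions, since the source does not specify them.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_preference(active, others, d, k=2):
    """Predict the active user's preference for document index d.

    active: the active user's 0/1 vector (d has not been shown to this user).
    others: 0/1 vectors of the other users.
    """
    # Compare users on every document except d, since d is what we predict.
    def masked(vec):
        return [w for i, w in enumerate(vec) if i != d]

    # The k most similar users, paired with their own preference for d.
    neighbours = sorted(
        ((cosine(masked(active), masked(o)), o[d]) for o in others),
        reverse=True,
    )[:k]
    total_sim = sum(s for s, _ in neighbours)
    if total_sim == 0:
        return 0.0
    # Similarity-weighted average of the neighbours' preference for d.
    return sum(s * w for s, w in neighbours) / total_sim
```

A prediction close to 1 suggests recommending d; where to put the threshold is a design choice left open here.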
Model-based recommendation: unlike memory-based recommendation, all users are divided into several groups according to their preferences, and a user's preference for a new document is computed from the preference of the group the user belongs to.
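As an illustration only, the sketch below assumes the group preference is the mean of the members' preference values (one simple choice; the source does not say how it is computed) and that the group assignment comes from some earlier clustering step. `group_preferences` and `predict` are hypothetical names.

```python
from collections import defaultdict

def group_preferences(prefs, assignment):
    """Average the members' preference vectors within each group.

    prefs: {user_id: [w1, ..., wn]}; assignment: {user_id: group_id}.
    Returns {group_id: [mean preference for each document]}.
    """
    sums, counts = {}, defaultdict(int)
    for user, vec in prefs.items():
        g = assignment[user]
        if g not in sums:
            sums[g] = [0.0] * len(vec)
        sums[g] = [s + w for s, w in zip(sums[g], vec)]
        counts[g] += 1
    return {g: [s / counts[g] for s in vec] for g, vec in sums.items()}

def predict(user, doc, assignment, group_prefs):
    # A user's preference for a document is taken from the user's group.
    return group_prefs[assignment[user]][doc]
```

Only one vector per group has to be stored, instead of one per user, which is the storage advantage the model-based approach relies on.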
Collaborative recommendation is widely used in e-commerce. Because news is updated quickly and in large volume, applying collaborative recommendation to news consumes a great deal of storage. Moreover, a large amount of new data arrives every day, and both computing document-to-document relevance from preferences and modeling user preferences require substantial computation, which ordinary systems can hardly supply in computing and storage resources. Content-based recommendation has therefore been the method most widely used for personalized recommendation of news content. The wide adoption of big-data platforms has made personalized news push via collaborative recommendation feasible, but such systems are usually built for a broad range of applications (besides news, for example video and audio recommendation) and generally ignore the content of the news entirely.
Content-based recommendation is the easiest to implement: each user's preferences can be characterized by a single trained classification model, so the storage footprint is small. Its drawback is equally obvious: its grasp of user preferences depends entirely on the content of the articles the user has read in the past, whereas user preferences, and short-term interests in particular, are often highly uncertain. Content-based recommendation therefore easily feels mechanical to the user, and it tends to miss documents that are dissimilar to the user's history but that the user may still find interesting.
Memory-based recommendation must record every user's preference information and therefore consumes a lot of storage, so usually only the most recent data can be retained (for example, controlled by a threshold). As a result, the method characterizes users' short-term interests well but cannot properly reflect their long-term preferences, which degrades the recommendation results.
In contrast, model-based recommendation characterizes a user's preferences with a model of the user's past behaviour, so it generally reflects long-term interests well. However, because the model reacts to new data with a certain lag, it captures short-term interests poorly, which likewise degrades the final recommendations.
In addition, collaborative recommendation methods share two common drawbacks:
First, in applications with few users, a new document that no user has yet rated or read cannot be recommended correctly;
Second, for text information such as news, they do not use the content, even though it can easily be processed, which is a considerable loss when trying to capture user interests.
Accordingly, there is a need for a Chinese news recommendation system that is more efficient and produces more accurate results.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention provides a Chinese news recommendation system.
The technical solution adopted to achieve the above object is as follows:
A Chinese news recommendation system, improved in that the system comprises: a learning layer for collecting data, recording user preferences, and updating the recommendation modules; a data layer for storing system data; and a recommendation layer for generating the news recommendation list.
The recommendation layer comprises a candidate generator, which returns a candidate news list in response to a user request, and a collector, which calls the preference modules to rank that news list.
Further, the learning layer comprises a recorder, a register, an updater, and a learner.
Further, the recorder writes news information into the data layer, and the register maintains the user information in the data layer.
The updater obtains user preference data, stores it in the user preference table of the data layer, and computes the item similarity between news articles from the obtained data.
The learner, which trains the preference models, comprises a clusterer for training the model-based recommendation model and a classifier for training the content-based recommendation model.
Further, the clusterer reads the user preferences and user information, periodically clusters users according to their preferences, and writes the clustering result and each cluster's preference for each article into the user group and group preference tables, respectively.
The classifier periodically reads the user information and recommendation history, learns a classification model, and writes that model into the user attributes.
Further, the data layer comprises a distributed database that stores the data of the recommendation system.
The fields of the distributed database include item information, user information, user preferences, item similarity, user groups, group preferences, user attributes, and recommendation history.
Further, the item information records, for each news article, its ID, category, source, time, repetition count, and content;
the user information records information about each user, including the user ID;
the user preference records each user's preference for each article;
the item similarity stores the pairwise relevance between news articles computed from user preferences;
the user groups and group preferences store, respectively, the user grouping obtained by running a clustering algorithm on user preference data and each group's preference for each article;
the user attributes store each user's content-based recommendation model;
the recommendation history records the documents recommended to each user.
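To make the list of fields concrete, here is a minimal in-memory sketch of the data layer using Python dataclasses. The class and attribute names, the key layouts, and the sparse-dictionary representation are all assumptions for illustration; the source only specifies that the fields live in a distributed database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ItemInfo:            # item information: one record per news article
    news_id: str
    category: str
    source: str
    time: str              # publication time; ISO-8601 string assumed
    repeat_count: int      # repetition count of the story
    content: str

@dataclass
class UserInfo:            # user information
    uid: str

@dataclass
class DataLayer:
    items: Dict[str, ItemInfo] = field(default_factory=dict)
    users: Dict[str, UserInfo] = field(default_factory=dict)
    # user preference, stored sparsely: (uid, news_id) -> score
    user_preference: Dict[Tuple[str, str], float] = field(default_factory=dict)
    # item similarity, stored sparsely: (news_id, news_id) -> relevance weight
    item_similarity: Dict[Tuple[str, str], float] = field(default_factory=dict)
    user_group: Dict[str, int] = field(default_factory=dict)   # uid -> GID
    group_preference: Dict[Tuple[int, str], float] = field(default_factory=dict)
    user_attributes: Dict[str, object] = field(default_factory=dict)  # uid -> content model
    # recommendation history: uid -> [(news_id, clicked?)]
    recommend_history: Dict[str, List[Tuple[str, bool]]] = field(default_factory=dict)
```

In a real deployment each attribute would correspond to a table or column family in the distributed store.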
Further, after receiving a request from the collector, the candidate generator selects from the system's news a candidate news list that distinguishes news quality.
After receiving the candidate news list, the collector calls the memory-based collaborative filter, the model-based collaborative filter, and the content-based information filter to predict the user's preference score for each candidate. It combines these into a single predicted preference score, sorts the candidate news list by that score, and outputs the list to the front end for display.
Further, the memory-based collaborative filter captures the user's short-term interests. It predicts the user's preference for each candidate article from the user preference data and the similarity between news articles.
Further, the model-based collaborative filter captures the user's long-term interests. It predicts the user's preference for a candidate article from the user's group and that group's preference for the article.
Further, the content-based information filter uses the classification model learned by the classifier and the news content to classify candidate news as preferred or not preferred by the user.
Compared with the prior art, the present invention has the following beneficial effects:
1. The system of the present invention fuses three classes of recommendation methods (content-based, memory-based, and model-based) into a single Chinese news recommendation system, keeping the strengths of each while avoiding their respective defects, without increasing the system load.
2. The system characterizes both short-term and long-term user interests accurately. It also makes full use of news content to improve recommendation accuracy and reduce the risk that new documents are not recommended accurately, and it can still deliver effective personalized news recommendation when the number of users is small.
3. Compared with a news recommendation system that uses only content-based methods, the system captures users' short-term interests effectively and makes the recommended content more diverse.
4. Compared with a news recommendation system that uses only collaborative methods, the system effectively solves the problems of inaccurate recommendation caused by a small user base and of new documents not being recommended, while making good use of the news content itself, which makes the recommendation results more interpretable.
5. The system captures both long-term and short-term user interests, and it handles personalized recommendation with a small user base as well as accurate recommendation of new documents.
6. The system is highly scalable: rapid growth in users and data volume can be handled by adding machines.
7. The system is highly general and can easily be applied to personalized recommendation of other content, such as blog posts and novels.
Detailed description of the invention
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The present invention provides a Chinese news content recommendation system based on three recommendation methods: content-based, memory-based, and model-based.
As shown in Fig. 1, the basic news recommendation flowchart of this embodiment, the Chinese news recommendation system mainly comprises a collector and a learner.
Collector: processes requests sent by users.
Based on the user's interest model, the collector selects from the candidate news list a news list that takes the user's preferences into account, and recommends it to the user.
Learner: generates the user's interest model.
From the user's browsing records, the learner extracts a model that can predict how much the user will like a new document; this is user preference modeling.
The learner trains new preference models from user preference data, that is, records of one or more click events generated by the user while reading news. The preference models are retrained periodically.
The user preference data specifically comprises rating or click data. The system scores user preferences as follows:
1. Explicit rating: if a rating system exists, the user's rating is recorded according to the actual situation, for example a score from 1 to 5;
2. Implicit rating: if there is no rating system, a click or visit counts as 1 point and no click or visit counts as 0 points.
The specific score values can be set as required.
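The two scoring rules above can be expressed as a small conversion function. This is a sketch assuming a 1 to 5 rating scale and falling back to implicit scoring whenever no explicit rating is available; `preference_score` and its parameters are hypothetical names.

```python
from typing import Optional

def preference_score(rating: Optional[int] = None, clicked: bool = False,
                     has_rating_system: bool = True) -> float:
    """Convert one user/article interaction into a preference value.

    Explicit scoring: with a rating system and a user rating, the rating
    (assumed to be on a 1-5 scale) is used as the score.
    Implicit scoring: otherwise a click/visit counts 1 and no click counts 0.
    """
    if has_rating_system and rating is not None:
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        return float(rating)
    return 1.0 if clicked else 0.0
```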
As shown in Fig. 2, the data processing flowchart of the learner in this embodiment, the learner comprises a clusterer and a classifier.
Clusterer: reads the user preference information and user information, periodically launches a learning process that clusters users by preference, and writes the clustering result and each cluster's preference for each article into the user group and group preference tables, respectively.
To process large-scale user preference information, the clustering algorithm is implemented with a distributed programming model (such as MapReduce); Mahout, the existing MapReduce-based machine learning library in the Hadoop ecosystem, can be used directly.
Classifier: periodically reads the user information and each user's reading history (which includes user preferences, recommendation history, and document information), launches the classifier learning process, and writes the trained classifier into the user attributes.
As with clustering, this is implemented with a distributed model (such as the MapReduce-based machine learning library Mahout).
The learning process comprises:
(1) The front end captures each reading event of the user, i.e. each clicked news article, and sends it to the update module as new user preference data.
(2) The update module writes the news newly clicked by the user (one or more click event records, i.e. preference data) into the user preference table of the database;
After the user preferences are stored, the memory-based recommendation model is computed from the updated data and stored in the item similarity table of the database. This model takes the form of a document-to-document relevance matrix and is used to capture the user's short-term interests.
This embodiment provides a very simple way to compute the model:
Suppose user A reads news article B. Each news article C1, C2, ..., Cn that user A has read is paired with B to form (C1, B), (C2, B), ..., (Cn, B), and the weight at each corresponding position of the model (a news-by-news matrix) is incremented by 1.
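The update rule just described can be sketched with a sparse dictionary standing in for the news-by-news matrix. `CooccurrenceModel` and `record_read` are hypothetical names, and ignoring repeat reads of the same article is an assumption the source does not address.

```python
from collections import defaultdict

class CooccurrenceModel:
    """Memory-based model: a sparse news-by-news matrix of co-reading counts."""

    def __init__(self):
        self.history = defaultdict(list)   # user -> news read so far, in order
        self.weight = defaultdict(float)   # (Ci, B) -> co-reading weight

    def record_read(self, user, b):
        seen = self.history[user]
        if b in seen:
            return  # a repeat read adds no new pairs in this sketch
        # Pair every article Ci the user read earlier with B; add 1 at (Ci, B).
        for c in seen:
            self.weight[(c, b)] += 1
        seen.append(b)
```

Only the (Ci, B) direction is incremented, exactly as in the description; a symmetric variant would also increment (B, Ci).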
Depending on system requirements, other strategies can be used, such as reducing the matrix size or applying time decay to the weights.
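One possible form of both strategies is exponential time decay followed by pruning of small weights. The half-life and threshold below are illustrative values, not taken from the source, and `decay_weights` is a hypothetical helper operating on a sparse `{(news, news): weight}` dictionary.

```python
def decay_weights(weights, elapsed_hours, half_life_hours=24.0, min_weight=0.01):
    """Exponentially decay co-reading weights and prune tiny entries.

    Decay keeps the matrix focused on recent reading (short-term interest);
    pruning entries below `min_weight` keeps the matrix small.
    """
    factor = 0.5 ** (elapsed_hours / half_life_hours)
    return {pair: w * factor for pair, w in weights.items() if w * factor >= min_weight}
```

With a 24-hour half-life, a weight of 4 becomes 2 after one day, while entries that fall below the threshold are dropped.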
(3) The learner periodically obtains user preference data and trains new preference models:
The clusterer of the learner trains the model-based recommendation model;
The classifier of the learner trains the content-based recommendation model, which is a classification model.
The clusterer uses a clustering algorithm, such as k-means, graph clustering, spectral clustering, fuzzy k-means, hierarchical clustering, or topic-model clustering, to divide a set of data into several clusters according to the distances between data points.
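As one example of the listed algorithms, here is a single-machine sketch of plain k-means over user preference vectors, with each iteration split into a "map" step (assign each vector to its nearest centroid) and a "reduce" step (average each cluster). This mirrors the shape of a MapReduce job but is not Mahout's API; `kmeans` and `nearest` are hypothetical names.

```python
import math
import random
from collections import defaultdict

def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means; points are equal-length preference vectors.

    Returns (cluster id for each point, centroids).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # "Map": emit (cluster id, point) for every user vector.
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # "Reduce": each new centroid is the mean of the points assigned to it.
        new = []
        for j in range(k):
            members = groups.get(j)
            if members:
                new.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new.append(centroids[j])  # keep an empty cluster's old centroid
        if new == centroids:
            break  # converged
        centroids = new
    return [nearest(p, centroids) for p in points], centroids
```

The resulting cluster ids would populate the user group table, and the centroids are one natural candidate for the group preferences.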
The classifier uses a classification algorithm, such as nearest neighbour, naive Bayes, SVM, or decision trees, to learn a classification model from prepared data (this step belongs to the learning layer); the classification model is then used to classify new data (this step belongs to the recommendation layer). In other words, the classifier is responsible for learning the classification model, and the content-based information filter in the recommendation layer uses that model to classify new data.
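The split between learning and classifying can be illustrated with multinomial naive Bayes, one of the algorithms named above. This is a from-scratch sketch rather than Mahout's implementation; `train_nb`, `classify_nb`, the Laplace smoothing parameter `alpha`, and the smoothed prior are assumptions of the example.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Learning-layer step: fit a multinomial naive Bayes model.

    docs: list of token lists; labels: True (liked/clicked) or False.
    """
    counts = {True: Counter(), False: Counter()}
    n_docs = Counter(labels)
    for tokens, y in zip(docs, labels):
        counts[y].update(tokens)
    vocab = set(counts[True]) | set(counts[False])
    return {"counts": counts, "n_docs": n_docs, "vocab": vocab, "alpha": alpha}

def classify_nb(model, tokens):
    """Recommendation-layer step: True if the user is predicted to like the article."""
    total = sum(model["n_docs"].values())
    V, a = len(model["vocab"]), model["alpha"]
    scores = {}
    for y in (True, False):
        c = model["counts"][y]
        denom = sum(c.values()) + a * V
        # Smoothed log prior plus Laplace-smoothed log likelihood of each known token.
        s = math.log((model["n_docs"][y] + a) / (total + 2 * a))
        s += sum(math.log((c[t] + a) / denom) for t in tokens if t in model["vocab"])
        scores[y] = s
    return scores[True] > scores[False]
```

The trained model is what would be written into the user attributes, and `classify_nb` is what the content-based information filter would run on each candidate.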
(4) The learner stores the newly trained preference models, i.e. the model-based and content-based recommendation models, in the corresponding database tables: the model-based recommendation model in the user groups and group preferences, and the content-based recommendation model in the user attributes.
As shown in Fig. 3, the data processing flowchart of the collector in this embodiment, the recommendation process mainly comprises:
(1) After the front end receives a user request, it forwards the request to the collector.
(2) The collector sends the request to the candidate generator.
(3) The candidate generator generates a candidate news list for the user from the user information (such as the user ID) and other information (such as news quality and website weight).
(4) The collector calls the memory-based collaborative filter, the model-based collaborative filter, and the content-based information filter to predict the user's preference for each news article in the candidate list. It then combines the three predicted preference values, sorts the candidate news in descending order, and sends the list to the front end.
This embodiment also provides a recommendation system that incorporates the collaborative recommendation methods. To improve the system's scalability, the example system uses a distributed computing platform as its underlying storage and computing platform.
As shown in Fig. 4, a schematic structural diagram of the recommendation system combining the memory-based, model-based, and content-based recommendation models in this embodiment, the recommendation system specifically comprises a learning layer, a data layer, and a recommendation layer.
I. Data layer
The data layer stores the various types of data required by the recommendation system and is implemented with a distributed storage system. The stored fields mainly include:
1. Item information: records information about each news article, including the news ID, category, data source, time, repetition count, and content.
2. User information: records information about each user, including the user ID (UID).
3. User preference: records each user's preference (rating or click) for each article, including the click time and rating time; it can be implemented as a sparse matrix.
4. Item similarity: the core data supporting the memory-based collaborative filter; it stores the pairwise relevance between news articles computed from user preferences.
5. User group: stores the user grouping obtained by running a clustering algorithm on user preference data, including the group ID (GID);
6. Group preference: stores each group's preference for each article, derived from the user grouping obtained by running the clustering algorithm on user preference data;
The user groups and group preferences are the core data supporting the model-based collaborative filter.
7. User attributes (user profile): the core data supporting the content-based information filter; they store each user's content-based recommendation model (usually a text classifier).
8. Recommendation history: records the documents recommended to each user, specifically the news the user liked or clicked and the news the user did not like or did not click.
II. Learning layer
The learning layer collects the various types of data, records user preferences, and updates the recommendation modules; it is implemented with a distributed storage system and MapReduce programs. The learning layer mainly comprises the following modules:
1. Recorder: writes information about each news article into the item information.
2. Register: maintains the user information (adding, deleting, etc.) and stores it in the user information table.
3. Updater: obtains user preference information, stores it in the user preference table of the data layer, and computes and updates (in real time or near real time) the item similarity between news articles from the new user preferences.
4. Clusterer: periodically (for example, every half day or every day) launches a learning process that clusters (groups) users by preference, and writes the clustering result and each cluster's preference for each article into the user group and group preference tables, respectively; it needs to read the user preference information and user information.
To process large-scale user preference information, the clustering algorithm should be implemented with a distributed programming model (such as MapReduce); Mahout, the existing MapReduce-based machine learning library in the Hadoop ecosystem, can be used directly here.
5. Classifier: periodically reads each user's reading history (user preferences, recommendation history, user information, and document information), launches the classifier learning process, and updates the user attributes with the trained classifier.
As with clustering, it should be implemented with a distributed model (MapReduce), using the Mahout library.
III. Recommendation layer
The recommendation layer outputs, for each read request of each user, the top-quality news that best matches that user's preferences. It is built as a client of the distributed storage system and mainly comprises the following modules:
1. Candidate generator: after receiving a request from the collector, selects a candidate list from the news in the system. This candidate list is not personalized; its main purpose is to distinguish news quality (by time, source, length, repetition count, website weight, etc.).
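The source lists the quality signals but not how they are combined, so the scoring below is entirely hypothetical: a freshness half-life multiplied by a weighted mix of source weight, a repetition-based popularity term (treating more reposts as more important is itself an assumption), and a length term. `News`, `quality`, `generate_candidates`, and every constant are illustrative.

```python
import time
from dataclasses import dataclass

@dataclass
class News:
    news_id: str
    published: float      # Unix timestamp
    source_weight: float  # weight of the source website, 0..1 (assumed scale)
    length: int           # article length in characters
    repeat_count: int     # how many times the story was repeated elsewhere

def quality(n, now, half_life_hours=12.0):
    # Hypothetical, non-personalized quality score; weights are illustrative.
    age_h = max(0.0, (now - n.published) / 3600)
    freshness = 0.5 ** (age_h / half_life_hours)
    length_ok = 1.0 if n.length >= 200 else n.length / 200
    popularity = 1 - 1 / (1 + n.repeat_count)  # more repeats -> closer to 1
    return freshness * (0.5 * n.source_weight + 0.3 * popularity + 0.2 * length_ok)

def generate_candidates(pool, size=100, now=None):
    """Return the `size` highest-quality articles; the same list for every user."""
    now = time.time() if now is None else now
    return sorted(pool, key=lambda n: quality(n, now), reverse=True)[:size]
```

Because the list does not depend on the user, it can be computed once per refresh and shared by all requests.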
2. Collector (ensembler): responds to user requests. After receiving a user's request to read news, it obtains a candidate news list from the candidate generator, then uses the memory-based collaborative filter, the model-based collaborative filter, and the content-based information filter to predict the user's preference score for each article in the list. The weighted sum of the three scores is taken as the final predicted preference for each article, and the candidate news list, sorted by this value, is output to the front end, which displays the personalized recommendations.
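A minimal sketch of the weighted-sum ranking, assuming each filter is exposed as a function from (user, news) to a score and that the content-based filter's yes/no decision is mapped to 1/0. The weights are not given in the source and must be tuned; `rank_candidates` and the scorer names are hypothetical.

```python
from typing import Callable, Dict, List

Scorer = Callable[[str, str], float]  # (user_id, news_id) -> predicted preference

def rank_candidates(user: str, candidates: List[str],
                    scorers: Dict[str, Scorer], weights: Dict[str, float]) -> List[str]:
    """Weighted sum of the filters' scores, then sort in descending order."""
    def final_score(news_id: str) -> float:
        return sum(weights[name] * score(user, news_id) for name, score in scorers.items())
    return sorted(candidates, key=final_score, reverse=True)
```

For example, with weights 0.4, 0.3 and 0.3 for the memory-based, model-based and content-based filters, an article scored 0.9, 0.2 and 1 receives 0.72 and outranks one scored 0.5, 0.5 and 1 (0.65).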
The collector also records each recommendation list in the recommendation history so that the classifier can learn the content-based recommendation model.
The three recommendation modules called by the collector are all implemented as client programs of the distributed database:
Memory-based collaborative filter (memory-based CF): predicts the user's preference for each candidate article from the user preference data and the similarity between news articles.
This filter mainly captures the user's short-term interests and has strict real-time requirements, so the similarity between news articles can be cached, for example in an in-memory database such as Redis.
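The sketch below shows one way such a filter could combine the item-similarity table with a read-through cache. A plain dict stands in for Redis so the example runs without a server, and the scoring rule (the sum of the user's preference for each read article times its similarity to the candidate) is an assumed, unnormalized choice. `MemoryBasedCF` and `store_reads` are hypothetical names.

```python
from typing import Dict, Tuple

class MemoryBasedCF:
    """Item-based prediction from the item-similarity table, with a read-through cache.

    `similarity_store` stands in for the item similarity table in the
    distributed database; `cache` is a plain dict standing in for an
    in-memory store such as Redis.
    """

    def __init__(self, similarity_store: Dict[Tuple[str, str], float]):
        self.store = similarity_store
        self.cache: Dict[Tuple[str, str], float] = {}
        self.store_reads = 0  # counts cache misses, for illustration

    def similarity(self, a: str, b: str) -> float:
        key = (a, b)
        if key not in self.cache:
            self.store_reads += 1  # cache miss: read from the database
            self.cache[key] = self.store.get(key, 0.0)
        return self.cache[key]

    def predict(self, user_prefs: Dict[str, float], candidate: str) -> float:
        # Sum over read articles r of (preference for r) x similarity(r, candidate).
        return sum(w * self.similarity(r, candidate) for r, w in user_prefs.items())
```

Repeated requests for the same candidates are then served from the cache, which is the point of caching the similarity data.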
Model-based collaborative filter (model-based CF): predicts the user's preference for a candidate article from the user's group and that group's preference for the article; this module captures the user's long-term interests.
Content-based information filter (content-based IF): classifies each candidate article, using the user's interest classification model learned by the classifier and the news content, to decide whether it is news the user prefers.
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the present application and do not limit its scope of protection. Although the present application has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that, after reading the present application, those skilled in the art may still make various changes, modifications, or equivalent substitutions to the specific embodiments, and all such changes, modifications, or equivalent substitutions fall within the scope of the pending claims of the application.