Background art
With the spread of the Internet and the mobile Internet, there is a contradiction between the explosive growth of information content and people's need to obtain useful and interesting information in a timely and accurate manner. A personalized content (news) recommender system is therefore needed so that people can obtain the information they need each day in a targeted way. Personalized content recommendation systems currently fall into two broad classes: content-based recommendation (also called information filtering, information filter) methods and collaborative recommendation (also called collaborative filtering, collaborative filter) methods. Collaborative recommendation methods in turn include memory-based and model-based methods.
Content-based recommendation: used mainly for information filtering. The user's read/unread history serves as the corpus for training a text classifier; the trained classifier predicts how much the user will like a new document (news item), and the decision whether to recommend the document to the user is made accordingly.
Memory-based recommendation: the preference information of each user (which user has read which news) is recorded and used to characterize the user. That is, a user is represented as a vector over the news items, u = [w1, w2, ..., wi, ..., wn], where i is the document number and wi is the user's preference value for document i, for example 1 if the user has read it and 0 if not. For a new document d (one not yet recommended to the current user), a nearest-neighbor method predicts whether d should be recommended to the current user: the other users whose vectors are closest to the current user's vector are found, and their preferences for d are used to predict the current user's preference for d.
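The nearest-neighbor prediction described above can be illustrated with a minimal sketch. The cosine similarity measure, the neighbor count k, and the names cosine and predict_preference are assumptions made for illustration; the text only requires that the users with the closest vectors be consulted.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length preference vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict_preference(current, others, d_readers, k=2):
    """Predict the current user's preference for a new document d.

    current   -- binary vector u = [w1, ..., wn] of the current user
    others    -- dict: user id -> binary vector over the same n documents
    d_readers -- set of user ids who have already read document d
    k         -- number of nearest neighbors consulted (assumed value)
    """
    ranked = sorted(others, key=lambda uid: cosine(current, others[uid]), reverse=True)
    neighbors = ranked[:k]
    if not neighbors:
        return 0.0
    # Fraction of the k nearest neighbors who read d.
    return sum(1 for uid in neighbors if uid in d_readers) / len(neighbors)

current_user = [1, 1, 0, 0]
other_users = {
    "u1": [1, 1, 0, 1],
    "u2": [1, 0, 0, 0],
    "u3": [0, 0, 1, 1],
}
score = predict_preference(current_user, other_users, d_readers={"u1", "u3"}, k=2)
```

Here u1 and u2 are the two nearest neighbors, and only u1 has read d, so the predicted preference is one half.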
Model-based recommendation: unlike the memory-based method, all users are divided into several groups according to their preferences, and a user's preference for a new document is obtained statistically from the preferences of the user's group for that document.
Collaborative recommendation is widely used in e-commerce. Because news is updated very quickly and in very large quantities, collaborative recommendation consumes a large amount of storage, and a large amount of new data is introduced every day. Computing the degree of correlation between documents from preferences, or modeling user preferences, requires heavy computation, and ordinary systems can hardly provide computing and storage resources on that scale. For this reason, content-based recommendation has been the widely used method for personalized recommendation of news-type content. The widespread adoption of big data platforms has made it feasible to complete personalized news push with collaborative recommendation, but because such systems are meant to apply broadly (beyond news, for example to video and audio recommendation), they usually ignore the content of the news entirely.
Content-based recommendation is the easiest to implement, and because only one trained classification model per user is needed to characterize the user's preference, it also occupies little storage. Its drawbacks are equally obvious: its grasp of user preference depends entirely on the content of the articles the user has read in the past, whereas user preference, especially short-term interest, is often highly uncertain. Content-based recommendation therefore tends to feel monotonous to the user, and it easily misses documents that are dissimilar to the user's history but might still interest the user.
Memory-based recommendation must record the preference information of every user and therefore consumes more storage, so usually only the most recent data is retained (for example, controlled by a threshold). As a result, the method characterizes the user's short-term interest well but reflects the user's long-term interest preference poorly, which degrades the recommendation quality.
In contrast to memory-based recommendation, model-based recommendation characterizes the user's preference with a model of the user's past behavior, so it usually reflects the user's long-term interest preference better. However, because the model responds to new data with a certain lag, it grasps the user's short-term interest poorly, which also degrades the final recommendation quality.
Collaborative recommendation methods also share common drawbacks:
First, in applications with few users, a new document that no user has yet rated or read cannot be recommended correctly;
Second, for text-type information such as news, failing to use content information that is easy to process is a considerable loss for grasping the user's interest preference.
Accordingly, it is desirable to provide a Chinese news recommender system that is more efficient and produces more accurate results.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention provides a Chinese news recommender system.
The solution adopted to achieve the above object is as follows:
A Chinese news recommender system, the improvement being that the system comprises a learning layer for collecting data, recording user preferences, and updating the recommendation modules; a data layer for storing the system data; and a recommendation layer for generating the news recommendation list;
the recommendation layer comprises a candidate generator, which returns a recommended news list in response to a user request, and a collector, which calls the preference modules to rank the recommended news list.
Further, the learning layer comprises a logger, a register, an updater, and a learner.
Further, the logger is used to write news information into the data layer; the register is used to maintain user information in the data layer;
the updater is used to obtain user preference data, store the user preference data in the user preference field of the data layer, and calculate the article relevance between news items from the obtained user preference data;
the learner, which trains the preference models, comprises a cluster device for training the model-based recommendation model and a classifier for training the content-based recommendation model.
Further, the cluster device reads the user preferences and the user information, periodically performs cluster learning on the users according to the user preferences, and updates the user group and the group preference with the clustering result and with each cluster's preference information for each article, respectively;
the classifier periodically reads the user information and the recommendation history, learns a classification model, and updates the classification model into the user attribute.
Further, the data layer comprises a distributed database for storing the data of the recommender system;
the fields of the distributed database include item information, user information, user preference, article relevance, user group, group preference, user attribute, and recommendation history.
Further, the item information is used to record, for every news item, its number, category, source, time, repetition count, and content;
the user information is used to record information such as the user number;
the user preference is used to record each user's preference information for every article;
the article relevance is used to store the pairwise relevance values between news items calculated from the user preferences;
the user group and the group preference are used, respectively, to store the user grouping information obtained after the clustering algorithm processes the user preference data and each group's preference for each article;
the user attribute is used to store each user's content-based recommendation model;
the recommendation history is used to record the history of documents recommended to each user.
Further, after receiving the request from the collector, the candidate generator selects from the news in the system a candidate news list that distinguishes news by quality;
after receiving the candidate news list, the collector calls the memory-based collaborative filter, the model-based collaborative filter, and the content-based information filter to predict a user preference score for the candidate news list, combines the three results into a predicted user preference score, sorts the candidate news list by the predicted score, and outputs it to the front end for display.
Further, the memory-based collaborative filter is used to capture the user's short-term interest, and predicts the user's preference for each candidate article from the user preference data and the relevance between news items.
Further, the model-based collaborative filter is used to capture the user's long-term interest, and predicts the user's preference for a candidate article from the user group and the group's preference for that article.
Further, the content-based information filter classifies candidate news into news the user prefers and news the user does not prefer, according to the classification model learned by the classifier and the content of the news.
Compared with the prior art, the invention has the following advantages:
1. The system of the invention is a feasible Chinese news recommender system that fuses three classes of recommendation method (content-based, memory-based, and model-based); it exploits the respective strengths of the three methods while avoiding their defects, without increasing the system burden.
2. The system of the invention characterizes both the short-term and the long-term interest of the user accurately, makes full use of news content to improve recommendation accuracy and to reduce the risk that new documents cannot be recommended accurately, and can complete personalized news recommendation effectively even when the number of users is small.
3. Compared with a news recommender system that uses only content-based recommendation, the system of the invention grasps the user's short-term interest effectively and makes the recommended content richer and more varied.
4. Compared with a news recommender system that uses only collaborative recommendation, the system of the invention effectively solves the problems of inaccurate recommendation caused by a small number of users and of new documents that cannot be recommended effectively; it also makes good use of the information in the news itself, which gives the recommendation results better interpretability.
5. The system of the invention grasps the user's long-term and short-term interests at the same time, and handles well both personalized recommendation when the number of users is small and accurate recommendation of new documents.
6. The system of the invention is highly scalable and can cope with rapid growth in the number of users and the data volume by adding machines.
7. The system of the invention is highly general and can readily be applied to personalized recommendation of other content (such as blog posts and novels).
Specific embodiments
Specific embodiments of the invention are described in further detail below with reference to the accompanying drawings.
The present invention provides a Chinese news content recommender system based on three recommendation methods: content-based, memory-based, and model-based.
As shown in Fig. 1, which is the basic flow chart of news recommendation in this embodiment, the Chinese news recommender system mainly comprises a collector and a learner.
The collector handles the requests sent by users.
According to the user's interest model, the collector filters from the candidate news list a news list that takes the user's preference into account and recommends it to the user.
The learner generates the user's interest model.
From the user's browsing records, the learner extracts a model that can predict how much the user will like a new document, i.e., the user preference model.
The learner trains a new preference model from the user preference data, i.e., the record of one or more click events generated while the user reads news; the preference model is trained periodically.
The above user preference data specifically includes score or click data. The system scores user preference in the following cases:
1. Explicit scoring: if a scoring system exists, the user's score is assigned according to the actual situation, for example a score from 1 to 5 depending on the case;
2. Implicit scoring: if no scoring system exists, a click/visit by the user may be counted as 1 point and no click/visit as 0 points. The specific score values above can be set for each system.
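The two scoring cases can be expressed as a small function. This is only a sketch: the function name preference_score and the event keys "rating" and "clicked" are hypothetical, and the 1 to 5 range is the example range given above rather than a fixed requirement.

```python
def preference_score(event, has_rating_system):
    """Return the preference value recorded for one user/news interaction.

    event -- dict with optional keys "rating" (1..5) and "clicked" (bool);
             both key names are illustrative assumptions.
    """
    if has_rating_system:
        rating = event.get("rating")
        if rating is None:
            return None  # the user gave no explicit score
        if not 1 <= rating <= 5:
            raise ValueError("explicit rating must be between 1 and 5")
        return rating
    # Implicit scoring: a click/visit counts as 1 point, otherwise 0 points.
    return 1 if event.get("clicked") else 0

explicit = preference_score({"rating": 4}, has_rating_system=True)
implicit = preference_score({"clicked": True}, has_rating_system=False)
```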
As shown in Fig. 2, which is the data processing flow chart of the learner in this embodiment, the learner comprises a cluster device and a classifier.
The cluster device reads the user preference information and the user information, periodically initiates the learning process that clusters users into groups by preference, and updates the user group and the group preference with the clustering result and with each cluster's preference information for each article, respectively.
To handle large-scale user preference information, the clustering algorithm is implemented with a distributed programming model (such as MapReduce); the existing MapReduce-based machine learning package Mahout in the Hadoop ecosystem can be used directly.
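What the cluster device produces can be shown with a small single-machine sketch. It is not the distributed Mahout implementation named above; it is a plain k-means written for illustration, and the function names, the deterministic choice of initial centroids, and the use of the cluster mean as the group preference are all assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two preference vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cluster_users(preferences, k, iterations=10):
    """Group users by preference with k-means.

    preferences -- dict: user id -> preference vector over the same articles
    Returns (user_group, group_preference):
      user_group       -- dict: user id -> group number
      group_preference -- dict: group number -> mean preference per article
    """
    user_ids = sorted(preferences)
    # Deterministic initialization (assumption): first k users are the centroids.
    centroids = [list(preferences[uid]) for uid in user_ids[:k]]
    user_group = {}
    for _ in range(iterations):
        for uid in user_ids:
            user_group[uid] = min(
                range(k),
                key=lambda g: squared_distance(preferences[uid], centroids[g]),
            )
        for g in range(k):
            members = [preferences[uid] for uid in user_ids if user_group[uid] == g]
            if members:
                centroids[g] = mean_vector(members)
    group_preference = {g: centroids[g] for g in range(k)}
    return user_group, group_preference

prefs = {
    "a": [1, 1, 0, 0],
    "b": [0, 0, 1, 1],
    "c": [1, 0, 0, 0],
    "d": [0, 0, 0, 1],
}
user_group, group_preference = cluster_users(prefs, k=2)
```

In this toy data, users a and c fall into one group and users b and d into the other; each group's preference vector is what would be written to the group preference field.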
The classifier periodically reads the user information and reading history of each user (the reading history comprises the user preference, the recommendation history, and the document information), starts the classifier learning process, and updates the learned classifier into the user attribute.
As with clustering, this is also implemented with a distributed model (such as the MapReduce-based machine learning package Mahout).
The learning process includes:
(1) The front end obtains each reading event of the user, i.e., the news item clicked, and sends it to the update module as new user preference data.
(2) The update module stores the news newly clicked by the user (one or more click event records, serving as preference data) in the user preference field of the database;
After the user preference has been stored, the memory-based recommendation model is calculated from the updated data and stored in the article relevance field of the database. The model takes the form of a matrix of document-to-document correlations and is used to capture the user's short-term interest.
This embodiment provides one of the simplest methods of calculating the model:
Suppose user A has read news item B. Then all news items C1, C2, ..., Cn previously read by user A are each paired with B to form (C1, B), (C2, B), ..., (Cn, B), and in the model (the matrix formed by all news items against all news items) the weight at the position corresponding to each pair is increased by 1.
Depending on system requirements, other strategies are possible, including reducing the matrix size and decaying the weights over time.
(3) The learner periodically obtains the user preference data and trains new preference models, namely:
the cluster device of the learner trains the model-based recommendation model;
the classifier of the learner trains the content-based recommendation model, which is a classification model.
The cluster device uses a clustering algorithm, such as k-means, graph clustering, spectral clustering, fuzzy k-means, hierarchical clustering, or topic model clustering, to divide a set of data into several clusters according to the distances between data points.
The classifier uses a classification algorithm, such as nearest neighbor, naive Bayes, SVM, or decision tree. A classification model is first learned from data prepared in advance (this step belongs to the learning layer), and the classification model is then used to classify new data (this step belongs to the recommendation layer). That is, the classifier is responsible for learning the classification model, and the content-based information filter in the recommendation layer uses the classification model to classify new data.
(4) The learner stores the newly trained preference models, i.e., the model-based recommendation model and the content-based recommendation model, in the corresponding database fields: the model-based recommendation model is stored in the user group and group preference, and the content-based recommendation model is stored in the user attribute.
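The simplest model calculation of step (2), in which each pair (Ci, B) has its weight increased by 1, can be sketched as follows. The class name CooccurrenceModel, the sparse dictionary storage, the symmetric update of both (Ci, B) and (B, Ci), and the choice to ignore repeated reads are assumptions made for this illustration.

```python
from collections import defaultdict

class CooccurrenceModel:
    """Sparse news-by-news matrix of co-read counts (illustrative sketch)."""

    def __init__(self):
        self.weights = defaultdict(int)    # (news, news) -> weight
        self.history = defaultdict(list)   # user id -> news read so far

    def record_read(self, user, news_b):
        """Apply the update for 'user has just read news_b'."""
        read = self.history[user]
        if news_b in read:
            return  # repeated reads are ignored in this sketch (assumption)
        for news_c in read:
            # Add 1 at the position of pair (Ci, B); the mirrored entry is
            # also updated so the matrix stays symmetric (assumption).
            self.weights[(news_c, news_b)] += 1
            self.weights[(news_b, news_c)] += 1
        read.append(news_b)

model = CooccurrenceModel()
for news in ["C1", "C2", "B"]:
    model.record_read("A", news)   # user A reads C1, C2, then B
model.record_read("X", "C1")
model.record_read("X", "B")        # a second user also co-reads C1 and B
```

After these events the pair (C1, B) has weight 2 and the pair (C2, B) has weight 1. Strategies such as time decay or matrix pruning would be layered on top of this update.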
As shown in Fig. 3, which is the data processing flow chart of the collector in this embodiment, the recommendation process mainly includes:
(1) After receiving a user request, the front end sends it to the collector.
(2) The collector forwards the request to the candidate generator.
(3) The candidate generator generates a candidate news list for the user according to the user information (such as the user id) and other information (such as news quality and website weight).
(4) The collector calls the memory-based collaborative filter, the model-based collaborative filter, and the content-based information filter to predict the user's preference for each news item in the candidate news list, combines the three preference values obtained, sorts the candidate news in descending order, and sends the list to the front end.
This embodiment also provides a recommender system that fuses the collaborative recommendation methods. To improve the scalability of the system that fuses the collaborative recommendation methods, the example system uses a distributed computing platform as its underlying storage and computing platform.
As shown in Fig. 4, which is a structural schematic diagram of the recommender system based on the memory-based, model-based, and content-based recommendation models in this embodiment, the recommender system specifically comprises a learning layer, a data layer, and a recommendation layer.
I. Data layer (data)
The data layer stores the various data required by the recommender system and is implemented by a distributed storage system. The stored fields mainly include:
1. Item information (item information): records the relevant information of every news item, including the news number ID, category, data source, time, repetition count, content, etc.
2. User information (user information): records the relevant information of users, including the user number UID, etc.
3. User preference (user preference): records each user's preference (score or click) information for every article, including click time, rating time, etc.; it can be implemented as a sparse matrix.
4. Article relevance (item similarity): the core data supporting the memory-based collaborative filter; this field stores the pairwise relevance values between news items calculated from the user preferences.
5. User group (user group): stores the user grouping information obtained after the clustering algorithm processes the user preference data, including the group number GID, etc.;
6. Group preference (group preference): stores each group's preference for each article, obtained after the clustering algorithm processes the user preference data;
The user group and the group preference are the core data supporting the model-based collaborative filter.
7. User attribute (user profile): the core data supporting the content-based information filter; stores each user's content-based recommendation model (usually a text classifier).
8. Recommendation history (recommend history): records the history of documents recommended to each user, specifically the news the user liked/clicked and the news the user disliked/did not click.
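The eight fields listed above can be pictured as an in-memory schema. The sketch below is only an illustration: the class names ItemInfo and DataLayer, the attribute names, and the key layouts are assumptions, and a real deployment would keep these fields in the distributed storage system rather than in Python dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ItemInfo:
    """One row of the item information field."""
    news_id: str
    category: str
    source: str
    time: str
    repetition_count: int
    content: str

@dataclass
class DataLayer:
    """In-memory stand-in for the eight data-layer fields (hypothetical layout)."""
    item_information: Dict[str, ItemInfo] = field(default_factory=dict)
    user_information: Dict[str, dict] = field(default_factory=dict)   # UID -> info
    # Sparse matrix: only (UID, news ID) pairs with a recorded score are stored.
    user_preference: Dict[Tuple[str, str], float] = field(default_factory=dict)
    # (news ID, news ID) -> relevance value between two news items.
    item_similarity: Dict[Tuple[str, str], float] = field(default_factory=dict)
    user_group: Dict[str, int] = field(default_factory=dict)          # UID -> GID
    # (GID, news ID) -> the group's preference for that article.
    group_preference: Dict[Tuple[int, str], float] = field(default_factory=dict)
    user_profile: Dict[str, object] = field(default_factory=dict)     # UID -> model
    # UID -> list of (news ID, clicked) pairs that were recommended.
    recommend_history: Dict[str, List[Tuple[str, bool]]] = field(default_factory=dict)

store = DataLayer()
store.item_information["n1"] = ItemInfo("n1", "sports", "example-site", "2015-01-01", 0, "text")
store.user_preference[("u1", "n1")] = 1.0
store.recommend_history.setdefault("u1", []).append(("n1", True))
```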
II. Learning layer (learn)
The learning layer collects the various data, records user preferences, and updates the recommendation modules; it is implemented with a distributed storage system and MapReduce programs. The learning layer mainly comprises the following modules:
1. Logger (recorder): writes the relevant information of every news item into the item information.
2. Register (register): maintains (adds, deletes, etc.) user information and stores it in the user information.
3. Updater (updater): obtains user preference information and stores it in the user preference of the data layer; it is also responsible for calculating and updating, in real time or near real time, the article relevance between news items according to the new user preferences.
4. Cluster device (clusterer): periodically (for example, at intervals of half a day or a day) initiates the learning process that clusters (groups) users by preference, and updates the user group and the group preference with the clustering result and with each cluster's preference information for each article, respectively; it needs to read the user preference information and the user information.
To handle large-scale user preference information, the clustering algorithm should be implemented with a distributed programming model (such as MapReduce); here the existing MapReduce-based machine learning package Mahout in the Hadoop ecosystem can be used directly.
5. Classifier (classifier): periodically reads the reading history of each user (it needs to read the user preference, recommendation history, user information, and document information), starts the classifier learning process, and updates the user attribute with the learned classifier.
As with clustering, this should also be implemented with a distributed model (MapReduce), using the Mahout package.
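The per-user text classifier learned by the classifier module can be illustrated with a small naive Bayes sketch. It is a single-machine stand-in, not the Mahout implementation; the class name, the Laplace smoothing, and the assumption that news text has already been segmented into word tokens are all choices made for this example.

```python
from collections import Counter
from math import log

class NaiveBayesTextClassifier:
    """Two-class (liked / not liked) multinomial naive Bayes sketch."""

    def __init__(self):
        self.doc_counts = Counter()                       # label -> documents seen
        self.word_counts = {True: Counter(), False: Counter()}
        self.vocabulary = set()

    def train(self, documents):
        """documents: iterable of (tokens, liked) pairs from the reading history."""
        for tokens, liked in documents:
            self.doc_counts[liked] += 1
            self.word_counts[liked].update(tokens)
            self.vocabulary.update(tokens)

    def log_score(self, tokens, label):
        """Log of the smoothed prior times the smoothed token likelihoods."""
        total_docs = sum(self.doc_counts.values())
        score = log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        # The extra +1 reserves probability mass for tokens never seen in training.
        denominator = total_words + len(self.vocabulary) + 1
        for token in tokens:
            score += log((self.word_counts[label][token] + 1) / denominator)
        return score

    def predict(self, tokens):
        """Return True if the news is predicted to be preferred by the user."""
        return self.log_score(tokens, True) > self.log_score(tokens, False)

history = [
    (["football", "match", "goal"], True),    # clicked
    (["football", "league"], True),           # clicked
    (["stock", "market"], False),             # recommended but not clicked
    (["market", "bond"], False),              # recommended but not clicked
]
clf = NaiveBayesTextClassifier()
clf.train(history)
likes_football = clf.predict(["football", "goal"])
likes_finance = clf.predict(["stock", "bond"])
```

The trained object is what would be written to the user attribute field for that user.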
III. Recommendation layer (recommend)
The recommendation layer outputs, for each reading request of each user, the best-quality news that best matches the user's preference. The recommendation layer is built as a client of the distributed storage system. It mainly comprises the following modules:
1. Candidate generator (candidates generator): after receiving the request sent by the collector, selects a candidate list from the news in the system. The candidate list is not personalized; its main purpose is to distinguish news by quality (by time, source, length, repetition count, website weight, etc.).
2. Collector (ensembler): responds to user requests. After receiving a user's request to read news, it obtains the candidate news list from the candidate generator, then uses the memory-based collaborative filter, the model-based collaborative filter, and the content-based information filter to predict a user preference score for every article in the list. The three scores obtained are combined by weighted summation into the final predicted preference of each article, and the candidate news list sorted by predicted value is output to the front end, which displays the personalized recommendation result.
The collector is also responsible for recording the recommendation list in the recommendation history, so that the classifier can learn the content-based recommendation model.
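The weighted summation and ranking performed by the collector can be sketched as below. The function name rank_candidates, the representation of each filter as a callable from news id to score, and the particular weights are assumptions; the text does not fix the weight values.

```python
def rank_candidates(candidates, filters, weights):
    """Combine the scores of several filters and rank the candidate news.

    candidates -- list of news ids from the candidate generator
    filters    -- list of callables, each mapping a news id to a score
    weights    -- one weight per filter (illustrative values, not from the text)
    Returns a list of (news id, final score) sorted by score, highest first.
    """
    if len(filters) != len(weights):
        raise ValueError("one weight is required per filter")
    scored = []
    for news_id in candidates:
        final = sum(w * f(news_id) for f, w in zip(filters, weights))
        scored.append((news_id, final))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored

# Stand-ins for the three filters: fixed scores looked up by news id.
memory_scores = {"n1": 0.0, "n2": 1.0, "n3": 0.5}
model_scores = {"n1": 0.5, "n2": 0.5, "n3": 1.0}
content_scores = {"n1": 1.0, "n2": 0.0, "n3": 1.0}

ranked = rank_candidates(
    ["n1", "n2", "n3"],
    [memory_scores.get, model_scores.get, content_scores.get],
    [0.5, 0.25, 0.25],
)
```

With these stand-in scores the output order is n3, n2, n1. Tuning the weights shifts the balance between short-term interest, long-term interest, and content.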
The three recommendation modules called by the collector are built as client programs of the distributed database, and are as follows:
Memory-based collaborative filter (memory-based cf): predicts the user's preference for every candidate article from the user preference data and the relevance between news items.
This filter is mainly used to capture the user's short-term interest and has high real-time requirements, so the relevance between news items can be cached; the cache can be implemented with an in-memory database such as Redis.
Model-based collaborative filter (model-based cf): predicts the user's preference for a candidate article from the user's group information and the group's preference for that article; this filter is used to capture the user's long-term interest.
Content-based information filter (content-based if): classifies candidate news according to the user's interest classification model learned by the classifier and the content of the news, i.e., determines whether each item is news the user prefers.
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the application and do not limit its scope of protection. Although the application has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that, after reading the application, those skilled in the art may still make various changes, modifications, or equivalent replacements to the specific embodiments of the application, and that these changes, modifications, or equivalent replacements all fall within the scope of the pending claims of the application.