CN105989056B

CN105989056B - Chinese news recommender system

Info

Publication number: CN105989056B
Application number: CN201510063902.4A
Authority: CN
Inventors: 赵毅强; 许欢庆; 郭永福; 陈沛
Original assignee: Beijing Zhongsou Cloud Business Network Technology Co Ltd
Current assignee: Beijing Zhongsou Cloud Business Network Technology Co Ltd
Priority date: 2015-02-06
Filing date: 2015-02-06
Publication date: 2019-05-24
Anticipated expiration: 2035-02-06
Also published as: CN105989056A

Abstract

The present invention provides a Chinese news recommender system. The system comprises a learning layer for collecting data, recording user preferences and updating the recommendation modules; a data layer for storing system data; and a recommendation layer for generating the news recommendation list. The recommendation layer includes a candidate generator, which returns a list of recommended news in response to a user request, and a collector, which calls the preference modules to rank that list. The system fuses three classes of recommendation method (content-based, memory-based and model-based) in a practical way: it avoids the defects of each of the three models while exploiting their respective advantages, without increasing the system burden.

Description

Chinese news recommender system

Technical field

The present invention relates to a system in the Internet field, and in particular to a Chinese news recommender system.

Background technique

With the spread of the Internet and the mobile Internet, the explosive growth in the volume of information conflicts with people's need to obtain useful and interesting information in a timely and accurate way. A personalized content (news) recommender system is therefore needed so that people can obtain the information they need each day in a targeted manner. Personalized content recommender systems currently fall into two main classes: content-based recommendation (also called information filtering) and collaborative recommendation (also called collaborative filtering). Collaborative recommendation in turn comprises memory-based and model-based methods.

Content-based recommendation: mainly used for information filtering. The user's read and unread history serves as the corpus for training a text classifier; the trained classifier predicts how much the user will like a new document (news item), and that prediction decides whether the document is recommended to the user.

Memory-based recommendation: the user's preference information (which user has read which news item) is recorded and used to characterize the user. That is, a user is represented as a vector over the news items, u = [w1, w2, ..., wi, ..., wn], where i is the document number and wi is the user's preference value for document i (for example, 1 if the user has read it and 0 if not). For a new document d that has not yet been recommended to the current user, a nearest-neighbor method predicts whether d should be recommended: the other users whose vectors are closest to the current user's vector are found, and their preferences for d are used to predict the current user's preference for d.
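The nearest-neighbor prediction described above can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity, the neighborhood size `k`, and the function names `cosine` and `predict_preference` are choices made for the example and are not specified in the text.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length preference vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_preference(current, others, doc_index, k=2):
    """Predict the current user's preference for document doc_index as the
    mean preference of the k most similar other users (k is an assumption)."""
    def without_target(vec):
        # Compare users on every document except the one being predicted.
        return [w for i, w in enumerate(vec) if i != doc_index]

    ranked = sorted(
        others,
        key=lambda other: cosine(without_target(current), without_target(other)),
        reverse=True,
    )
    neighbours = ranked[:k]
    if not neighbours:
        return 0.0
    return sum(other[doc_index] for other in neighbours) / len(neighbours)


# Hypothetical data: 1 = read, 0 = not read; document 3 is the new document d.
current_user = [1, 1, 0, 0]
other_users = [[1, 1, 0, 1], [1, 1, 0, 1], [0, 0, 1, 0]]
score = predict_preference(current_user, other_users, doc_index=3, k=2)
```

Because every user's full vector must be kept for the comparison, this approach illustrates why memory-based methods are storage-hungry when the number of news items is large.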

Model-based recommendation: unlike the memory-based method, all users are divided into several groups according to their preferences, and a user's preference for a new document is computed from the preference of the user's group for that document.

Collaborative recommendation is widely used in e-commerce. News, however, is updated quickly and in large volume, so collaborative recommendation consumes a great deal of storage, and a large amount of new data arrives every day. Computing the relatedness between documents from preferences, or modeling user preferences, requires heavy computation, and ordinary systems can hardly provide that much computing and storage capacity. As a result, content-based recommendation has been the method most widely used for personalized recommendation of news content. The widespread adoption of big-data platforms has made it feasible to deliver personalized news with collaborative recommendation, but because such systems aim at a broad range of applications (besides news, for example video and audio recommendation), they usually ignore the content of the news entirely.

Content-based recommendation is the easiest to implement, and since each user's preference is characterized by a single trained classification model, it also occupies little storage. Its drawback is equally clear: the estimate of user preference depends entirely on the content of the articles the user has read, whereas user preferences, especially short-term interests, are often highly uncertain. Content-based recommendation therefore tends to feel repetitive to the user and easily misses documents that are dissimilar to the user's history but might still interest the user.

Memory-based recommendation must record every user's preference information and therefore consumes more storage, so usually only the most recent data are retained (for example, controlled by a threshold). As a result the method characterizes the user's short-term interests well but reflects long-term interests poorly, which harms recommendation quality.

Model-based recommendation is the opposite: because the user's preference is characterized by a model of the user's past behavior, it usually reflects long-term interests well, but the model lags behind the data, so it captures short-term interests poorly, which also harms the final recommendation quality.

Meanwhile the common drawback first choice of collaborative recommendation method is:

For the application seldom for user volume, the new document that there is no any user evaluate/read to it can not be obtained Recommend to correct；

Followed by, right not using the content information that can be facilitated processing for the text category information of news one kind It is a no small loss for the assurance of user interest preference.

Accordingly, there is a need for a Chinese news recommender system that is more efficient and produces more accurate results.

Summary of the invention

To overcome the above deficiencies of the prior art, the present invention provides a Chinese news recommender system.

The solution adopted to achieve the above purpose is as follows:

A Chinese news recommender system, improved in that the system comprises a learning layer for collecting data, recording user preferences and updating the recommendation modules, a data layer for storing system data, and a recommendation layer for generating the news recommendation list;

The recommendation layer includes a candidate generator, which returns a list of recommended news in response to a user request, and a collector, which calls the preference modules to rank the list of recommended news.

Further, the learning layer includes a logger, a register, a renovator and a learner.

Further, the logger writes news information into the data layer; the register maintains user information in the data layer;

The renovator obtains user preference data, stores it in the user preference of the data layer, and calculates the article degree of correlation between news items from the user preference data obtained;

The learner, which trains the preference models, includes a cluster device for training the model-based recommendation model and a classifier for training the content-based recommendation model.

Further, the cluster device reads the user preference and the user information, periodically performs cluster learning on the users according to the user preference, and writes the cluster result and each cluster's preference for each article to the user group and the group preference respectively;

The classifier periodically reads the user information and the recommendation history, learns a classification model, and writes the classification model to the user property.

Further, the data layer includes a distributed database that stores the data of the recommender system;

The fields of the distributed database include item information, user information, user preference, article degree of correlation, user group, group preference, user property and recommendation history.

Further, the item information records, for every news item, its number, category, source, time, number of repetitions and content;

The user information records information including the user number;

The user preference records each user's preference information for every article;

The article degree of correlation stores the pairwise relatedness of news items calculated from user preferences;

The user group and the group preference respectively store the user grouping information obtained after a clustering algorithm processes the user preference data and each group's preference for each article;

The user property stores each user's content-based recommendation model;

The recommendation history records the history of documents recommended to each user.

Further, after receiving a request from the collector, the candidate generator selects from the system's news a candidate news list that distinguishes news quality;

After receiving the candidate news list, the collector calls the memory-based collaborative filter, the model-based collaborative filter and the content-based information filter to predict a user preference score for the candidate news list, combines the three results into a predicted user preference score, sorts the candidate news list by that predicted score, and outputs it to the front end for display;

Further, the memory-based collaborative filter captures the user's short-term interests and predicts the user's preference for each candidate article from the user preference data and the relatedness between news items.

Further, the model-based collaborative filter captures the user's long-term interests and predicts the user's preference for a candidate article from the user group and the group's preference for that article.

Further, the content-based information filter uses the classification model learned by the classifier, together with the news content, to classify candidate news into news the user prefers and news the user does not prefer.

Compared with the prior art, the invention has the following advantages:

1. The system of the invention is a practical Chinese news recommender system that fuses three classes of recommendation method (content-based, memory-based and model-based). It avoids the defects of each of the three models while exploiting their respective advantages, without increasing the system burden.

2. The system of the invention characterizes both the short-term and the long-term interests of the user accurately, makes full use of news content to improve recommendation accuracy and to reduce the risk that new documents cannot be recommended accurately, and ensures that personalized news recommendation works effectively even when the user base is small.

3. Compared with a news recommender system that uses only content-based recommendation, the system of the invention effectively captures the user's short-term interests and makes the recommended content richer and more varied.

4. Compared with a news recommender system that uses only collaborative recommendation, the system of the invention effectively solves the problems of inaccurate recommendation caused by a small user base and of new documents that cannot be recommended effectively; it also makes good use of the information in the news itself, giving the recommendation results better interpretability.

5. The system of the invention captures the user's long-term and short-term interests at the same time, and handles both personalized recommendation with a small user base and accurate recommendation of new documents.

6. The system of the invention is highly scalable: it can cope with a surge in users and data volume by adding machines.

7. The system of the invention is highly general and can readily be applied to personalized recommendation of other content (such as blog posts and novels).

Brief description of the drawings

Fig. 1 is the basic flow chart of news recommendation in the present embodiment;

Fig. 2 is the data processing flow chart of the learner in the present embodiment;

Fig. 3 is the data processing flow chart of the collector in the present embodiment;

Fig. 4 is a structural schematic diagram of the recommender system based on the memory-based, model-based and content-based recommendation modules in the present embodiment.

Specific embodiment

Specific embodiments of the invention are described in further detail below with reference to the accompanying drawings.

The present invention provides a Chinese news content recommender system based on three recommendation methods: content-based, memory-based and model-based. The system mainly includes a collector and a learner.

Fig. 1 is the basic flow chart of news recommendation in the present embodiment. As shown in Fig. 1, the Chinese news recommender system mainly includes a collector and a learner.

The collector handles requests sent by the user.

According to the user's interest model, the collector filters from the candidate news list a list of news that takes the user's preference into account and recommends it to the user.

The learner generates the user's interest model.

From the user's browsing record, the learner extracts a model that can predict how much the user will like a new document, namely the user preference model.

From the user preference data, that is, the records of one or more click events produced while the user reads news, the learner trains a new preference model; training of the preference model is carried out periodically.

The user preference data specifically comprises rating or click data. The system scores user preference in the following ways:

1. Explicit scoring: if a rating system exists, the user's rating is used as given, for example a score from 1 to 5 depending on the case;

2. Implicit scoring: if no rating system exists, a click or visit by the user counts as 1 point and no click or visit counts as 0 points. The specific score values may be set according to the system configuration.
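The two scoring cases can be expressed in a few lines. The function name `preference_score` and its parameters are hypothetical; the 1 to 5 range and the 1/0 click values follow the examples given above, which the text says may be configured differently.

```python
def preference_score(clicked, explicit_rating=None):
    """Return a preference value for one (user, news) pair.

    explicit_rating: a 1-5 rating if the deployment has a rating system,
    otherwise None, in which case the click is used as an implicit score.
    """
    if explicit_rating is not None:
        if not 1 <= explicit_rating <= 5:
            raise ValueError("explicit ratings are assumed to lie in 1..5")
        return float(explicit_rating)
    # Implicit scoring: clicked/visited counts as 1, otherwise 0.
    return 1.0 if clicked else 0.0
```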

Fig. 2 is the data processing flow chart of the learner in the present embodiment. As shown in Fig. 2, the learner includes a cluster device and a classifier.

The cluster device reads the user preference information and the user information, periodically initiates the learning process that groups users by clustering them on their preferences, and updates the user group and the group preference with the cluster result and each cluster's preference for each article, respectively.

To handle large-scale user preference information, the clustering algorithm is implemented with a distributed programming model (such as MapReduce); Mahout, an existing MapReduce-based machine learning package in the Hadoop ecosystem, can be used directly.

The classifier periodically reads each user's user information and reading history (the reading history comprises the user preference, the recommendation history and the document information), starts the process of learning a classifier, and writes the learned classifier to the user property.

As with clustering, this is also implemented with a distributed model (such as the MapReduce-based machine learning package Mahout).

Learning process includes:

(1) The front end captures each reading event of the user, that is, the news item clicked, and sends it to the update module as new user preference data.

(2) The update module writes the news newly clicked by the user (one or more click event records, serving as preference data) into the user preference in the database;

After the user preference is stored, the memory-based recommendation model is recalculated from the updated data and stored in the article degree of correlation in the database. The model takes the form of a document-to-document relatedness matrix and is used to capture the user's short-term interests.

The present embodiment provides one of the simplest methods of calculating the model:

Suppose user A has read news item B. Each news item C1, C2, ..., Cn that user A has previously read is paired with B to form (C1, B), (C2, B), ..., (Cn, B), and the weight at the position of each of these pairs in the model (the matrix formed by all news items against all news items) is increased by 1.

Depending on system requirements, other strategies may be applied, including reducing the matrix size and decaying the weights over time.
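The pairwise update described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the dictionary-based sparse matrix, the names `record_read` and `decay`, the symmetric update of (B, Ci), and the decay factor are all assumptions made for the example.

```python
from collections import defaultdict


def record_read(relatedness, history, user, news_id):
    """User `user` has just read `news_id` (B): add 1 to the weight of every
    pair (Ci, B), where Ci is a news item the user read earlier."""
    for earlier in history[user]:
        if earlier != news_id:
            relatedness[(earlier, news_id)] += 1.0
            # Assumption: the matrix is kept symmetric.
            relatedness[(news_id, earlier)] += 1.0
    history[user].add(news_id)


def decay(relatedness, factor=0.5):
    """One possible time-decay strategy: scale every weight by `factor`."""
    for pair in relatedness:
        relatedness[pair] *= factor


# Sparse relatedness matrix and per-user read history (in-memory stand-ins
# for the "article degree of correlation" and "user preference" fields).
relatedness = defaultdict(float)
history = defaultdict(set)
for news in ["C1", "C2", "B"]:
    record_read(relatedness, history, "A", news)
```

Storing only the non-zero pairs keeps the matrix small, which is one way to address the matrix-size concern mentioned above.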

(3) The learner periodically obtains the user preference data and trains new preference models, using respectively:

The cluster device of the learner, which trains the model-based recommendation model;

The classifier of the learner, which trains the content-based recommendation model; this model is a classification model.

The cluster device uses a clustering algorithm, such as k-means, graph clustering, spectral clustering, fuzzy k-means, hierarchical clustering or topic-model clustering, to divide a set of data into several clusters according to the distances between the data points.
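As an illustration of the clustering step, the sketch below runs plain k-means on small in-memory preference vectors. It is not the distributed Mahout job the embodiment refers to; the function name `kmeans`, the first-k initialization and the fixed iteration count are assumptions. Reading each centroid as that group's mean preference per article is likewise one possible interpretation of the "group preference".

```python
def kmeans(vectors, k, iterations=10):
    """Group users by squared Euclidean distance between preference vectors.

    Returns (assignment, centroids): assignment[i] is the group index of
    user i; centroids[g][j] is group g's mean preference for article j.
    """
    # Naive initialization (an assumption): the first k users seed the groups.
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        for i, v in enumerate(vectors):
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            assignment[i] = dists.index(min(dists))
        # Update step: each centroid becomes the mean of its members.
        for g in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == g]
            if members:
                centroids[g] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical click vectors for four users over four articles.
users = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0], [0, 0, 0, 1]]
user_group, group_preference = kmeans(users, k=2)
```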

The classifier uses a classification algorithm, such as nearest neighbor, naive Bayes, SVM or decision tree. It learns a classification model from data prepared in advance (this step belongs to the learning layer), and the model is then used to classify new data (this step belongs to the recommendation layer). In other words, the classifier is responsible for learning the classification model, and the content-based information filter in the recommendation layer uses that model to classify new data.
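The split between learning a model and applying it can be illustrated with a small naive Bayes text classifier, one of the algorithms named above. This is a sketch under assumptions: the class name `NaiveBayesPreference` is hypothetical, documents are assumed to be already segmented into tokens (Chinese word segmentation is out of scope here), and add-one smoothing with an extra bucket for unseen words is a choice made for the example.

```python
from collections import Counter
from math import log


class NaiveBayesPreference:
    """Per-user text classifier: does the user like a news item or not?"""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def fit(self, documents):
        """Learning-layer step. documents: iterable of (tokens, liked)."""
        for tokens, liked in documents:
            self.word_counts[liked].update(tokens)
            self.doc_counts[liked] += 1
        return self

    def log_score(self, tokens, liked):
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total = sum(self.word_counts[liked].values())
        prior = (self.doc_counts[liked] + 1) / (sum(self.doc_counts.values()) + 2)
        score = log(prior)
        for token in tokens:
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            count = self.word_counts[liked][token]
            score += log((count + 1) / (total + len(vocab) + 1))
        return score

    def predict(self, tokens):
        """Recommendation-layer step: True if the item is predicted as liked."""
        return self.log_score(tokens, True) > self.log_score(tokens, False)


# Hypothetical reading history: clicked items are liked, skipped ones are not.
model = NaiveBayesPreference().fit([
    (["football", "match"], True),
    (["football", "league"], True),
    (["stock", "market"], False),
])
likes_football_final = model.predict(["football", "final"])
```

In the architecture described here, `fit` would run periodically in the learning layer and the fitted model would be stored in the user property, while `predict` would be called by the content-based information filter at request time.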

(4) The learner stores the newly trained preference models, namely the model-based recommendation model and the content-based recommendation model, in the corresponding database fields: the model-based recommendation model is stored in the user group and the group preference, and the content-based recommendation model is stored in the user property.

Fig. 3 is the data processing flow chart of the collector in the present embodiment. As shown in Fig. 3, the recommendation process mainly includes:

(1) After the front end receives a user request, it forwards the request to the collector.

(2) The collector sends the request to the candidate generator.

(3) The candidate generator generates a candidate news list for the user according to the user information (such as the user ID) and other information (such as news quality and website weight).

(4) The collector calls the memory-based collaborative filter, the model-based collaborative filter and the content-based information filter in turn to predict the user's preference for each news item in the candidate news list. The collector combines the three preference values obtained, sorts the candidate news in descending order, and sends the result to the front end.

The present embodiment additionally provides a recommender system that fuses the collaborative recommendation methods. To fuse the collaborative recommendation methods and to improve scalability, the example system uses a distributed computing platform as its basic storage and computing platform.

Fig. 4 is a structural schematic diagram of the recommender system based on the memory-based, model-based and content-based recommendation models in the present embodiment. As shown in Fig. 4, the recommender system comprises a learning layer, a data layer and a recommendation layer.

One, the data layer (data)

The data layer stores the various data needed by the recommender system and is implemented on a distributed storage system. The stored fields mainly include:

1. Item information (item information): records the relevant information of every news item, including the news number (ID), category, data source, time, number of repetitions, content and so on.

2. User information (user information): records the relevant information of the user, including the user number (UID) and so on.

3. User preference (user preference): records each user's preference (rating or click) for every article, including the click time, the rating time and so on; it can be implemented as a sparse matrix.

4. Article degree of correlation (item similarity): the core data supporting the memory-based collaborative filter; this field stores the pairwise relatedness of news items calculated from user preferences.

5. User group (user group): stores the user grouping information obtained after the clustering algorithm processes the user preference data, including the group number (GID) and so on;

6. Group preference (group preference): stores each group's preference for each article, obtained after the clustering algorithm processes the user preference data;

The user group and the group preference are the core data supporting the model-based collaborative filter.

7. User property (user profile): the core data supporting the content-based information filter; stores each user's content-based recommendation model (usually a text classifier).

8. Recommendation history (recommend history): records the history of documents recommended to each user, specifically the news the user liked or clicked and the news the user disliked or did not click.
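To make the eight fields concrete, the sketch below models them as in-memory Python structures. It is only an illustration: the embodiment stores these fields in a distributed database, and the class names, attribute names and key layouts used here (for example `(uid, news_id)` tuples as sparse-matrix keys) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ItemInfo:
    news_id: str
    category: str
    source: str
    published_at: str
    repeat_count: int
    content: str


@dataclass
class DataLayer:
    # 1. Item information: news_id -> record
    item_info: Dict[str, ItemInfo] = field(default_factory=dict)
    # 2. User information: uid -> attributes
    user_info: Dict[str, dict] = field(default_factory=dict)
    # 3. User preference, as a sparse matrix: (uid, news_id) -> score
    user_preference: Dict[Tuple[str, str], float] = field(default_factory=dict)
    # 4. Article degree of correlation: (news_id, news_id) -> relatedness
    item_similarity: Dict[Tuple[str, str], float] = field(default_factory=dict)
    # 5. User group: uid -> gid
    user_group: Dict[str, int] = field(default_factory=dict)
    # 6. Group preference: (gid, news_id) -> preference
    group_preference: Dict[Tuple[int, str], float] = field(default_factory=dict)
    # 7. User property: uid -> content-based model (e.g. a text classifier)
    user_profile: Dict[str, object] = field(default_factory=dict)
    # 8. Recommendation history: uid -> [(news_id, clicked), ...]
    recommend_history: Dict[str, List[Tuple[str, bool]]] = field(default_factory=dict)


layer = DataLayer()
layer.item_info["n1"] = ItemInfo("n1", "sports", "example-source", "2015-02-06", 0, "text")
layer.user_preference[("u1", "n1")] = 1.0
```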

Two, the learning layer (learn)

The learning layer collects the various data, records user preferences and updates the recommendation modules. It is implemented with the distributed storage system and MapReduce programs. The learning layer mainly comprises the following modules:

1. Logger (recorder): writes the relevant information of every news item into the item information.

2. Register (register): maintains (adds, deletes and so on) user information and stores it in the user information.

3. Renovator (updater): obtains user preference information and stores it in the user preference of the data layer; it is also responsible for calculating and updating, in real time or near real time, the article degree of correlation between news items according to the new user preferences.

4. Cluster device (clusterer): periodically (for example every half day or every day) initiates the learning process that clusters (groups) users by preference, and updates the user group and the group preference with the cluster result and each cluster's preference for each article, respectively; it needs to read the user preference information and the user information.

To handle large-scale user preference information, the clustering algorithm should be implemented with a distributed programming model (such as MapReduce); Mahout, an existing MapReduce-based machine learning package in the Hadoop ecosystem, can be used directly.

5. Classifier (classifier): periodically reads each user's reading history (which requires reading the user preference, recommendation history, user information and document information), starts the process of learning a classifier, and updates the user property with the learned classifier.

As with clustering, this should also be implemented with a distributed model (MapReduce), using the Mahout package.

Three, the recommendation layer (recommend)

The recommendation layer outputs, for each read request from each user, the news of the best quality that best matches the user's preference. It is built as a client of the distributed storage system. The recommendation layer mainly includes the following modules:

1. Candidate generator (candidates generator): after receiving a request from the collector, selects a candidate list from the system's news. The candidate list is not personalized; its main purpose is to distinguish news quality (by time, source, length, number of repetitions, website weight and so on).

2. Collector (ensembler): responds to user requests. After receiving a user's request to read news, it obtains the candidate news list from the candidate generator, then uses the memory-based collaborative filter, the model-based collaborative filter and the content-based information filter to predict a user preference score for every article in the list. The weighted sum of the three scores is the final predicted preference for each article, and the candidate news list sorted by predicted value is output to the front end, which displays the personalized recommendation result.

The collector is also responsible for writing the recommended list to the recommendation history, so that the classifier can learn the content-based recommendation model.
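The weighted-sum ranking performed by the collector can be sketched as follows. The function name `rank_candidates`, the representation of each filter as a callable, and the weight values are assumptions for illustration; the text does not specify how the weights are chosen.

```python
def rank_candidates(candidates, predictors, weights):
    """Score each candidate as the weighted sum of the predictors' outputs
    and return (news_id, score) pairs sorted by score, highest first."""
    scored = []
    for news_id in candidates:
        total = sum(w * predict(news_id) for predict, w in zip(predictors, weights))
        scored.append((news_id, total))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Hypothetical per-filter scores for two candidate news items.
memory_based = {"n1": 0.9, "n2": 0.1}.get
model_based = {"n1": 0.2, "n2": 0.8}.get
content_based = {"n1": 1.0, "n2": 0.0}.get

ranking = rank_candidates(
    ["n1", "n2"],
    [memory_based, model_based, content_based],
    weights=[0.5, 0.3, 0.2],  # illustrative weights only
)
```

Keeping the three filters behind a common callable interface means a filter can be reweighted or replaced without changing the ranking step.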

The three recommendation modules called by the collector are implemented as client programs of the distributed database. They are:

Memory-based collaborative filter (memory-based cf): predicts the user's preference for each candidate article from the user preference data and the relatedness between news items.

This filter is mainly used to capture the user's short-term interests and has high real-time requirements, so the relatedness between news items can be cached; the cache can be implemented with an in-memory database such as redis.
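One simple way to turn the relatedness matrix into a preference prediction is to average the relatedness between the candidate and the news items the user has read. The text does not give the exact formula, so the averaging rule and the name `memory_based_score` are assumptions; the plain dictionary stands in for the cached relatedness data.

```python
def memory_based_score(candidate, read_news, relatedness):
    """Mean relatedness between `candidate` and the user's read news items.

    relatedness: mapping (news_id, news_id) -> weight; missing pairs count as 0.
    """
    if not read_news:
        return 0.0
    total = sum(relatedness.get((read, candidate), 0.0) for read in read_news)
    return total / len(read_news)


# Hypothetical relatedness weights and reading history.
relatedness = {("a", "c"): 2.0, ("b", "c"): 4.0}
score = memory_based_score("c", ["a", "b"], relatedness)
```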

Model-based collaborative filter (model-based cf): predicts the user's preference for a candidate article from the user's group and that group's preference for the article. This filter is used to capture the user's long-term interests.
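The group lookup can be sketched in a few lines. The name `model_based_score`, the key layout of the two mappings and the fallback value for users or articles without group data are assumptions made for the example.

```python
def model_based_score(uid, candidate, user_group, group_preference, default=0.0):
    """Use the preference of the user's group for `candidate` as the prediction.

    user_group: uid -> gid; group_preference: (gid, news_id) -> preference.
    """
    gid = user_group.get(uid)
    if gid is None:
        # Assumption: users not yet clustered fall back to a default score.
        return default
    return group_preference.get((gid, candidate), default)


# Hypothetical clustering output.
user_group = {"u1": 0, "u2": 1}
group_preference = {(0, "n1"): 0.8, (1, "n1"): 0.1}
score = model_based_score("u1", "n1", user_group, group_preference)
```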

Content-based information filter (content-based if): uses the user's interest classification model learned by the classifier, together with the news content, to classify each candidate news item as news the user prefers or not.

Finally, it should be noted that the above embodiments merely illustrate the technical solution of the application and do not limit its scope of protection. Although the application has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that, after reading the application, they may still make various changes, modifications or equivalent replacements to its specific embodiments, and that such changes, modifications or equivalent replacements fall within the scope of the pending claims.

Claims

1. A Chinese news recommender system, characterized in that: the system comprises a learning layer for collecting data, recording user preferences and updating the recommender system, a data layer for storing system data, and a recommendation layer for generating the news recommendation list;

The recommendation layer includes a candidate generator, which returns a list of recommended news in response to a user request, and a collector, which calls the preference modules to rank the list of recommended news;

After receiving a request from the collector, the candidate generator selects from the system's news a candidate news list that distinguishes news quality;

After receiving the candidate news list, the collector calls the memory-based collaborative filter, the model-based collaborative filter and the content-based information filter to predict a user preference score for the candidate news list, takes the weighted sum of the three scores obtained as the final predicted user preference for every article, and outputs the candidate news list sorted by the predicted user preference to the front end for display.

2. The Chinese news recommender system according to claim 1, characterized in that: the learning layer includes a logger, a register, a renovator and a learner;

The logger writes news information into the data layer; the register maintains user information in the data layer;

The renovator obtains user preference data, stores it in the user preference of the data layer, and calculates the article degree of correlation between news items from the user preference data obtained;

The learner, which trains the preference models, includes a cluster device for training the model-based recommendation model and a classifier for training the content-based recommendation model.

3. The Chinese news recommender system according to claim 2, characterized in that: the cluster device reads the user preference and the user information, periodically performs cluster learning on the users according to the user preference, and writes the cluster result and each cluster's preference for each article to the user group and the group preference respectively;

The classifier periodically reads the user information and the recommendation history, learns a classification model, and writes the classification model to the user property.

4. The Chinese news recommender system according to claim 1, characterized in that: the data layer includes a distributed database that stores the data of the recommender system;

The fields of the distributed database include item information, user information, user preference, article degree of correlation, user group, group preference, user property and recommendation history.

5. The Chinese news recommender system according to claim 4, characterized in that: the item information records, for every news item, its number, category, source, time, number of repetitions and content;

The user information records information including the user number;

The user group and the group preference respectively store the user grouping information obtained after a clustering algorithm processes the user preference data and each group's preference for each article;

6. The Chinese news recommender system according to claim 1, characterized in that: the memory-based collaborative filter captures the user's short-term interests and predicts the user's preference for each candidate article from the user preference data and the relatedness between news items.

7. The Chinese news recommender system according to claim 4, characterized in that: the model-based collaborative filter captures the user's long-term interests and predicts the user's preference for a candidate article from the user group and the group's preference for that article.

8. The Chinese news recommender system according to claim 1, characterized in that: the content-based information filter uses the classification model learned by the classifier, together with the news content, to classify candidate news into news the user prefers and news the user does not prefer.