CN105989056A

CN105989056A - Chinese news recommending system

Info

Publication number: CN105989056A
Application number: CN201510063902.4A
Authority: CN
Inventors: 赵毅强; 许欢庆; 郭永福; 陈沛
Original assignee: Beijing Zhongsou Network Technology Co ltd
Current assignee: Beijing Zhongsou Cloud Business Network Technology Co ltd
Priority date: 2015-02-06
Filing date: 2015-02-06
Publication date: 2016-10-05
Anticipated expiration: 2035-02-06
Also published as: CN105989056B

Abstract

The invention provides a Chinese news recommendation system comprising a learning layer, a data layer and a recommendation layer. The learning layer collects data, records user preferences and updates the recommendation modules; the data layer stores system data; the recommendation layer generates a news recommendation list. The recommendation layer comprises a candidate generator that returns a recommended news list in response to a user request, and an ensembler that calls preference modules to rank the recommended news list. The system feasibly combines three recommendation methods (content-based, memory-based and model-based), bringing the advantages of each into play while avoiding the defects of the three models, without increasing the burden on the system.

Description

A Chinese news recommendation system

Technical field

The present invention relates to a system in the Internet field, and in particular to a Chinese news recommendation system.

Background art

With the spread of the Internet and the mobile Internet, the volume of information has grown explosively, and this conflicts with people's need to obtain useful and interesting information promptly and accurately. A personalized content (news) recommendation system is therefore needed so that people can obtain the information they need day to day in a targeted way. Personalized content recommendation systems currently fall into two main classes: content-based recommendation (also called information filtering) and collaborative recommendation (also called collaborative filtering). Collaborative recommendation in turn comprises memory-based and model-based methods.

Content-based recommendation: aimed mainly at information filtering. The user's read/unread history serves as the corpus for training a text classifier. The trained classifier predicts how much the user will like a new document (news item), and the decision whether to recommend the document to the user is made accordingly.

Memory-based recommendation: the user's preference information (which user has read which news item) is recorded, and users are characterized by these preferences. That is, a user is represented as a vector over the news items he or she has read, u = [w1, w2, ..., wi, ..., wn], where i is the document number and wi is the user's preference value for document i (1 if read, 0 if not). For a new document d (one not yet recommended to the current user), the nearest-neighbor method predicts whether d should be recommended: the preferences for d of the other users whose vectors are closest to the current user's vector are used to predict the current user's preference for d.

Model-based recommendation: unlike the memory-based method, all users are divided into several groups according to their preferences, and a user's preference for a new document is computed from the preference of the user's group for that document.

Collaborative recommendation is widely used in e-commerce. Because news is updated quickly and in large volume, collaborative recommendation consumes a great deal of storage, and large amounts of new data arrive every day. Computing the similarity between documents from preferences, or modeling user preferences, requires substantial computation, and ordinary systems can hardly provide computing and storage resources on that scale. For this reason, content-based recommendation is the method widely used for personalized recommendation of news content. The wide adoption of big data platforms has made personalized news push based on collaborative recommendation possible, but because such systems are designed for a broad range of uses (video and audio recommendation as well as news), they generally ignore the content of the news entirely.

Content-based recommendation is the easiest to implement: a single trained classification model per user is enough to characterize the user's preferences, so it also occupies little storage. Its shortcomings are equally obvious. Its grasp of user preferences depends entirely on the content of the articles the user has read, yet user preferences, especially short-term interests, are often highly uncertain. Content-based recommendation therefore easily gives users a sense of monotony, and easily misses documents that are dissimilar to the user's history but might still interest the user.

Memory-based recommendation must record every user's preference information, so it consumes more storage. Usually only the most recent data can be retained (for example, controlled by a threshold). As a result, the method characterizes users' short-term interests well but cannot reflect their long-term interests well, which hurts recommendation quality.

In contrast to memory-based recommendation, model-based recommendation characterizes user preferences with a model of the user's past behavior, so it usually reflects users' long-term interests well. However, the model lags behind the data, so it grasps users' short-term interests poorly, which also hurts recommendation quality.

In addition, collaborative recommendation methods share two drawbacks. The first is:

For applications with few users, and for new documents that no user has yet rated or read, correct recommendations cannot be made;

The second is that, for text information such as news, failing to use content information that is easy to process is a considerable loss for capturing users' interest preferences.

Accordingly, a Chinese news recommendation system that is more efficient and produces more accurate results is needed.

Summary of the invention

To overcome the above deficiencies of the prior art, the present invention provides a Chinese news recommendation system.

The solution adopted to achieve the above object is:

A Chinese news recommendation system, improved in that the system comprises a learning layer for collecting data, recording user preferences and updating the recommendation modules; a data layer for storing system data; and a recommendation layer for generating a news recommendation list;

The recommendation layer comprises a candidate generator that returns a recommended news list in response to a user request, and an ensembler that calls preference modules to rank the recommended news list.

Further, the learning layer comprises a recorder, a register, an updater and a learner.

Further, the recorder writes news information into the data layer; the register maintains user information in the data layer;

The updater obtains user preference data, stores it in the user preference field of the data layer, and computes the item similarity between news items from the obtained user preference data;

The learner, which trains the preference models, comprises a clusterer for training the model-based recommendation model and a classifier for training the content-based recommendation model.

Further, the clusterer reads the user preference and the user information, periodically performs cluster learning on the users according to the user preference, and writes the clustering result and each cluster's preference for each item to the user group and group preference fields respectively;

The classifier periodically reads the user information and the recommendation history, learns a classification model, and writes the classification model to the user profile.

Further, the data layer comprises a distributed database that stores the data of the recommendation system;

The fields of the distributed database include item information, user information, user preference, item similarity, user group, group preference, user profile and recommendation history.

Further, the item information records, for each news item, information including its number, category, source, time, repetition count and content;

The user information records information about the user, including the user number;

The user preference records each user's preference for each item;

The item similarity stores the pairwise similarity values between news items computed from user preferences;

The user group and the group preference store, respectively, the user grouping information obtained after a clustering algorithm processes the user preference data, and each group's preference for each item;

The user profile stores each user's content-based recommendation model;

The recommendation history records the history of the news recommended to each user.

Further, after receiving a request from the ensembler, the candidate generator selects from the system's news a candidate news list that distinguishes news quality;

After receiving the candidate news list, the ensembler calls the memory-based collaborative filter, the model-based collaborative filter and the content-based information filter, each of which predicts user preference scores for the candidate news list. The ensembler combines the results into a predicted user preference score and outputs the candidate news list, sorted by the predicted score, to the front end for display;

Further, the memory-based collaborative filter captures the user's short-term interests and predicts the user's preference for each candidate item from the user preference data and the similarity between news items.

Further, the model-based collaborative filter captures the user's long-term interests and predicts the user's preference for a candidate item from the user group and the group's preference for that item.

Further, the content-based information filter classifies candidate news as user-preferred news or non-preferred news according to the classification model learned by the classifier and the news content.

Compared with the prior art, the present invention has the following beneficial effects:

1. The system of the present invention feasibly fuses three classes of recommendation methods (content-based, memory-based and model-based) in one Chinese news recommendation system. It exploits the strengths of each while avoiding the defects of the three models, and does not increase the burden on the system.

2. The system of the present invention ensures accurate characterization of users' short-term and long-term interests, makes full use of news content to improve recommendation accuracy and reduce the risk that new documents cannot be recommended accurately, and ensures that personalized news recommendation works effectively even when the number of users is small.

3. Compared with a news recommendation system that uses only content-based recommendation, the system of the present invention can effectively grasp users' short-term interests and makes the recommended content richer and more varied.

4. Compared with a news recommendation system that uses only collaborative recommendation, the system of the present invention effectively solves the problems of inaccurate recommendation caused by a small user base and of new documents not being recommended effectively. It also makes good use of the information in the news itself, so that the recommendation results are more interpretable.

5. The system of the present invention can grasp users' long-term and short-term interests at the same time, and solves well the problems of personalized recommendation with few users and of accurate recommendation of new documents.

6. The system of the present invention is highly scalable: surges in the number of users and the volume of data can be handled by adding machines.

7. The system of the present invention is highly general and can easily be applied to personalized recommendation of other content (such as blog posts and novels).

Brief description of the drawings

Fig. 1 is the basic flowchart of news recommendation in this embodiment;

Fig. 2 is the data processing flowchart of the learner in this embodiment;

Fig. 3 is the data processing flowchart of the ensembler in this embodiment;

Fig. 4 is a structural diagram of the recommendation system with memory-based, model-based and content-based recommendation modules in this embodiment.

Detailed description

Specific embodiments of the present invention are described in further detail below with reference to the drawings.

The present invention provides a Chinese news content recommendation system based on three recommendation methods: content-based, memory-based and model-based. The system mainly comprises an ensembler and a learner.

Fig. 1 is the basic flowchart of news recommendation in this embodiment. As shown in Fig. 1, the Chinese news recommendation system mainly comprises an ensembler and a learner.

The ensembler processes the requests sent by users.

According to the user's interest model, the ensembler filters from the candidate news list a news list that takes the user's preferences into account and recommends it to the user.

The learner generates the user's interest model.

From the user's browsing records, the learner extracts a model that can predict how much the user will like a new document, i.e. the user preference model.

The learner trains new preference models from the user preference data, i.e. the records of one or more click events produced while the user reads news. The preference models are trained periodically.

The user preference data consists of rating or click data. The system scores user preferences in the following ways:

1. Explicit scoring: if a rating system is available, scores are assigned according to the actual situation, for example values from 1 to 5;

2. Implicit scoring: if no rating system is available, a user's click/visit counts as 1 point and no click/visit counts as 0 points. The specific score values can be preset as needed.
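The two scoring modes can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `preference_score` and its parameters are hypothetical, and the 1 to 5 range is only the example range mentioned above.

```python
from typing import Optional


def preference_score(clicked: bool, rating: Optional[int] = None) -> int:
    """Return a user's preference value for one news item.

    Explicit scoring: if a rating is available, use it (assumed range 1..5).
    Implicit scoring: otherwise a click/visit counts as 1 and no click as 0.
    """
    if rating is not None:
        if not 1 <= rating <= 5:
            raise ValueError("rating outside the assumed 1..5 range")
        return rating
    return 1 if clicked else 0
```

A system without a rating feature would simply never pass `rating`, so every stored preference value is 0 or 1.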

Fig. 2 is the data processing flowchart of the learner in this embodiment. As shown in Fig. 2, the learner comprises a clusterer and a classifier.

The clusterer reads the user preference and user information, periodically starts the learning process that clusters users by preference, and writes the clustering result and each cluster's preference for each item to the user group and group preference fields respectively.

To process large-scale user preference information, the clustering algorithm is implemented with a distributed programming model (such as MapReduce). Mahout, the existing MapReduce-based machine learning package in the Hadoop ecosystem, can be used directly.

The classifier periodically reads the user information and each user's reading history (which comprises user preference, recommendation history and document information), starts the process of learning a classifier, and writes the learned classifier to the user profile.

As with clustering, this is also implemented with a distributed model (such as the MapReduce-based machine learning package Mahout).

The learning process comprises:

(1) The front end captures each reading event of the user, i.e. the news item clicked, and sends it to the update module as new user preference data.

(2) The update module writes the news items newly clicked by the user (one or more click event records, which constitute the preference data) into the user preference field of the database;

After the user preference is stored, the memory-based recommendation model is computed from the updated data and stored in the item similarity field of the database. The model takes the form of a document-to-document correlation matrix and is used to capture the user's short-term interests.

This embodiment provides one of the simplest methods for computing the model:

Suppose user A has read news item B. Then every news item C1, C2, ..., Cn previously read by user A is paired with B to form (C1, B), (C2, B), ..., (Cn, B), and the weight at the position of each pair in the model (a matrix of all news items against all news items) is incremented by 1.

Depending on system requirements, other strategies are possible, including reducing the matrix size and decaying the weights over time.
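The simple pair-counting method above can be sketched as follows. This is an illustration only: the names `item_similarity`, `read_history` and `record_read` are hypothetical, a dictionary keyed by item pairs stands in for the sparse news-by-news matrix, and keeping the matrix symmetric and ignoring repeated reads are assumptions not stated in the text.

```python
from collections import defaultdict

# item_similarity[(c, b)] holds the co-read weight of news items c and b.
# A dict of pairs stands in for the sparse news-by-news matrix (assumption).
item_similarity = defaultdict(int)
# read_history[user] is the set of news items the user has already read.
read_history = defaultdict(set)


def record_read(user: str, news: str) -> None:
    """Update the model when `user` reads `news` (user A reads news B)."""
    if news in read_history[user]:
        return  # assumption: a repeated read does not add weight again
    for earlier in read_history[user]:
        # Pair every previously read item C_i with B and add 1 to its weight.
        item_similarity[(earlier, news)] += 1
        # Assumption: the matrix is kept symmetric.
        item_similarity[(news, earlier)] += 1
    read_history[user].add(news)


# User A reads C1 and C2, then B: the pairs (C1, B) and (C2, B) gain weight 1.
record_read("A", "C1")
record_read("A", "C2")
record_read("A", "B")
```

Time decay or pruning of low-weight pairs, mentioned above as alternative strategies, would be applied to `item_similarity` in a separate pass and is not shown here.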

(3) The learner periodically obtains the user preference data and trains new preference models, namely:

The clusterer of the learner trains the model-based recommendation model;

The classifier of the learner trains the content-based recommendation model, which is a classification model.

The clusterer uses a clustering algorithm, such as k-means, graph clustering, spectral clustering, fuzzy k-means, hierarchical clustering or topic-model clustering, to divide a set of data into several clusters according to the distances between data points.

The classifier uses a classification algorithm, such as nearest neighbor, naive Bayes, SVM or decision tree. A classification model is learned from data prepared in advance (this step belongs to the learning layer), and the classification model is then used to classify new data (this step belongs to the recommendation layer). That is, the classifier is responsible for learning the classification model, and the content-based information filter in the recommendation layer uses the classification model to classify new data.

(4) The learner stores the newly trained preference models, i.e. the model-based recommendation model and the content-based recommendation model, in the corresponding database fields: the model-based recommendation model in user group and group preference, and the content-based recommendation model in user profile.

Fig. 3 is the data processing flowchart of the ensembler in this embodiment. As shown in Fig. 3, the recommendation process mainly comprises:

(1) After the front end receives a user request, it passes the request to the ensembler.

(2) The ensembler sends the request to the candidate generator.

(3) The candidate generator generates a candidate news list for the user from the user information (such as the user ID) and other information (such as news quality and website weight).

(4) The ensembler calls the memory-based collaborative filter, the model-based collaborative filter and the content-based information filter to predict the user's preference for each news item in the candidate news list. The ensembler combines the three resulting preference values, sorts the candidate news in descending order, and sends the list to the front end.

This embodiment also provides a recommendation system that incorporates the collaborative recommendation methods. To incorporate collaborative recommendation and improve the scalability of the system, the example system uses a distributed computing platform as its underlying storage and computing platform.

Fig. 4 is a structural diagram of the recommendation system with memory-based, model-based and content-based recommendation models in this embodiment. As shown in Fig. 4, the recommendation system comprises a learning layer, a data layer and a recommendation layer.

I. Data layer

The data layer stores all the data required by the recommendation system and is implemented with a distributed storage system. The stored fields mainly include:

1. Item information: records information about each news item, including the news ID, category, data source, time, repetition count and content.

2. User information: records information about the user, including the user ID (UID).

3. User preference: records each user's preference (rating or click) for each item, including the click time, rating time and so on. It can be implemented as a sparse matrix.

4. Item similarity: the core data supporting the memory-based collaborative filter. This field stores the pairwise similarity values between news items computed from user preferences.

5. User group: stores the user grouping information obtained after the clustering algorithm processes the user preference data, including the group ID (GID);

6. Group preference: stores the user grouping information obtained after the clustering algorithm processes the user preference data and each group's preference for each item;

The user group and group preference are the core data supporting the model-based collaborative filter.

7. User profile: the core data supporting the content-based information filter. It stores each user's content-based recommendation model (usually a text classifier).

8. Recommendation history: records the history of the documents recommended to each user, specifically the news items the user liked/clicked and those the user disliked/did not click.
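The eight fields can be pictured with the following in-memory sketch. It is not the distributed database described above: the class names, attribute names and key layouts are illustrative assumptions chosen to mirror the field list.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ItemInformation:
    news_id: str
    category: str
    source: str
    time: str
    repetition_count: int
    content: str


@dataclass
class DataLayer:
    """In-memory stand-in for the distributed database (assumption)."""
    # news ID -> item record
    item_information: Dict[str, ItemInformation] = field(default_factory=dict)
    # UID -> user record
    user_information: Dict[str, dict] = field(default_factory=dict)
    # (UID, news ID) -> rating or click value; sparse by construction
    user_preference: Dict[Tuple[str, str], float] = field(default_factory=dict)
    # (news ID, news ID) -> similarity value
    item_similarity: Dict[Tuple[str, str], float] = field(default_factory=dict)
    # UID -> GID
    user_group: Dict[str, str] = field(default_factory=dict)
    # (GID, news ID) -> group preference value
    group_preference: Dict[Tuple[str, str], float] = field(default_factory=dict)
    # UID -> content-based model (usually a text classifier)
    user_profile: Dict[str, Any] = field(default_factory=dict)
    # UID -> list of (news ID, clicked) pairs
    recommendation_history: Dict[str, List[Tuple[str, bool]]] = field(default_factory=dict)


db = DataLayer()
db.item_information["n1"] = ItemInformation(
    "n1", "sports", "example-source", "2015-02-06", 0, "example text")
db.user_preference[("u1", "n1")] = 1.0
db.recommendation_history.setdefault("u1", []).append(("n1", True))
```

Keying `user_preference` and `item_similarity` by pairs keeps only the non-zero entries, which is one simple way to obtain the sparse-matrix behavior mentioned for the user preference field.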

II. Learning layer

The learning layer collects data, records user preferences and updates the recommendation modules. It is implemented with the distributed storage system and MapReduce programs. The learning layer mainly includes the following modules:

1. Recorder: writes the information about each news item into item information.

2. Register: maintains user information (adding, deleting and so on) and stores it in user information.

3. Updater: obtains user preference information for the data layer and stores it in user preference. It also computes and updates (in real time or near real time) the item similarity between news items according to the new user preferences.

4. Clusterer: periodically (for example at intervals of half a day or a day) starts the learning process that clusters (groups) users by preference, and writes the clustering result and each cluster's preference for each item to user group and group preference respectively. It needs to read user preference and user information.

To process large-scale user preference information, the clustering algorithm should be implemented with a distributed programming model (such as MapReduce). Mahout, the existing MapReduce-based machine learning package in the Hadoop ecosystem, can be used directly here.

5. Classifier: periodically reads each user's reading history (which requires reading user preference, recommendation history, user information and document information), starts the process of learning a classifier, and updates user profile with the learned classifier.

As with clustering, a distributed model (MapReduce) should also be used, by means of the Mahout package.

III. Recommendation layer

The recommendation layer outputs, for each reading request of each user, the highest-quality news that best matches that user's preferences. It is built from clients of the distributed storage system. The recommendation layer mainly includes the following modules:

1. Candidate generator: after receiving a request from the ensembler, selects a candidate list from the system's news. This candidate list is not personalized; its main purpose is to distinguish news quality (by time, source, length, repetition count, website weight and so on).

2. Ensembler: responds to user requests. After receiving a user's request to read news, it obtains the candidate news list from the candidate generator, then uses the memory-based collaborative filter, the model-based collaborative filter and the content-based information filter to each predict a user preference score for every item in the list. The weighted sum of the three scores is the final predicted preference of each item. The candidate news list, sorted by predicted value, is output to the front end, which displays the personalized recommendation result.

The ensembler also records the recommendation list in recommendation history so that the classifier can learn the content-based recommendation model.
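The weighted-sum ranking performed by the ensembler can be sketched as follows. The function `ensemble_rank`, the `Filter` type, the toy filters and the weights 0.5/0.25/0.25 are all hypothetical; the text specifies only that the three scores are weighted, summed and used for sorting.

```python
from typing import Callable, List, Sequence, Tuple

# A filter maps (user ID, news ID) to a predicted preference score.
Filter = Callable[[str, str], float]


def ensemble_rank(
    user_id: str,
    candidates: Sequence[str],
    filters: Sequence[Filter],
    weights: Sequence[float],
) -> List[Tuple[str, float]]:
    """Score every candidate as a weighted sum of the filter scores and
    return (news ID, score) pairs sorted by score in descending order."""
    if len(filters) != len(weights):
        raise ValueError("one weight is required per filter")
    scored = []
    for news_id in candidates:
        score = sum(w * f(user_id, news_id) for f, w in zip(filters, weights))
        scored.append((news_id, score))
    # sorted() is stable, so ties keep the candidate generator's order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for the three filters (hypothetical scores).
def memory_cf(user_id: str, news_id: str) -> float:
    return {"n1": 0.2, "n2": 0.9, "n3": 0.0}[news_id]


def model_cf(user_id: str, news_id: str) -> float:
    return {"n1": 0.5, "n2": 0.5, "n3": 0.0}[news_id]


def content_if(user_id: str, news_id: str) -> float:
    return {"n1": 1.0, "n2": 0.0, "n3": 0.0}[news_id]


ranking = ensemble_rank(
    "u1", ["n1", "n2", "n3"],
    [memory_cf, model_cf, content_if],
    [0.5, 0.25, 0.25],
)
```

With these toy values the order is n2, n1, n3. In practice the weights would be tuned, and the scores of the three filters would need to be on comparable scales before being summed.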

The three recommendation modules called by the ensembler are all built as distributed database client programs:

Memory-based collaborative filter (memory-based CF): predicts the user's preference for each candidate item from the user preference data and the similarity between news items.

This filter is mainly used to capture the user's short-term interests and has strict real-time requirements, so the similarity between news items can be cached. The cache can be implemented with an in-memory database such as Redis.
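One possible way to turn item similarity into a prediction is sketched below. The text does not give a formula, so averaging the similarity between the candidate and the items the user has read is an assumption, as are the function name `memory_based_score` and the plain dictionary that stands in for the cached similarity data.

```python
from typing import Dict, Iterable, Tuple


def memory_based_score(
    read_items: Iterable[str],
    candidate: str,
    item_similarity: Dict[Tuple[str, str], float],
) -> float:
    """Predict preference for `candidate` as the mean similarity between the
    candidate and the items the user has read (one possible choice)."""
    read = [item for item in read_items if item != candidate]
    if not read:
        return 0.0  # no history: this filter has no evidence
    total = sum(item_similarity.get((item, candidate), 0.0) for item in read)
    return total / len(read)


# Hypothetical similarity values for a user who has read C1 and C2.
similarity = {("C1", "B"): 3.0, ("C2", "B"): 1.0}
score = memory_based_score(["C1", "C2"], "B", similarity)
```

Restricting `read_items` to the user's most recent reads would emphasize short-term interest, in line with the purpose of this filter.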

Model-based collaborative filter (model-based CF): predicts the user's preference for a candidate item from the user's group and the group's preference for that item. This module captures the user's long-term interests.
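The group lookup can be sketched as follows, assuming the user group and group preference data are available as simple mappings. The function name `model_based_score`, the key layouts and the `default` fallback for unknown users or items are hypothetical.

```python
from typing import Dict, Tuple


def model_based_score(
    user_id: str,
    candidate: str,
    user_group: Dict[str, str],
    group_preference: Dict[Tuple[str, str], float],
    default: float = 0.0,
) -> float:
    """Look up the user's group (GID) and return that group's preference for
    the candidate; fall back to `default` when either lookup fails."""
    group_id = user_group.get(user_id)
    if group_id is None:
        return default
    return group_preference.get((group_id, candidate), default)


# Hypothetical clustering output: two users in two groups.
user_group = {"u1": "g1", "u2": "g2"}
group_preference = {("g1", "n1"): 0.8, ("g2", "n1"): 0.1}
```

Because the groups are recomputed only periodically, this lookup is cheap at request time, and it reflects long-term rather than short-term interest.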

Content-based information filter (content-based IF): classifies candidate news by whether it is news the user prefers, using the user's interest classification model learned by the classifier and the content of the news.
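As an illustration of such a per-user classifier, the sketch below uses naive Bayes, one of the classification algorithms named earlier. It assumes each news item has already been split into tokens (Chinese text would first need word segmentation, which is not shown), and the names `train_naive_bayes` and `is_preferred` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def train_naive_bayes(samples: List[Tuple[List[str], bool]]):
    """Learn per-class word counts from (tokens, liked) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for tokens, liked in samples:
        word_counts[liked].update(tokens)
        doc_counts[liked] += 1
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocabulary


def is_preferred(model, tokens: List[str]) -> bool:
    """Classify a news item as preferred (True) or not preferred (False)."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    log_prob = {}
    for label in (True, False):
        # Laplace (add-one) smoothing on both the prior and the likelihoods.
        log_p = math.log((doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            if token in vocabulary:  # ignore words never seen in training
                log_p += math.log((word_counts[label][token] + 1) / denom)
        log_prob[label] = log_p
    return log_prob[True] > log_prob[False]


# Hypothetical reading history: clicked items are True, ignored items False.
history = [
    (["football", "league", "goal"], True),
    (["football", "transfer"], True),
    (["stock", "market", "bond"], False),
    (["market", "bank"], False),
]
model = train_naive_bayes(history)
```

Here `is_preferred(model, ["football", "goal"])` returns `True` and `is_preferred(model, ["stock", "bank"])` returns `False`. Training corresponds to the classifier in the learning layer, and `is_preferred` corresponds to the filter in the recommendation layer.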

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the application, not to limit its scope of protection. Although the application has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that, after reading the application, a person skilled in the art may still make changes, modifications or equivalent substitutions to the specific embodiments, and that all such changes, modifications or equivalent substitutions fall within the scope of the pending claims.

Claims

1. A Chinese news recommendation system, characterized in that: the system comprises a learning layer for collecting data, recording user preferences and updating the recommendation modules, a data layer for storing system data, and a recommendation layer for generating a news recommendation list;

The recommendation layer comprises a candidate generator that returns a recommended news list in response to a user request, and an ensembler that calls preference modules to rank the recommended news list.

2. The Chinese news recommendation system as claimed in claim 1, characterized in that: the learning layer comprises a recorder, a register, an updater and a learner.

3. The Chinese news recommendation system as claimed in claim 1, characterized in that: the recorder writes news information into the data layer; the register maintains user information in the data layer;

The updater obtains user preference data, stores the user preference data in the user preference field of the data layer, and computes the item similarity between news items from the obtained user preference data;

The learner, which trains the preference models, comprises a clusterer for training the model-based recommendation model and a classifier for training the content-based recommendation model.

4. The Chinese news recommendation system as claimed in claim 3, characterized in that: the clusterer reads the user preference and the user information, periodically performs cluster learning on the users according to the user preference, and writes the clustering result and each cluster's preference for each item to the user group and group preference fields respectively;

The classifier periodically reads the user information and the recommendation history, learns a classification model, and writes the classification model to the user profile.

5. The Chinese news recommendation system as claimed in claim 1, characterized in that: the data layer comprises a distributed database that stores the data of the recommendation system;

The fields of the distributed database include item information, user information, user preference, item similarity, user group, group preference, user profile and recommendation history.

6. The Chinese news recommendation system as claimed in claim 5, characterized in that: the item information records, for each news item, information including its number, category, source, time, repetition count and content;

The user group and the group preference store, respectively, the user grouping information obtained after a clustering algorithm processes the user preference data, and each group's preference for each item;

7. The Chinese news recommendation system as claimed in claim 1, characterized in that: after receiving a request from the ensembler, the candidate generator selects from the system's news a candidate news list that distinguishes news quality;

After receiving the candidate news list, the ensembler calls the memory-based collaborative filter, the model-based collaborative filter and the content-based information filter, each of which predicts user preference scores for the candidate news list; the ensembler combines the results into a predicted user preference score and outputs the candidate news list, sorted by the predicted score, to the front end for display.

8. The Chinese news recommendation system as claimed in claim 7, characterized in that: the memory-based collaborative filter captures the user's short-term interests and predicts the user's preference for each candidate item from the user preference data and the similarity between news items.

9. The Chinese news recommendation system as claimed in claim 7, characterized in that: the model-based collaborative filter captures the user's long-term interests and predicts the user's preference for a candidate item from the user group and the group's preference for that item.

10. The Chinese news recommendation system as claimed in claim 7, characterized in that: the content-based information filter classifies candidate news as user-preferred news or non-preferred news according to the classification model learned by the classifier and the news content.