CN103995878A - Distributed personalized recommendation method and system - Google Patents

Distributed personalized recommendation method and system

Info

Publication number
CN103995878A
Authority
CN
China
Prior art keywords
scoring
user
project
item
scoring item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410225857.3A
Other languages
Chinese (zh)
Other versions
CN103995878B (en)
Inventor
王雷
况亚萍
夏磊
张成晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN201410225857.3A
Publication of CN103995878A
Application granted
Publication of CN103995878B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/18: File system types
    • G06F16/182: Distributed file systems

Abstract

The invention discloses a distributed personalized recommendation method and system. The method comprises: building a rating set that contains user information, the items each user has rated, and the corresponding rating values; computing from the rating set, over all users, the arithmetic mean of the rating differences of every item pair and the total number of times each item pair occurs, and writing both into a pre-built item-pair rating-difference table, where the rating set and the item-pair rating-difference table are both stored as HBase tables; using the MapReduce model to perform a first join between the user information and each user's set of unrated items, stored in the HDFS file system, and the item-pair rating-difference table; performing a second join between the first join result and the rating set, and computing predicted rating values for each user's unrated items with a prediction algorithm; and making recommendations to users according to the predicted rating values. The method and system save network transmission resources and input/output overhead and improve join efficiency.

Description

A distributed personalized recommendation method and system
Technical field
The present invention relates to the field of distributed computing, and in particular to a distributed personalized recommendation method and system.
Background
A personalized recommendation system recommends information or goods that a user is likely to be interested in, based on the user's personal information, interests, purchasing behavior, and so on. A recommendation system has three important modules: a user modeling module, a recommended-object modeling module, and a recommendation algorithm module. The system matches the interest and demand information in the user model against the feature information in the recommended-object model, applies a recommendation algorithm to compute and filter candidates, finds the objects the user may be interested in, and recommends them to the user.
Hadoop is an open-source project: an open-source distributed computing platform for processing and analyzing big data, and a complete distributed software framework. Its core components include HDFS (a distributed file system), HBase (an open-source distributed database), MapReduce (a map-and-reduce programming model), and ZooKeeper (a reliable coordination system), which are responsible for the distributed file system, the distributed database system, the distributed parallel computing model, and parallel access control, respectively.
Hadoop is also the name of the distributed system architecture as a whole. Users can develop distributed programs without understanding the low-level details of distribution and can take full advantage of a cluster's computing power and storage. Hadoop is highly scalable, highly reliable, highly fault tolerant, and low in cost.
HDFS (Hadoop Distributed File System) is the distributed file system implemented by Hadoop and is designed to run on commodity hardware. HDFS is highly fault tolerant, provides high-throughput data access, and suits applications with very large data sets.
HBase is a distributed, column-oriented, open-source database. Unlike a conventional relational database, it is suited to storing unstructured data. It is highly reliable, high-performance, column-oriented, and scalable.
MapReduce is a software architecture proposed by Google: a programming model for parallel computation over large data sets (larger than 1 TB). A program specifies a Map function, which maps a set of key-value pairs to a new set of key-value pairs, and a concurrent Reduce function, which ensures that all mapped key-value pairs sharing the same key are grouped and processed together.
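The Map and Reduce roles described above can be illustrated with a small in-memory sketch. This is not Hadoop code: `run_mapreduce`, `map_ratings`, and `reduce_count` are hypothetical names, and the grouping dictionary merely stands in for the shuffle step that a real cluster performs over the network.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run a map phase, group the output by key, then run a reduce phase."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # stand-in for shuffle/sort
    return {k: reducer(k, values) for k, values in groups.items()}


def map_ratings(user, rated_items):
    """Emit (item, 1) for every item the user has rated."""
    for item in rated_items:
        yield item, 1


def reduce_count(item, ones):
    """Count how many users rated the item."""
    return sum(ones)


# Illustrative input: each record is (user, list of rated items).
counts = run_mapreduce(
    [("A", [1, 2, 3]), ("B", [1, 3]), ("C", [1, 2])],
    map_ratings,
    reduce_count,
)
print(counts)
```

Every value emitted under the same key reaches a single `reducer` call, which is the grouping guarantee the paragraph describes.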
ZooKeeper is a sub-project of Hadoop. It is a reliable coordination system for large distributed systems; the functions it provides include configuration maintenance, naming services, distributed services, and group services. ZooKeeper's goal is to encapsulate complex, error-prone key services and to offer users a system with a simple, easy-to-use interface, efficient performance, and stable functionality.
Current recommendation systems mainly run on a single node. By collecting user feedback explicitly or analyzing behavior records implicitly, they learn and track users' interest preferences and behavior patterns and proactively recommend goods that users may be interested in. The recommendation algorithm is the core of a recommendation system. Common recommendation algorithms include collaborative filtering, content-based recommendation, association-rule-based recommendation, utility-based recommendation, and knowledge-based recommendation.
Among the many collaborative filtering algorithms, Slope One is favored by industry because it is very easy to implement, efficient at prediction, and accurate. Slope One is the collective name for a family of collaborative filtering algorithms and is the simplest form of item-rating-based collaborative filtering. In essence, it uses a simple linear relationship to approximate a more complex similarity computation. Intuitively, the more often two items are rated together, the more they should influence each other's ratings. This motivates the more general Weighted Slope One (WSO) method, which weights the rating differences.
With the sharp growth in the volume of user data, traditional centralized recommendation systems based on a single node show insufficient storage and computing capacity; they cannot guarantee real-time recommendation, and their recommendation quality suffers. As user data grows, the shortcomings of the Slope One algorithm become apparent: (1) the time and space complexity of the original Slope One algorithm is too high, and a single machine cannot store the intermediate files; (2) the computational cost is too high, so the algorithm can usually be used only on small data sets. Moreover, an implementation of Slope One that uses MapReduce must perform multiple join operations over large data sets in HDFS. Such joins across multiple servers are either map-side joins or reduce-side joins. A reduce-side join is widely applicable but expensive; a map-side join is fast but narrowly applicable.
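The trade-off between the two join styles can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Hadoop code: both function names are hypothetical, the `defaultdict` grouping stands in for the network shuffle of a reduce-side join, and the plain dictionary `diff_rows` stands in for a table that supports lookup by key.

```python
from collections import defaultdict


def reduce_side_join(left, right):
    """Group every record of both inputs by key (the shuffle), then pair them."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    return [
        (key, l, r)
        for key, (lefts, rights) in sorted(groups.items())
        for l in lefts
        for r in rights
    ]


def map_side_join(left, lookup):
    """Stream one input and look each key up in an indexed table; no shuffle."""
    return [(key, l, lookup[key]) for key, l in left if key in lookup]


# Illustrative data: (unrated item, user) records joined to per-item rows.
to_predict = [(2, "B"), (3, "C")]
diff_rows = {2: "row for item 2", 3: "row for item 3"}

joined_reduce = reduce_side_join(to_predict, list(diff_rows.items()))
joined_map = map_side_join(to_predict, diff_rows)
print(joined_map)
```

Both functions return the same pairs here; the difference is that the reduce-side version must move and group both inputs before pairing, while the map-side version only works when one input can be looked up by key.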
Summary of the invention
The object of the present invention is to provide a distributed personalized recommendation method and system that save network transmission resources and input/output overhead and improve join efficiency.
This object is achieved through the following technical solutions:
A distributed personalized recommendation method, comprising:
building a rating set that contains user information, the items each user has rated, and the corresponding rating values;
computing from the rating set, over all users, the arithmetic mean of the rating differences of every item pair and the total number of times each item pair occurs, and writing them into a pre-built item-pair rating-difference table; wherein the rating set and the item-pair rating-difference table are both stored as HBase tables;
using the MapReduce model to perform a first join between the user information and the sets of unrated items stored in the HDFS file system and the item-pair rating-difference table, and storing the first join result in HDFS; then performing a second join between the first join result and the rating set, and computing the predicted rating values of each user's unrated items with a prediction algorithm;
making recommendations to users according to the predicted rating values.
A distributed personalized recommendation system, comprising:
a rating set building module, configured to build a rating set that contains user information, the items each user has rated, and the corresponding rating values;
an item rating-difference information computing and writing module, configured to compute from the rating set, over all users, the arithmetic mean of the rating differences of every item pair and the total number of times each item pair occurs, and to write them into a pre-built item-pair rating-difference table; wherein the rating set and the item-pair rating-difference table are both stored as HBase tables;
an unrated-item rating prediction module, configured to use the MapReduce model to perform a first join between the user information and the sets of unrated items stored in the HDFS file system and the item-pair rating-difference table, and to store the first join result in HDFS; then to perform a second join between the first join result and the rating set, and to compute the predicted rating values of each user's unrated items with a prediction algorithm;
a recommendation module, configured to make recommendations to users according to the predicted rating values.
As the above technical solution shows, the invention integrates HBase, HDFS, and MapReduce to provide a distributed implementation of the lightweight algorithm DWSO. Because one side of each join operation is an HBase table, the shuffle and sort phases of Reduce in a conventional join are omitted, which saves network transmission resources and input/output overhead and improves join efficiency. In addition, because user data is stored in HBase, the system supports both real-time interaction and offline processing.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a distributed personalized recommendation method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of joining MapReduce, HDFS, and HBase, provided by Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of an application scenario provided by Embodiment 1 of the present invention;
Fig. 4 is a schematic diagram of a distributed personalized recommendation system provided by Embodiment 2 of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the invention.
Embodiment 1
Fig. 1 is a flowchart of a distributed personalized recommendation method provided by Embodiment 1 of the present invention. As shown in Fig. 1, the method mainly comprises the following steps:
Step 11: build a rating set that contains user information, the items each user has rated, and the corresponding rating values.
Before this step, the user information, the items each user has rated with the corresponding rating values, and each user's unrated items must be obtained.
Users generate various kinds of information by browsing the Internet or using a client. For example, by registering, a user leaves personal information such as gender, age, region, occupation, and interests; by shopping online, a user generates purchase records and ratings of the purchased goods; by listening to music, a user generates a listening history and ratings of the music. The type of information generated differs with the user's online behavior.
After the information obtained above has been analyzed, a rating set can be built so that it can be called at any time later. The rating set can be stored in an auxiliary HBase table, t_taste. The t_taste table can be as shown in Table 1:
Table 1: t_taste table
Step 12: compute from the rating set, over all users, the arithmetic mean of the rating differences of every item pair and the total number of times each item pair occurs, and write them into a pre-built item-pair rating-difference table.
Specifically: any two rated items of any user in the rating set are taken as an item pair, and the rating difference of the pair is computed; the rating differences of all item pairs of all users and the total number of occurrences of each item pair are aggregated, and the arithmetic mean of the rating differences of each item pair is then computed.
For example, if user A rates item 1 and item 2 as ra and rb respectively, the rating difference is ra-rb. Over all users, the number of times item 1 and item 2 occur together is the weight. An item pair is simply such a co-occurrence of two items, here item 1 and item 2.
In the embodiment of the present invention, the Map function uses a two-level loop to obtain, for each user, the rating difference of each item pair, records one co-occurrence of the pair, and outputs both to Reduce. The Reduce function aggregates all rating differences and the total number of occurrences of each item pair, computes the arithmetic mean of the rating differences, and finally writes the result into the item-pair rating-difference table.
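The Map and Reduce steps of step 12 can be sketched in plain Python. This is a minimal in-memory sketch, not the patent's Hadoop code: `map_user`, `reduce_pair`, and `build_diff_table` are hypothetical names, the ratings are inferred from the arithmetic of the worked example later in the description rather than taken from a table, and the sign convention (rating of the second item minus rating of the first) is an assumption chosen because it reproduces that example's numbers.

```python
from collections import defaultdict


def map_user(user, ratings):
    """Two-level loop: emit ((i, j), r_j - r_i) for each pair the user rated."""
    items = sorted(ratings)
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            i, j = items[a], items[b]
            yield (i, j), ratings[j] - ratings[i]


def reduce_pair(pair, diffs):
    """Return (mean rating difference, number of co-occurrences) for one pair."""
    return sum(diffs) / len(diffs), len(diffs)


def build_diff_table(rating_set):
    """Group mapper output by item pair, then reduce each group."""
    grouped = defaultdict(list)
    for user, ratings in rating_set.items():
        for pair, diff in map_user(user, ratings):
            grouped[pair].append(diff)
    return {pair: reduce_pair(pair, diffs) for pair, diffs in grouped.items()}


# Assumed ratings {user: {item: rating}}, inferred from the worked example.
rating_set = {"A": {1: 5, 2: 4, 3: 2}, "B": {1: 4, 3: 3}, "C": {1: 5, 2: 3}}
t_diff = build_diff_table(rating_set)
print(t_diff)
```

The sketch stores each pair once, keyed by `(i, j)` with `i < j`; the entry for the reversed pair is the same weight with the sign of the mean flipped.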
In the embodiment of the present invention, the item-pair rating-difference table can be stored in an auxiliary HBase table, t_diff. The t_diff table can be as shown in Table 2:
Table 2: t_diff table
Step 13: use the MapReduce model to perform a first join between the user information and the sets of unrated items stored in the HDFS file system and the item-pair rating-difference table, and store the first join result in HDFS; then perform a second join between the first join result and the rating set, and compute the predicted rating values of each user's unrated items with a prediction algorithm.
This step requires two joins, as shown in Fig. 2. Specifically: 1) each user together with an unrated item is taken as a subset, and all users and their unrated items are aggregated; all users and unrated items are joined for the first time with the item-pair rating-difference table, and the first join result is stored in HDFS. The join is performed subset by subset, and the join result of each subset contains the user information, the user's unrated item, and, for the item pairs formed by that unrated item and the rated items, the arithmetic mean of the rating differences and the total number of occurrences. 2) The first join result is joined for the second time with the rating set. This join is also performed subset by subset: for each rated item, the arithmetic mean of the rating differences of the item pair formed by the unrated item and the rated item, taken from the first join result, is subtracted from the user's rating of that rated item in the rating set. Each subset of the second join result contains the user information, the user's unrated item, the subtraction results, and the total number of occurrences of the corresponding item pairs.
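The two joins can be sketched as two small Python functions. This is an in-memory illustration only: `first_join` and `second_join` are hypothetical names, plain dictionaries stand in for the HBase tables and HDFS files, and the rating and difference values are taken or inferred from the worked example later in the description.

```python
def first_join(to_predict, t_diff):
    """Attach the difference-table row of each unrated item to its (user, item) subset."""
    return {(user, item): t_diff[item] for user, item in to_predict}


def second_join(first_result, t_taste):
    """For each rated item j, pair (r_uj - diff_ij) with the weight w_ij."""
    result = {}
    for (user, item), row in first_result.items():
        ratings = t_taste[user]
        result[(user, item)] = [
            (ratings[j] - mean_diff, weight)
            for j, (mean_diff, weight) in sorted(row.items())
            if j in ratings
        ]
    return result


# Assumed stand-ins for the rating set and the item-pair rating-difference table.
t_taste = {"B": {1: 4, 3: 3}, "C": {1: 5, 2: 3}}
t_diff = {
    1: {2: (-1.5, 2), 3: (-2, 2)},
    2: {1: (1.5, 2), 3: (-2, 1)},
    3: {1: (2, 2), 2: (2, 1)},
}

step1 = first_join([("B", 2), ("C", 3)], t_diff)
step2 = second_join(step1, t_taste)
print(step2)
```

Because `t_diff` is keyed by item, the first join is a direct lookup per subset, which is the property that lets the join avoid a shuffle when that side is an HBase table.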
The predicted rating values of the user's unrated items are then computed with the prediction algorithm, whose formula is:
$$r_{u,i} = \frac{\sum_{j \in I_u} w_{i,j} \cdot (r_{u,j} - \mathrm{diff}_{i,j})}{\sum_{j \in I_u} w_{i,j}}$$
where r_{u,i} and r_{u,j} denote, respectively, user u's predicted rating of unrated item i and user u's rating of rated item j; I_u denotes the set of items rated by user u; w_{i,j} denotes the total number of occurrences, in the item-pair rating-difference table, of the item pair formed by item i and item j; and diff_{i,j} denotes the arithmetic mean of the rating differences of the item pair formed by the user's unrated item i and rated item j, taken from the first join result.
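The prediction formula translates almost term for term into code. The sketch below is an illustration, not the patent's implementation: `predict` is a hypothetical name, and returning `None` when no rated item has an entry in the difference row is an added assumption, since the source does not say what to do in that case. The sample inputs are user B's values from the worked example later in the description.

```python
def predict(rated, diff_row):
    """Weighted prediction for one unrated item i.

    rated:    {j: r_uj} for the items the user has rated
    diff_row: {j: (diff_ij, w_ij)} for the unrated item i
    """
    numerator = 0.0
    denominator = 0
    for j, r_uj in rated.items():
        if j in diff_row:
            diff_ij, w_ij = diff_row[j]
            numerator += w_ij * (r_uj - diff_ij)
            denominator += w_ij
    if denominator == 0:
        return None  # assumption: no co-rated items, so no prediction
    return numerator / denominator


# User B, unrated item 2: ratings of items 1 and 3, and row 2 of the table.
r_b2 = predict({1: 4, 3: 3}, {1: (1.5, 2), 3: (-2, 1)})
print(round(r_b2, 2))
```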
Step 14: make recommendations to users according to the predicted rating values.
In the embodiment of the present invention, a conventional strategy can be used to make recommendations according to the predicted rating values. For example, a threshold can be preset: when the predicted rating is greater than the threshold, the corresponding item is recommended to the user; otherwise it is not. Alternatively, the predicted ratings can be divided into classes, and items in different classes can be recommended at different times, such as in the morning, at noon, or in the evening.
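The threshold strategy can be sketched as follows. The function name `recommend` and the threshold value 3.0 are hypothetical; the two predictions are the ones obtained in the worked example later in the description.

```python
def recommend(predictions, threshold):
    """Return, per user, the items whose predicted rating exceeds the threshold."""
    picks = {}
    # Highest predicted rating first, so each user's list is ordered best-first.
    for user, item, score in sorted(predictions, key=lambda p: -p[2]):
        if score > threshold:
            picks.setdefault(user, []).append(item)
    return picks


picks = recommend([("B", 2, 3.33), ("C", 3, 2.33)], threshold=3.0)
print(picks)
```

With this assumed threshold, item 2 is recommended to user B and nothing is recommended to user C.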
To aid understanding, the invention is further described below with a practical scenario. Fig. 3 shows a practical application scenario with 3 users and 3 items. Note that the numbers of users and items are given here only for ease of illustration; in practice they can be set as required.
First, a rating set can be built from the information shown in Fig. 3 and stored in the t_taste table, as shown in Table 3:
Table 3: rating set
Then the arithmetic mean of the rating differences of each item pair and the total number of occurrences of each item pair can be computed and written into the item-pair rating-difference table, which can be stored in the t_diff table. The item-pair rating-difference table is shown in Table 4:
Table 4: item-pair rating-difference table
The computation of the values in the first row of Table 4 is briefly explained below:
1) Arithmetic mean of the rating differences for item 1-item 2: from Fig. 3 or Table 3, users A and C have rated both item 1 and item 2. For user A, the rating difference for item 1-item 2 is 4-5=-1; for user C, it is 3-5=-2. The arithmetic mean of the rating differences for item 1-item 2 is [(-1)+(-2)]/2=-1.5, and the total number of occurrences (the weight) of item 1-item 2 is 2.
2) Arithmetic mean of the rating differences for item 1-item 3: from Fig. 3 or Table 3, users A and B have rated both item 1 and item 3. For user A, the rating difference for item 1-item 3 is 2-5=-3; for user B, it is 3-4=-1. The arithmetic mean of the rating differences for item 1-item 3 is [(-3)+(-1)]/2=-2, and the total number of occurrences (the weight) of item 1-item 3 is 2. The remaining computations are similar and are not repeated here.
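As a check, the one remaining pair, item 2-item 3, can be worked the same way. Only user A has rated both items. The ratings used below (4 for item 2 and 2 for item 3) are read off the two subtractions above rather than from Table 3, so they should be treated as inferred:

$$\mathrm{diff}_{2,3} = \frac{2 - 4}{1} = -2, \qquad w_{2,3} = 1$$

This agrees with the entry <3,-2,1> listed for item 2 in the item-pair rating-difference table.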
Next, the two joins are performed. Specifically:
1. Each user together with an unrated item is taken as a subset, and all users and their unrated items are aggregated; all users and unrated items (the user-item pairs to be predicted) are joined for the first time with the item-pair rating-difference table, and the first join result is stored in HDFS.
In this example, item 2 is unrated by user B and item 3 is unrated by user C. Joining these with Table 4 gives the first join result; the process is shown in Table 5:
Table 5: first join
The entries in Table 5 are briefly explained below:
1) The two subsets (B, 2) and (C, 3) in the set of unrated items indicate, respectively, that item 2 is unrated by user B and that item 3 is unrated by user C.
2) In the item-pair rating-difference table, (1, {<2,-1.5,2>, <3,-2,2>}) gives the arithmetic mean of the rating differences and the weight of item 1 paired with item 2 and with item 3, that is, (item 1, {<item 2, mean rating difference -1.5, weight 2>, <item 3, mean rating difference -2, weight 2>}). The entries (2, {<1,1.5,2>, <3,-2,1>}) and (3, {<1,2,2>, <2,2,1>}) have analogous meanings and are not described again.
3) In the first join result, (B, 2{<1,1.5,2>, <3,-2,1>}) gives the arithmetic mean of the rating differences and the weight of user B's unrated item 2 paired with item 1 and with item 3, that is, (user B, unrated item 2 {<item 1, mean rating difference 1.5, weight 2>, <item 3, mean rating difference -2, weight 1>}). The entry (C, 3{<1,2,2>, <2,2,1>}) has an analogous meaning and is not described again.
2. The first join result is joined for the second time with the rating set, and the predicted rating values of the users' unrated items are computed with the prediction algorithm; the process is shown in Table 6:
Table 6: second join and predicted ratings
The entries in Table 6 are briefly explained below:
1) The meaning of the first join result was explained in detail for Table 5 and is not repeated.
2) Rating set: see Table 3. A, B, and C denote users; within the angle brackets "<>", the first number is the item and the second number is the rating of that item.
3) In the second join result (B, 2{<2.5,2>, <5,1>}), B is user B and 2 is the unrated item 2. In <2.5,2>, the value 2.5 is obtained by subtracting the arithmetic mean of the rating differences (1.5) of the item pair formed by unrated item 2 and rated item 1, taken from the first join result, from user B's rating (4) of rated item 1 in the rating set, that is, 4-1.5=2.5; the value 2 is the number of occurrences, in this scenario, of the item pair formed by item 2 and item 1. <5,1> has an analogous meaning. Likewise, (C, 3{<3,2>, <1,1>}) has an analogous meaning and is not described again.
4) In the predicted rating (B, 2, 3.33), B is user B, 2 is the unrated item 2, and 3.33 is the predicted rating. It is computed as follows:
$$r_{u,i} = \frac{\sum_{j \in I_u} w_{i,j} \cdot (r_{u,j} - \mathrm{diff}_{i,j})}{\sum_{j \in I_u} w_{i,j}}$$
where r_{u,i} and r_{u,j} denote, respectively, user u's predicted rating of unrated item i and user u's rating of rated item j; I_u denotes the set of items rated by user u; w_{i,j} denotes the total number of occurrences, in the item-pair rating-difference table, of the item pair formed by item i and item j; and diff_{i,j} denotes the arithmetic mean of the rating differences of the item pair formed by the user's unrated item i and rated item j, taken from the first join result.
For example, the predicted rating of user B's unrated item 2 is computed as:
$$r_{B,2} = \frac{(4 - 1.5) \cdot 2 + (3 - (-2)) \cdot 1}{2 + 1} = \frac{5 + 5}{3} \approx 3.33$$
(C, 3, 2.33) has an analogous meaning and is computed in the same way, so it is not described again.
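For completeness, the omitted computation for user C can be written out from the second join result (C, 3{<3,2>, <1,1>}), in which each entry already holds the subtraction result and its weight:

$$r_{C,3} = \frac{3 \cdot 2 + 1 \cdot 1}{2 + 1} = \frac{7}{3} \approx 2.33$$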
After the unrated items have been predicted in this way, the results can be written into a user-item rating table, as shown in Table 7:
Table 7: user-item rating table
Finally, items are recommended to users according to some strategy. For example, a threshold is set: goods whose predicted rating is greater than the threshold are recommended to the user, and goods whose predicted rating is less than the threshold are not. Alternatively, the predicted ratings can be divided into classes, and items in different classes can be recommended in the morning, at noon, or in the evening.
The embodiment of the present invention integrates HBase, HDFS, and MapReduce to provide a distributed implementation of the lightweight algorithm DWSO. Because one side of each join operation is an HBase table, the shuffle and sort phases of Reduce in a conventional join are omitted, which saves network transmission resources and input/output overhead and improves join efficiency. In addition, because user data is stored in HBase, the system supports both real-time interaction and offline processing.
Embodiment 2
Fig. 4 is a schematic diagram of a distributed personalized recommendation system provided by Embodiment 2 of the present invention. As shown in Fig. 4, the system mainly comprises:
a rating set building module 41, configured to build a rating set that contains user information, the items each user has rated, and the corresponding rating values;
an item rating-difference information computing and writing module 42, configured to compute from the rating set, over all users, the arithmetic mean of the rating differences of every item pair and the total number of times each item pair occurs, and to write them into a pre-built item-pair rating-difference table; wherein the rating set and the item-pair rating-difference table are both stored as HBase tables;
an unrated-item rating prediction module 43, configured to use the MapReduce model to perform a first join between the user information and the sets of unrated items stored in the HDFS file system and the item-pair rating-difference table, and to store the first join result in HDFS; then to perform a second join between the first join result and the rating set, and to compute the predicted rating values of each user's unrated items with a prediction algorithm;
a recommendation module 44, configured to make recommendations to users according to the predicted rating values.
Further, the system may also comprise:
an information acquisition module 45, configured to obtain the user information, the items each user has rated with the corresponding rating values, and each user's unrated items before the rating set containing user information, rated items, and corresponding rating values is built.
Further, the item rating-difference information computing and writing module 42 may also comprise:
an item rating-difference information computing module 421, configured to take any two rated items of any user in the rating set as an item pair and compute the rating difference of the pair; and to aggregate the rating differences of all item pairs of all users and the total number of occurrences of each item pair, and then compute the arithmetic mean of the rating differences of each item pair.
Further, the unrated-item rating prediction module 43 may also comprise:
a first join module 431, configured to take each user together with an unrated item as a subset and aggregate all users and their unrated items; and to join all users and unrated items for the first time with the item-pair rating-difference table and store the first join result in HDFS. Specifically, the join is performed subset by subset, and the join result of each subset contains the user information, the user's unrated item, and, for the item pairs formed by that unrated item and the rated items, the arithmetic mean of the rating differences and the total number of occurrences;
a second join module 432, configured to join the first join result for the second time with the rating set. Specifically, the join is performed subset by subset: for each rated item, the arithmetic mean of the rating differences of the item pair formed by the unrated item and the rated item, taken from the first join result, is subtracted from the user's rating of that rated item in the rating set. Each subset of the second join result contains the user information, the user's unrated item, the subtraction results, and the total number of occurrences of the corresponding item pairs.
Further, the unrated-item rating prediction module 43 may also comprise:
a prediction module 433, configured to compute the predicted rating values of the user's unrated items with the prediction algorithm, according to:
$$r_{u,i} = \frac{\sum_{j \in I_u} w_{i,j} \cdot (r_{u,j} - \mathrm{diff}_{i,j})}{\sum_{j \in I_u} w_{i,j}};$$
where r_{u,i} and r_{u,j} denote, respectively, user u's predicted rating of unrated item i and user u's rating of rated item j; I_u denotes the set of items rated by user u; w_{i,j} denotes the total number of occurrences, in the item-pair rating-difference table, of the item pair formed by item i and item j; and diff_{i,j} denotes the arithmetic mean of the rating differences of the item pair formed by the user's unrated item i and rated item j, taken from the first join result.
Note that the functions implemented by each functional module of the above system have been described in detail in the preceding embodiment and are therefore not repeated here.
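One way to picture how modules 41 to 44 fit together is as four pluggable stages of a pipeline. The sketch below is purely illustrative: the class `RecommendationSystem`, its field names, and the `run` method are hypothetical, and the lambdas are stubs that return fixed values from the worked example rather than real implementations.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RecommendationSystem:
    build_rating_set: Callable   # role of module 41
    build_diff_table: Callable   # role of module 42
    predict_unrated: Callable    # role of module 43
    recommend: Callable          # role of module 44

    def run(self, raw_events, to_predict):
        """Chain the four stages in the order the description gives them."""
        rating_set = self.build_rating_set(raw_events)
        diff_table = self.build_diff_table(rating_set)
        predictions = self.predict_unrated(to_predict, diff_table, rating_set)
        return self.recommend(predictions)


# Stub stages with fixed outputs, used only to show the wiring.
system = RecommendationSystem(
    build_rating_set=lambda events: {"B": {1: 4, 3: 3}},
    build_diff_table=lambda ratings: {2: {1: (1.5, 2), 3: (-2, 1)}},
    predict_unrated=lambda todo, diffs, ratings: [("B", 2, 3.33)],
    recommend=lambda preds: [(u, i) for u, i, s in preds if s > 3.0],
)
result = system.run(raw_events=[], to_predict=[("B", 2)])
print(result)
```

Keeping each stage behind a callable matches the remark that the functions can be redistributed across modules as needed.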
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the above division into functional modules is used as an example. In practice, the functions can be assigned to different functional modules as required; that is, the internal structure of the system can be divided into different functional modules to perform all or part of the functions described above.
From the above description of the embodiments, those skilled in the art will clearly understand that the embodiments can be implemented in software, or in software plus a necessary general-purpose hardware platform. On this understanding, the technical solutions of the embodiments can be embodied as a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard drive) and which includes instructions that cause a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention, but the scope of protection of the invention is not limited to them. Any variation or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.

Claims (10)

1. A distributed personalized recommendation method, characterized in that the method comprises:
establishing a rating set comprising user information, the items rated by each user, and the corresponding rating values;
calculating, according to the rating set, the arithmetic mean of the rating differences of every item pair over all users and the total number of occurrences of each item pair, and writing them into a pre-built item-pair rating-difference table; wherein the rating set and the item-pair rating-difference table are both stored in HBase tables;
using the MapReduce model to perform a first join between the set of user information and unrated items stored in the HDFS file system and the item-pair rating-difference table, and storing the first join result in HDFS; then performing a second join between the first join result and the rating set, and calculating the predicted rating values of the user's unrated items in combination with a prediction algorithm;
making recommendations to the user according to the magnitude of the predicted rating values.
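The final step of claim 1, recommending according to the magnitude of the predicted rating values, can be illustrated with a minimal Python sketch. The claim does not say how many items are recommended; returning the `n` highest-scoring items, the function name `recommend_top_n`, and the dictionary layout are assumptions made for illustration only.

```python
import heapq


def recommend_top_n(predictions, n=3):
    """Return the n unrated items with the highest predicted rating.

    predictions: {item: predicted rating value} for one user.
    """
    best = heapq.nlargest(n, predictions.items(), key=lambda kv: kv[1])
    return [item for item, _score in best]
```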
2. The method according to claim 1, characterized in that, before establishing the rating set comprising user information, the user's rated items, and the corresponding rating values, the method comprises:
obtaining the user information, the user's rated items and the corresponding rating values, and the user's unrated items.
3. The method according to claim 1, characterized in that calculating the arithmetic mean of the rating differences of every item pair over all users and the total number of occurrences of each item pair comprises:
taking any two rated items of any user in the rating set as an item pair, and calculating the rating difference of the item pair;
aggregating the rating differences of all item pairs of all users and the total number of occurrences of each item pair, and then calculating the arithmetic mean of the rating differences of each item pair.
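The computation in claim 3 can be sketched in plain Python as follows. This is an in-memory illustration, not the HBase-backed implementation; the function name `build_diff_table` and the data layout are hypothetical. The sign convention is also an assumption: the difference for pair (i, j) is taken as the rating of j minus the rating of i, chosen so that subtracting the mean difference from a rating of j yields an estimate for i, as in the prediction formula.

```python
from collections import defaultdict
from itertools import permutations


def build_diff_table(ratings):
    """Build mean rating differences and occurrence counts per item pair.

    ratings: {user: {item: rating value}}.
    Returns (diff, count), both keyed by the ordered pair (item_i, item_j).
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for user_items in ratings.values():
        # Every ordered pair of distinct items rated by the same user.
        for i, j in permutations(user_items, 2):
            total[(i, j)] += user_items[j] - user_items[i]
            count[(i, j)] += 1
    diff = {pair: total[pair] / count[pair] for pair in total}
    return diff, dict(count)
```

Storing both orderings of each pair doubles the table size but lets a later lookup by unrated item use a single key prefix; a real deployment might store only one ordering and flip the sign on read.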
4. method according to claim 1 and 2, it is characterized in that, the described MapReduce of utilization mapping abbreviation model by the user profile of storing in HDFS file system and not the set of scoring item with described project, poor the showing of marking connect for the first time, and will connect for the first time result and deposit in HDFS; By described, connect for the first time result and described scoring set connects and comprises for the second time again:
Using each user and not scoring item as a subset, all users of polymerization and not scoring item;
By all users and scoring item and described project do not connect for the first time and will connect for the first time result to the poor table of marking and deposit in HDFS; Concrete: during connection, take subset as unit connects, in the connection result of each subset, comprise this user profile, this user not scoring item, this user not scoring item with the project of scoring item to poor arithmetic mean and the total degree of appearance thereof of marking;
Primary connection result and scoring set are connect for the second time; Concrete: the subset of take during connection connects as unit, by connect for the first time this user in result not the project of scoring item and scoring item mark poor arithmetic mean and this user this project in set of marking is done to subtraction to the score value of corresponding scoring item, obtain each subset that connects for the second time result obtain comprise this user profile, this user not scoring item, subtraction result and corresponding project to there is total degree.
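The two joins in claim 4 can be sketched in memory as follows. The sketch deliberately leaves out MapReduce, HDFS, and HBase and only shows the shape of the data at each stage; the function names `first_join` and `second_join` and the tuple layouts are hypothetical.

```python
def first_join(unrated, diff, count):
    """Join (user, unrated item) subsets with the rating-difference table.

    unrated: iterable of (user, item_i) pairs.
    diff, count: {(item_i, item_j): value} from the rating-difference table.
    Returns rows (user, item_i, item_j, diff_ij, w_ij).
    """
    by_first_item = {}
    for (i, j), d in diff.items():
        by_first_item.setdefault(i, []).append((j, d, count[(i, j)]))
    rows = []
    for user, i in unrated:
        for j, d, w in by_first_item.get(i, []):
            rows.append((user, i, j, d, w))
    return rows


def second_join(rows, ratings):
    """Join first-join rows with the rating set on (user, item_j).

    ratings: {user: {item: rating value}}.
    Returns rows (user, item_i, r_uj - diff_ij, w_ij); rows whose item_j
    the user has not rated are dropped.
    """
    result = []
    for user, i, j, d, w in rows:
        r_uj = ratings.get(user, {}).get(j)
        if r_uj is not None:
            result.append((user, i, r_uj - d, w))
    return result
```

Each output row of `second_join` carries one term of the numerator (after multiplying by the count) and one term of the denominator of the prediction formula, so a final grouping by user and unrated item is enough to compute the predicted rating value.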
5. The method according to claim 4, characterized in that calculating the predicted rating value of the user's unrated item in combination with the prediction algorithm comprises:
$$r_{u,i} = \frac{\sum_{j \in I_u} w_{i,j} \cdot \left(r_{u,j} - \mathrm{diff}_{i,j}\right)}{\sum_{j \in I_u} w_{i,j}};$$
wherein $r_{u,i}$ and $r_{u,j}$ denote, respectively, the predicted rating value of user $u$ for unrated item $i$ and the rating value of user $u$ for rated item $j$; $I_u$ denotes the set of items rated by this user; $w_{i,j}$ denotes the total number of occurrences, in the item-pair rating-difference table, of the item pair formed by item $i$ and item $j$; and $\mathrm{diff}_{i,j}$ denotes the arithmetic mean of the rating differences of the item pair formed by this user's unrated item and rated item in the first join result.
6. A distributed personalized recommendation system, characterized in that the system comprises:
a rating set establishing module, configured to establish a rating set comprising user information, the items rated by each user, and the corresponding rating values;
an item-pair rating-difference information calculation and writing module, configured to calculate, according to the rating set, the arithmetic mean of the rating differences of every item pair over all users and the total number of occurrences of each item pair, and to write them into a pre-built item-pair rating-difference table; wherein the rating set and the item-pair rating-difference table are both stored in HBase tables;
an unrated-item rating prediction module, configured to use the MapReduce model to perform a first join between the set of user information and unrated items stored in the HDFS file system and the item-pair rating-difference table, and to store the first join result in HDFS; then to perform a second join between the first join result and the rating set, and to calculate the predicted rating values of the user's unrated items in combination with a prediction algorithm;
a recommendation module, configured to make recommendations to the user according to the magnitude of the predicted rating values.
7. The system according to claim 6, characterized in that the system further comprises:
an information acquisition module, configured to obtain, before the rating set comprising user information, the user's rated items, and the corresponding rating values is established, the user information, the user's rated items and the corresponding rating values, and the user's unrated items.
8. The system according to claim 6, characterized in that the item-pair rating-difference information calculation and writing module comprises:
an item-pair rating-difference information calculation module, configured to take any two rated items of any user in the rating set as an item pair and calculate the rating difference of the item pair; and to aggregate the rating differences of all item pairs of all users and the total number of occurrences of each item pair, and then calculate the arithmetic mean of the rating differences of each item pair.
9. The system according to claim 6 or 7, characterized in that the unrated-item rating prediction module comprises:
a first join module, configured to take each user together with that user's unrated item as a subset and aggregate all users and their unrated items; and to perform the first join between all users with their unrated items and the item-pair rating-difference table and store the first join result in HDFS; specifically: the join is performed subset by subset, and the join result of each subset comprises the user information, the user's unrated item, and the arithmetic mean of the rating differences and the total number of occurrences of the item pairs formed by the user's unrated item and rated items;
a second join module, configured to perform the second join between the first join result and the rating set; specifically: the join is performed subset by subset; for each item pair formed by the user's unrated item and a rated item in the first join result, the arithmetic mean of the pair's rating differences is subtracted from the rating value that the user gave to the corresponding rated item in the rating set; each subset of the second join result comprises the user information, the user's unrated item, the subtraction result, and the total number of occurrences of the corresponding item pair.
10. The system according to claim 9, characterized in that the unrated-item rating prediction module further comprises:
a prediction module, configured to calculate, in combination with the prediction algorithm, the predicted rating value of the user's unrated item as:
$$r_{u,i} = \frac{\sum_{j \in I_u} w_{i,j} \cdot \left(r_{u,j} - \mathrm{diff}_{i,j}\right)}{\sum_{j \in I_u} w_{i,j}};$$
wherein $r_{u,i}$ and $r_{u,j}$ denote, respectively, the predicted rating value of user $u$ for unrated item $i$ and the rating value of user $u$ for rated item $j$; $I_u$ denotes the set of items rated by this user; $w_{i,j}$ denotes the total number of occurrences, in the item-pair rating-difference table, of the item pair formed by item $i$ and item $j$; and $\mathrm{diff}_{i,j}$ denotes the arithmetic mean of the rating differences of the item pair formed by this user's unrated item and rated item in the first join result.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410225857.3A CN103995878B (en) 2014-05-23 2014-05-23 Distributed personalized recommendation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410225857.3A CN103995878B (en) 2014-05-23 2014-05-23 Distributed personalized recommendation method and system

Publications (2)

Publication Number Publication Date
CN103995878A true CN103995878A (en) 2014-08-20
CN103995878B CN103995878B (en) 2017-10-27

Family

ID=51310043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410225857.3A Active CN103995878B (en) 2014-05-23 2014-05-23 Distributed personalized recommendation method and system

Country Status (1)

Country Link
CN (1) CN103995878B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541920A (en) * 2010-12-24 2012-07-04 华东师范大学 Method and device for improving accuracy degree by collaborative filtering jointly based on user and item
US20130282668A1 (en) * 2012-04-20 2013-10-24 Cloudera, Inc. Automatic repair of corrupt hbases

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
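Liu Yuan (刘源): "Research on Distributed Recommendation Engine Algorithms Based on Cloud Computing" (《基于云计算的分布式推荐引擎算法研究》), China Master's Theses Full-text Database (Information Science and Technology) (《中国优秀硕士学位论文全文数据库(信息科技辑)》) *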

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572880A (en) * 2014-12-22 2015-04-29 中国科学院信息工程研究所 Method and system for realizing concurrent cooperated filtering based on users
CN104572880B (en) * 2014-12-22 2018-03-02 中国科学院信息工程研究所 The Parallel Implementation method and system of collaborative filtering based on user
WO2019128394A1 (en) * 2017-12-29 2019-07-04 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method for processing fusion data and information recommendation system
US11061966B2 (en) 2017-12-29 2021-07-13 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method for processing fusion data and information recommendation system

Also Published As

Publication number Publication date
CN103995878B (en) 2017-10-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant