CN103744917B

CN103744917B - Hybrid recommendation method and system

Info

Publication number: CN103744917B
Application number: CN201310737345.0A
Authority: CN
Inventors: 姜立宇; 张霞; 韩宇; 王菊
Original assignee: Neusoft Corp
Current assignee: Neusoft Corp
Priority date: 2013-12-27
Filing date: 2013-12-27
Publication date: 2018-04-06
Anticipated expiration: 2033-12-27
Also published as: CN103744917A

Abstract

The present invention provides a hybrid recommendation method and system. The method includes: dividing the user data in a database into single-dimensional user characteristic values and multi-dimensional user characteristic values; calculating the single-dimensional user characteristic values with a Bayesian average algorithm to obtain the user's recommendation algorithm interestingness; calculating the multi-dimensional user characteristic values with a naive Bayes model to obtain the user's recommendation algorithm interestingness; and determining a recommendation list according to the recommendation algorithm interestingness of the single-dimensional user and that of the multi-dimensional user, and recommending the recommendation list to the users of the database. The invention addresses the low accuracy of predicting users' recommendation algorithm interestingness, the calculation and evaluation of recommendation algorithm interestingness for different users, and the time bottleneck of training the user interest model.

Description

Hybrid recommendation method and system

Technical Field

The invention relates to the technical field of recommendation, in particular to a hybrid recommendation method and system.

Background

Recommendation technology has been developed for more than ten years, and numerous recommendation algorithms have been proposed and applied in the industry in the meantime. After a lot of practice, it appears that none of the methods can be applied to all scenarios, because each recommendation algorithm has its own limitations that are difficult to overcome.

In the face of the limitations of various recommendation algorithms, hybrid recommendation technologies are introduced in the industry, and at present, there are many hybrid recommendation technologies, such as: weighted hybrid recommendation techniques, cross-blending hybrid recommendation techniques, and the like. Fig. 1 illustrates a flow of a conventional hybrid recommendation technique, which, as illustrated in fig. 1, includes:

S110: start;

S120: the user browses the target business website;

S130: the website sends a request to the recommendation system;

S140: the external interface of the recommendation system receives the request;

S150: the recommendation algorithms are run, calling A and B, where A is the commodity information database and B is the user behavior and user information database;

S160: determine whether a hybrid recommendation strategy is used; if yes, execute S170; if not, execute S180;

S170: hybrid recommendation policy component;

S180: recommendation list;

S190: end.

The above is the whole flow of the conventional hybrid recommendation technology, and at present, the mainstream hybrid recommendation strategy includes: weighted hybrid recommendation techniques, hierarchical hybrid recommendation techniques, cross hybrid recommendation techniques, recommendation model hybrid techniques, and the like.

However, these hybrid recommendations have some common problems.

Because the recommendation algorithm mixing strategy of a recommendation system is the same for all users, it may not suit every user. For example, suppose a system provides several recommendation algorithms and adopts a hierarchical mixing strategy: results generated by the algorithm with higher accuracy are displayed first, and results from other algorithms with higher recall are appended afterwards. In the final recommendation list (Top-N), a user with a single interest may be satisfied with the results, but a user with broad interests always sees the same aspect of their interests first. Over time, the prediction accuracy of that user's algorithm interestingness is low, novelty is insufficient, and the time the recommendation system needs to complete a recommendation increases, so the user's satisfaction with the recommendation results declines.

Therefore, there is a need for a new hybrid recommendation technique that addresses the problems presented above.

Disclosure of Invention

In view of the above problems, an object of the present invention is to provide a hybrid recommendation method and system that address the low accuracy of predicting users' recommendation algorithm interestingness, the calculation and evaluation of recommendation algorithm interestingness for different users, and the time bottleneck of training the user interest model.

In one aspect, the invention provides a hybrid recommendation method, which includes dividing user data in a database into a single-dimensional user characteristic value and a multi-dimensional user characteristic value;

calculating the single-dimensional user characteristic values with a Bayesian average algorithm to obtain the recommendation algorithm interestingness of the single-dimensional user; and

calculating the multi-dimensional user characteristic values with a naive Bayes model, establishing a multi-dimensional user interest model, and training it to obtain the recommendation algorithm interestingness of the multi-dimensional user;

and determining a recommendation list according to the recommendation algorithm interestingness of the single-dimensional user and the recommendation algorithm interestingness of the multi-dimensional user, and recommending the recommendation list to the user of the database.

In addition, the preferred scheme is that in the process of determining the recommendation list according to the recommendation algorithm interestingness of the single-dimensional user and the recommendation algorithm interestingness of the multi-dimensional user, the recommendation algorithm interestingness of the single-dimensional user and the recommendation algorithm interestingness of the multi-dimensional user are fused through a mixed recommendation strategy component to obtain the interestingness of the user of the database to each recommendation algorithm, the proportion of each recommendation algorithm in all recommendation algorithms is obtained according to the interestingness of each recommendation algorithm, and the recommendation list is determined.

In addition, the preferred scheme is that in the process of calculating the characteristic value of the single-dimensional user by adopting a Bayesian average algorithm, the recommendation algorithm interestingness of the single-dimensional user is calculated by adopting a parallel calculation method;

in the process of calculating the characteristic values of the multi-dimensional users by adopting a naive Bayes model, the recommendation algorithm interestingness of the multi-dimensional users is calculated by adopting a parallel calculation method.

In addition, preferably, in the process of calculating the feature value of the single-dimensional user by adopting the Bayesian average algorithm,

the Bayesian average algorithm formula is as follows:

$$\bar{x} = \frac{C\,m + \sum_{i=1}^{n} x_i}{C + n}$$

where C is a self-set constant;

m is the average value of the items in the database;

n is the number of data points for the item;

$x_i$ is the value of each data point.

In the above formula, m serves as the prior, each value $x_i$ is an adjustment factor, and as n increases, the value of the item $\bar{x}$ continually tends towards its correct value.

In addition, the preferable scheme is that in the process of calculating the characteristic value of the multi-dimensional user by adopting the naive Bayes model,

the naive Bayes model formula is as follows:

$$P(C\mid X) = \frac{P(X\mid C)\,P(C)}{P(X)}$$

where C is the target event and X is the characteristic value;

P(C|X) is the probability that C occurs given X;

P(X|C) is the probability that X occurs given C;

P(C) is the probability of C;

P(X) is the probability of X.

In another aspect, the present invention further provides a hybrid recommendation system, including:

the database distinguishing unit is used for dividing the user data in the database into a single-dimensional user characteristic value and a multi-dimensional user characteristic value;

the Bayesian average algorithm calculating unit is used for calculating the characteristic value of the single-dimensional user by adopting a Bayesian average algorithm to obtain the recommendation algorithm interest of the single-dimensional user;

the naive Bayes model calculating unit is used for calculating the characteristic value of the multi-dimensional user by adopting a naive Bayes model, establishing a multi-dimensional user interest model, training the multi-dimensional user interest model and obtaining the recommendation algorithm interest degree of the multi-dimensional user;

and the algorithm recommendation unit is used for determining a recommendation list according to the recommendation algorithm interestingness of the single-dimensional user and that of the multi-dimensional user, and recommending the recommendation list to the users of the database.

According to the technical scheme, the mixed recommendation method and the mixed recommendation system provided by the invention can calculate and evaluate recommendation algorithm interestingness of different users, and can improve the accuracy of algorithm interest prediction of the single-dimensional user by obtaining the recommendation algorithm interestingness of the single-dimensional user through a Bayesian average algorithm; the recommendation algorithm interest of the multi-dimensional user is obtained by adopting a naive Bayes model, the implicit algorithm interest of a user group can be mined by adopting naive Bayes learning, and the algorithm interest prediction accuracy of the user is improved; the time for training the user interest model and the time for estimating the interest prediction can be shortened through parallel calculation.

To the accomplishment of the foregoing and related ends, one or more aspects of the invention comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the annexed drawings set forth in detail certain illustrative aspects of the invention. These aspects are indicative, however, of but a few of the various ways in which the principles of the invention may be employed. Further, the present invention is intended to include all such aspects and their equivalents.

Drawings

Other objects and results of the present invention will become more apparent and more readily appreciated as the same becomes better understood by reference to the following description and appended claims, taken in conjunction with the accompanying drawings. In the drawings:

FIG. 1 is a flow diagram of a conventional hybrid recommendation technique;

FIG. 2 is a flow chart illustrating a hybrid recommendation method according to an embodiment of the present invention;

FIG. 3 is a block diagram of the logical structure of a hybrid recommendation system according to an embodiment of the present invention.

The same reference numbers in all figures indicate similar or corresponding features or functions.

Detailed Description

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that such embodiment(s) may be practiced without these specific details. Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

In the conventional hybrid recommendation technology, after receiving a request, the external interface of the recommendation system calls the recommendation algorithms and sends their calculation results (Top-N) to the hybrid recommendation policy component. If only one recommendation algorithm is used, its results are displayed directly in the recommendation list. If multiple recommendation algorithms are used, the hybrid recommendation policy component fuses their results according to a fixed strategy, and the recommendation list shows the final fused results. To address problems of the conventional hybrid recommendation technology such as inaccurate algorithm recommendation and an inflexible recommendation system, the invention provides a personalized hybrid recommendation technique that calculates the user's interest in each algorithm from user data, automatically adjusts the proportion of each algorithm's results in the recommendation list, and thereby provides personalized hybrid recommendation to the user.

In the invention, when a user browses a target business website, the website requests recommendation results from the recommendation system, and the external interface of the recommendation system calls the recommendation algorithms after receiving the request. The personalized hybrid recommendation strategy component determines the user's final recommendation list according to the user's interest tendency; its training data come from the user behavior and user information database.

Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. FIG. 2 illustrates a flow of a hybrid recommendation method according to an embodiment of the invention; as shown in fig. 2, the present invention provides a hybrid recommendation method, and the specific recommendation method includes:

S210: Divide the user data in the database into single-dimensional user characteristic values and multi-dimensional user characteristic values.

Specifically, in the invention, a characteristic value for calculating the interestingness of the recommendation algorithm is selected according to the service environment of the recommendation system, and the characteristic value can be single-dimensional user data or multi-dimensional user data. The database includes user behavior data and user information data.

In the present invention, user data are divided into two major categories. One is single-dimensional user characteristic values, such as the user's clicks, ratings and favorites on recommended content; from such data the user's algorithm interest preference can be determined intuitively. The other is multi-dimensional user characteristic values, such as the user's age, gender and address, from which the user's algorithm preference cannot be determined by intuitive judgment.

That is to say, a single-dimensional user characteristic value is user information from which the user's interest in a recommendation algorithm can be judged intuitively, usually the user's behavior data or feedback; a multi-dimensional user characteristic value is multi-dimensional and implicit, and the user's interest in a recommendation algorithm generally cannot be judged from it intuitively.

S220: Calculate the single-dimensional user characteristic values with a Bayesian average algorithm to obtain the recommendation algorithm interestingness of the single-dimensional user.

Specifically, the invention uses the idea of the Bayesian average algorithm to optimize and correct the single-dimensional user characteristic values. The core idea is to pull the estimated interest of a cold-start single-dimensional user toward the public interest, and then continuously correct it as the user's single-dimensional data become more complete, so that the calculated result gets closer to the user's real interest.

It should be noted that user cold start refers to the situation in which a new user arrives with no behavior data or information data, so the user's interests cannot be predicted from historical behavior or user information, and no personalized recommendation can be made to the user.

Assume that three recommendation algorithms A, B and C are provided in the recommendation process, and the end user is presented with a recommendation list generated by mixing the three. In which algorithm's results is the user most interested? Several methods are available; a simple one is to calculate the user's click rate, as follows:

Under the condition that each algorithm has the same number of exposed recommendation results, the algorithm whose results the target user clicks most often is the one the user favors. Let the number of clicks of the user on the results of algorithm A be $C_A$, and obtain the numbers of clicks $C_B$ and $C_C$ on B and C in the same way. The number of results generated by each algorithm can then be adjusted dynamically from $C_A$, $C_B$ and $C_C$; the proportion $P_A$ of algorithm A's recommendations in the total recommendation list is:

$$P_A = \frac{C_A}{C_A + C_B + C_C}$$ ①
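A minimal Python sketch of formula ①, assuming the target user's click counts per algorithm are already aggregated into a dictionary. The function name and the even split for a user with no clicks yet are illustrative assumptions, not part of the method:

```python
def click_proportions(clicks: dict[str, int]) -> dict[str, float]:
    """Share of the recommendation list per algorithm, proportional to the
    target user's clicks on that algorithm's results (formula ①)."""
    total = sum(clicks.values())
    if total == 0:
        # Assumption: a user with no clicks yet gets an even split.
        return {name: 1 / len(clicks) for name in clicks}
    return {name: count / total for name, count in clicks.items()}


shares = click_proportions({"A": 30, "B": 10, "C": 10})
print(shares)  # {'A': 0.6, 'B': 0.2, 'C': 0.2}
```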

However, data such as click rate are strongly influenced by the environment and cannot fully and objectively reflect user preferences. A better approach than click rate is to judge user preferences from the user's ratings of recommended items. Suppose that user U has scored the recommended results, with scores ranging from 1 to 5. The average score of user U for the results of algorithm A is then

$$W_A = \frac{1}{T_A}\sum_{i=1}^{T_A} s_{A,i}$$ ②

where $T_A$ is the number of results generated by algorithm A that user U has scored and $s_{A,i}$ is the i-th such score; $W_B$ and $W_C$ are calculated in the same way. The advantage of using scores is that they are given by the user directly, which avoids the influence of the environment on the user's actions.

The average value W does not objectively reflect the user's preference for the algorithm. The reason resembles n Bernoulli trials: the observed frequency approaches its expected value P only as the number of trials increases. That is, as the user scores more items generated by algorithm A, W approaches the user's true preference, whereas with too few scores the value of W is unreliable.

The invention uses the idea of the Bayesian average algorithm to solve the insufficient reliability of the average value W in the cold-start state. Bayesian averaging is well suited to ranking problems and mirrors the idea of Bayesian inference: when the result is uncertain, an initial value is estimated and then brought closer to the correct value by continuous correction with new information.

General form of the Bayesian average algorithm:

$$\bar{x} = \frac{C\,m + \sum_{i=1}^{n} x_i}{C + n}$$ ③

where C is a self-set constant;

m is the average value of the items in the database;

n is the number of data points for the item;

$x_i$ is the value of each data point.

In formula ③ above, m serves as the prior, each value $x_i$ is an adjustment factor, and as n increases, the value of the item $\bar{x}$ continually tends towards its correct value.
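The Bayesian average of formula ③ is short enough to write out directly. The sketch below is illustrative only; the function name and the sample values of C and m are assumptions. It shows a small sample being pulled toward m while a large sample approaches its own mean:

```python
def bayesian_average(values: list[float], C: float, m: float) -> float:
    """Formula ③: (C*m + sum of x_i) / (C + n).

    C weights the prior like C extra observations, m is the prior mean
    (e.g. the database-wide average) and `values` are the item's n data points.
    """
    n = len(values)
    return (C * m + sum(values)) / (C + n)


# Two 5-point scores are pulled strongly toward the prior mean of 3.0 ...
print(bayesian_average([5, 5], C=8, m=3.0))  # 3.4
# ... while 200 such scores bring the result close to 5.
print(bayesian_average([5] * 200, C=8, m=3.0))
```

With no data (n = 0) the result is exactly m, which is the behavior the text relies on for cold-start users.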

Following the idea of formula ③, $W_A$ in formula ② is corrected and optimized as follows:

$$W_A' = \frac{\bar{T}_A\,\bar{W}_A + T_A\,W_A}{\bar{T}_A + T_A}$$ ④

where $\bar{T}_A$ is the average number of scores users give to algorithm A;

$\bar{W}_A$ is the average user score for algorithm A;

$T_A$ is the number of scores the target user has given to algorithm A;

$W_A$ is the target user's average score for algorithm A.

That is, formula ④ is formula ③ with $C = \bar{T}_A$, $m = \bar{W}_A$, $n = T_A$ and $\sum x_i = T_A W_A$.

Formula ④ gives the target user's prior interest tendencies $W_A'$, $W_B'$ and $W_C'$, which are continuously corrected as user behavior data grow, so that they keep approaching the user's real interest preferences.
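Formula ④ can be sketched as follows, assuming a hypothetical rating log of the form {user: {algorithm: [scores]}}. Two details are assumptions not fixed by the text: the average score count is taken over users who have scored the algorithm at least once, and the average user score is the mean of all scores for that algorithm (the target user's own scores are included in both):

```python
from statistics import mean


def corrected_interest(ratings: dict, target: str, algo: str) -> float:
    """Formula ④: W' = (T_bar*W_bar + T*W) / (T_bar + T).

    ratings: hypothetical log {user: {algorithm: [scores]}}.
    T_bar:   mean number of scores per user for `algo` (assumed: over users
             with at least one score; the target user is included).
    W_bar:   mean of all scores given to `algo`.
    T, W:    the target user's own score count and mean score for `algo`.
    """
    per_user = [r[algo] for r in ratings.values() if r.get(algo)]
    t_bar = mean(len(scores) for scores in per_user)
    w_bar = mean(x for scores in per_user for x in scores)
    own = ratings.get(target, {}).get(algo, [])
    t = len(own)
    w = mean(own) if own else 0.0  # T*W is 0 for a user with no scores
    return (t_bar * w_bar + t * w) / (t_bar + t)


ratings = {
    "u1": {"A": [4, 5, 4, 5], "B": [2, 3]},
    "u2": {"A": [3, 4], "B": [3, 3, 2, 4]},
    "new": {"A": [1]},  # cold-start user with a single low score for A
}
print(round(corrected_interest(ratings, "new", "A"), 3))  # 2.9, not the raw 1.0
print(round(corrected_interest(ratings, "new", "B"), 3))  # 2.833, the prior W_bar for B
```

A single low score moves the cold-start user only part of the way from the population average, and a user with no scores for an algorithm simply inherits the population average.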

In the invention, the user's recommendation algorithm interestingness is calculated with the Bayesian average algorithm. The method suits single-dimensional user behavior data from which the user's algorithm preference can be judged intuitively, such as the user's scores for algorithm recommendation results. It mitigates the low confidence of predictions caused by the scarcity of behavior data for cold-start users, thereby improving the accuracy of predicting the user's algorithm interest.

The above calculation assumes that the higher the user's scores, the greater the user's interest in the algorithm, which clearly holds. However, some information cannot be handled this way. For example, on a clothing sales website, a user's preferences are strongly related to the user's age and gender, but it cannot be determined intuitively which recommendation algorithms users of a given age and gender find more interesting. Such information often plays a critical role, so the following method is adopted:

S230: Calculate the multi-dimensional user characteristic values with a naive Bayes model, establish a multi-dimensional user interest model, and train it to obtain the recommendation algorithm interestingness of the multi-dimensional user.

Specifically, the naive Bayes model in machine learning is applied, the user interest model is trained through the historical data of the multidimensional user, and the recommendation algorithm interest degree of the user is calculated.

General form of the Bayesian probability model:

$$P(C\mid X) = \frac{P(X\mid C)\,P(C)}{P(X)}$$ ⑤

In formula ⑤ above, C is the target event and X is the characteristic value;

P(C|X) is the probability that C occurs given X;

P(X|C) is the probability that X occurs given C;

P(C) is the probability of C; P(X) is the probability of X.

Assume that the age of the target user U is a and that two recommendation algorithms A and B are provided. The probability that user U is interested in algorithm A is P(A|a), and the probability that U is interested in algorithm B is P(B|a). By formula ⑤, P(A|a) is:

$$P(A\mid a) = \frac{P(a\mid A)\,P(A)}{P(a)} \propto P(a\mid A)\,P(A)$$ ⑥

It should be noted that formula ⑥ only concerns the numerator, because the denominator does not depend on the class and the value of characteristic a is given, so the denominator can be treated as a constant.

The prior probability P(A) is not a distribution function but simply a value expressing the proportion of samples of each class in the sample space. By the law of large numbers, when the training set is large enough and drawn at random from the sample space, P(A) can be estimated from the proportion of each class of sample in the training set. Therefore, P(A) can be estimated from the average scores of the exposed recommendations of the algorithms, as follows:

$$P(A) = \frac{\bar{W}_A}{\bar{W}_A + \bar{W}_B}$$ ⑦

where $\bar{W}_A$ and $\bar{W}_B$ are the average scores for algorithm A and algorithm B, respectively;

P(a|A) is a probability density function of some assumed form (for example, a normal distribution) and must be estimated from the distribution of the sample feature in the training set. Estimation methods can be divided into parametric and non-parametric estimation; a common method is maximum likelihood estimation.

Here the age of the target user is taken to follow a normal distribution, so P(a|A) is:

$$P(a\mid A) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(a-\mu)^2}{2\sigma^2}\right)$$ ⑧

where μ is the expectation of the age of users interested in algorithm A;

σ² is the variance of the age of users interested in algorithm A.

The expectation and variance of age can be obtained by training on the website's historical data; then, from formulas ⑦ and ⑧, the interest of user U of age a in algorithm A is calculated as P(a|A)·P(A).
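For the single-feature case (formulas ⑥ and ⑧), a Python sketch: it fits μ and σ² per algorithm by maximum likelihood from historical ages, multiplies the Gaussian density by a given prior, and normalises the scores so they sum to 1 (normalising stands in for the constant denominator P(a)). The data, priors and function names are hypothetical:

```python
import math
from statistics import mean, pvariance


def gaussian_pdf(x: float, mu: float, var: float) -> float:
    """Formula ⑧: normal density with expectation mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def algorithm_interest(age: float, ages_by_algo: dict, priors: dict) -> dict:
    """Formula ⑥: P(A|a) proportional to P(a|A) * P(A), normalised over algorithms.

    ages_by_algo: {algorithm: ages of historical users interested in it}
    priors:       {algorithm: P(algorithm)}, e.g. estimated as in formula ⑦
    """
    scores = {}
    for algo, ages in ages_by_algo.items():
        mu, var = mean(ages), pvariance(ages)  # maximum-likelihood estimates
        scores[algo] = gaussian_pdf(age, mu, var) * priors[algo]
    total = sum(scores.values())  # plays the role of the constant P(a)
    return {algo: s / total for algo, s in scores.items()}


history = {"A": [22, 25, 28, 24, 26], "B": [41, 45, 38, 50, 47]}
print(algorithm_interest(27, history, {"A": 0.5, "B": 0.5}))
```

`pvariance` divides by N rather than N - 1, which is the maximum-likelihood variance the text refers to.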

The above is the calculation for a single feature variable; more generally, user information consists of multidimensional feature variables. Then, according to the naive Bayes model:

$$P(C\mid F_1,\dots,F_n) = \frac{P(C)\prod_{i=1}^{n} P(F_i\mid C)}{P(F_1,\dots,F_n)} \propto P(C)\prod_{i=1}^{n} P(F_i\mid C)$$ ⑨

In formula ⑨ above, $P(F_i\mid C)$ is calculated in the same way as formula ⑧.

The first half of formula ⑨ is the general form of naive Bayes, where $F_1,\dots,F_n$ are the multi-dimensional user characteristic values; the second half is the expression with the denominator omitted.
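Extending this to several characteristic values (formula ⑨) means multiplying per-feature Gaussian likelihoods, which the sketch below does in log space to avoid underflow. It is a minimal Gaussian naive Bayes written under stated assumptions: the priors P(C) are class proportions in the training set, all features are numeric, and the feature set (age, monthly visits) and data are hypothetical:

```python
import math
from statistics import mean, pvariance


def train(samples: list) -> dict:
    """samples: (feature_tuple, algorithm) pairs from historical data.

    Returns {algorithm: (P(C), [(mu_i, var_i) for each feature F_i])}, with
    P(C) taken as the class proportion in the training set.
    """
    by_algo: dict = {}
    for features, algo in samples:
        by_algo.setdefault(algo, []).append(features)
    model = {}
    for algo, rows in by_algo.items():
        params = [(mean(col), pvariance(col)) for col in zip(*rows)]
        model[algo] = (len(rows) / len(samples), params)
    return model


def predict(model: dict, features: tuple, eps: float = 1e-9) -> dict:
    """Formula ⑨ in log space: log P(C) + sum of log P(F_i|C), then normalised."""
    log_scores = {}
    for algo, (prior, params) in model.items():
        s = math.log(prior)
        for x, (mu, var) in zip(features, params):
            var += eps  # guard against zero variance
            s += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        log_scores[algo] = s
    top = max(log_scores.values())  # subtract the max before exp for stability
    weights = {a: math.exp(s - top) for a, s in log_scores.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Hypothetical features: (age, monthly visits).
history = [((23, 12), "A"), ((26, 15), "A"), ((24, 10), "A"), ((28, 14), "A"),
           ((45, 3), "B"), ((50, 5), "B"), ((41, 4), "B"), ((47, 2), "B")]
model = train(history)
print(predict(model, (25, 13)))
```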

From the above, when more user information is available, the naive Bayes model can combine the information of all dimensions to predict the user's recommendation algorithm tendency. In the invention, a machine-learning naive Bayes model is trained on hidden user information, which is usually multidimensional. The model can mine the implicit recommendation algorithm interestingness of user groups and improve interest prediction accuracy. Its results may be fused with, or used to supplement, the Bayesian average results. Both the naive Bayes model and the Bayesian average algorithm have a solid mathematical and theoretical basis, and both are widely applied in machine learning, probability and statistics, data mining and related fields.

In the invention, the data in the database are divided into a single-dimensional user characteristic value and a multi-dimensional user characteristic value, and the selection of the characteristic value is determined by a specific service scene; and then determining which method is adopted to learn the characteristic value according to the selected characteristic value.

It should be noted that the naive Bayes algorithm is not limited to multidimensional characteristic values: a single-dimensional characteristic value can also be used with a naive Bayes model to predict the user's interest in a recommendation algorithm. Here the single-dimensional characteristic value refers to a characteristic value of the user information. More precisely, a specific service scenario may provide only a single-dimensional characteristic value while the user has much other implicit information that should be considered, so in practical applications such data are processed with a naive Bayes algorithm, and the number of characteristic values is typically greater than 1. A specific embodiment of applying the naive Bayes algorithm to a single-dimensional characteristic value is as follows:

When predicting house rent, the floor area can be used as a single characteristic value: the larger the area, the higher the rent. In practice, information such as the house's geographical location, facilities and number of rooms is often also used as characteristic values to improve accuracy; that is, besides the floor area, the location, facilities and number of rooms are implicit information needed to predict the rent. The method for selecting characteristic values is not limited to the one described in this embodiment; characteristic values may be chosen flexibly according to the specific service scenario.

The Bayesian average algorithm differs from a naive Bayes model with a single-dimensional characteristic value as follows. The characteristic values of the Bayesian average algorithm are generally the user's behavior data or feedback. Such data start from a cold state and keep growing, and predictions are unreliable while the data volume is insufficient; the Bayesian average algorithm therefore corrects the data so that, as user behavior accumulates, the result continually approaches the user's real interests and preferences. The characteristic values of the naive Bayes model are generally attribute information of the user, such as age and gender. For this information, the invention uses naive Bayes from machine learning to establish a supervised model, the user interest model, and predicts the user's interest in each algorithm with the trained model. Combining steps S220 and S230, the recommendation algorithm interestingness of the single-dimensional user is calculated with a parallel calculation method when the Bayesian average algorithm is applied, and that of the multi-dimensional user is calculated with a parallel calculation method when the naive Bayes model is applied.

It should be noted that, to improve the performance of the personalized hybrid recommendation technique, the invention proposes calculating the user's algorithm interest tendency with a parallel computing framework. The advantage is the ability to handle a large volume of system information. For most online systems, the number of users and the amount of user behavior information keep increasing, and the cost of personalized interest modeling grows with the number of users. If the number of users is too large, the time to compute the user interest model increases exponentially, the user's algorithm interest model cannot be updated in time, and the performance indicators of the recommendation system suffer directly.

The parameter estimation and the prior probability therein require statistics on the user data in the system. The parameters can be obtained by using distributed parallel framework calculation.

Taking the parameter estimation in the naive Bayes algorithm (formula ⑧) as an example, the expectation and variance of the users' interest in algorithm A can be computed with a parallel computing framework. With the N users divided into K groups $G_1,\dots,G_K$, and $x_u$ denoting the value for user u, the expectation is:

$$\mu = \frac{1}{N}\sum_{k=1}^{K}\sum_{u\in G_k} x_u$$

The sum for each group of users is calculated in parallel, and the expectation over all users is computed in the merge stage. The standard deviation can be computed in a distributed way in the same manner:

$$\sigma = \sqrt{\frac{1}{N}\sum_{k=1}^{K}\sum_{u\in G_k}\left(x_u-\mu\right)^2}$$

The sum of squared deviations from the expectation is calculated in parallel for each group, and the results are merged. Computing the interest model with a distributed parallel framework keeps the update frequency of the user interest model high even for high-PV (page view) websites, so changes in user interest are detected promptly.
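The map/merge split described above can be sketched locally in Python, with `ThreadPoolExecutor` standing in for a distributed framework; in a real deployment each group would live on a separate worker node. The two passes mirror the text: per-group sums merged into μ, then per-group squared deviations merged into σ (the population standard deviation, i.e. the maximum-likelihood estimate used in formula ⑧). Function names are illustrative:

```python
import math
from concurrent.futures import ThreadPoolExecutor


def _group_sum(group: list) -> tuple:
    """Map step, pass 1: the sum and size of one group of users."""
    return sum(group), len(group)


def _group_sq_dev(args: tuple) -> float:
    """Map step, pass 2: sum of squared deviations from the global mean."""
    group, mu = args
    return sum((x - mu) ** 2 for x in group)


def distributed_mean_std(values: list, n_groups: int = 4) -> tuple:
    # Split users into n_groups groups (stand-ins for worker nodes).
    groups = [values[i::n_groups] for i in range(n_groups)]
    with ThreadPoolExecutor(max_workers=n_groups) as pool:
        partials = list(pool.map(_group_sum, groups))
        total, n = (sum(col) for col in zip(*partials))  # merge stage
        mu = total / n
        sq = sum(pool.map(_group_sq_dev, [(g, mu) for g in groups]))  # merge
    return mu, math.sqrt(sq / n)


print(distributed_mean_std([2, 4, 4, 4, 5, 5, 7, 9]))  # (5.0, 2.0)
```

The second pass needs μ from the first, which is why the text describes two rounds of parallel computation followed by a merge rather than a single pass.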

According to the invention, the time bottleneck of model training and interest prediction evaluation can be solved through parallel computing, and the performance of the personalized mixed recommendation technology is improved; the commercial feasibility of the modeling method proposed by the Bayesian average algorithm and the naive Bayes model can also be verified by parallel computing the training model.

S240: Determine a recommendation list according to the recommendation algorithm interestingness of the single-dimensional user and that of the multi-dimensional user, and recommend the recommendation list to the users of the database.

Specifically, a hybrid recommendation strategy component fuses the recommendation algorithm interestingness of the single-dimensional users with that of the multi-dimensional users to obtain each database user's interest degree in each recommendation algorithm. The proportion of each recommendation algorithm among all recommendation algorithms is then obtained from these interest degrees, the recommendation list is determined, and the list is recommended to the users of the database.

The main functions of the personalized hybrid recommendation strategy component are to fuse the results of the two interest models into the user's interest degree in each recommendation algorithm, and to compute the proportion of each recommendation algorithm in the recommendation list with the following formula:

where W_i is the interest degree in recommendation algorithm i obtained by fusing the result of the Bayesian average algorithm with the result of the naive Bayes model.
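One plausible form of this proportion, assumed here for illustration, normalizes the fused interest degrees, P_i = W_i / Σ_j W_j, and then allocates the N positions of the recommendation list in proportion to P_i. The Python sketch below uses largest-remainder rounding so that the allocated positions always sum to N; the function names, the rounding rule, and the equal-share fallback are hypothetical choices, not details from the source.

```python
from math import floor
from typing import Dict


def algorithm_proportions(w: Dict[str, float]) -> Dict[str, float]:
    """P_i = W_i / sum_j W_j (assumed normalization of fused interest degrees)."""
    total = sum(w.values())
    if total <= 0:
        # No usable interest signal: fall back to equal shares (assumption).
        return {name: 1.0 / len(w) for name in w}
    return {name: wi / total for name, wi in w.items()}


def allocate_slots(p: Dict[str, float], list_size: int) -> Dict[str, int]:
    """Largest-remainder allocation of list positions according to P_i."""
    raw = {name: pi * list_size for name, pi in p.items()}
    slots = {name: floor(r) for name, r in raw.items()}
    leftover = list_size - sum(slots.values())
    # Give the remaining positions to the algorithms with the largest fractional parts;
    # sorted() is stable, so ties keep the input order.
    by_remainder = sorted(raw, key=lambda name: raw[name] - slots[name], reverse=True)
    for name in by_remainder[:leftover]:
        slots[name] += 1
    return slots
```

For example, W = {"a": 2.0, "b": 1.0, "c": 1.0} gives proportions 0.5, 0.25 and 0.25, and a 10-position list is split as a: 5, b: 3, c: 2.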

In addition, W_i can be computed in many ways: besides a simple linear model, a regression model, an RBM (restricted Boltzmann machine), and the like can also be used. Finally, the personalized hybrid recommendation strategy component determines the recommendation results in the recommendation list according to P_i, displays them to the end user, and completes the recommendation.
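The simplest of these options, a linear model, can be sketched as a weighted sum of the two per-algorithm scores, W_i = λ·b_i + (1 − λ)·q_i, where b_i is the Bayesian average result and q_i the naive Bayes result for algorithm i. The weight λ (`lam` below), the function name, and treating a missing score as 0 are assumptions for illustration; a regression model or an RBM would replace this function.

```python
from typing import Dict


def fuse_linear(bayes_avg: Dict[str, float], naive_bayes: Dict[str, float],
                lam: float = 0.5) -> Dict[str, float]:
    """W_i = lam * b_i + (1 - lam) * q_i; a missing score is treated as 0 (assumption)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    names = sorted(set(bayes_avg) | set(naive_bayes))
    return {
        name: lam * bayes_avg.get(name, 0.0) + (1.0 - lam) * naive_bayes.get(name, 0.0)
        for name in names
    }
```

With λ = 0.25, scores b = {"a": 0.8, "b": 0.2} and q = {"a": 0.4, "b": 0.6} both fuse to about 0.5, showing how a small λ lets the naive Bayes result dominate.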

In addition, the fusion of the two calculation results by the personalized hybrid recommendation strategy component relies on well-established techniques that will be understood by those skilled in the art, and is therefore not described in detail here.

In the invention, users' interest degrees in the recommendation algorithms are obtained through the Bayesian average algorithm and the naive Bayes model, so a personalized hybrid recommendation strategy can be provided for each user. This increases the degree of personalization, improves the accuracy of the recommendation results, and also maintains a good algorithm recall rate. Meanwhile, changes in user interest can be captured at recommendation time, and the recommendation strategy is adjusted automatically without any human intervention, which avoids the degradation of user experience caused by online A/B testing. Instead, as user data accumulates, the algorithm interest model understands users better and better, the recommendation results fit users' interests more and more closely, and user satisfaction with recommendations improves. The method also makes it easy to collect statistics on the actual effect of each recommendation algorithm, so that developers of the recommendation system can modify and optimize the algorithms in time, forming a virtuous circle.

Corresponding to the above method, the invention further provides a hybrid recommendation system. Fig. 3 shows the logical structure of a hybrid recommendation system according to an embodiment of the invention.

As shown in Fig. 3, the hybrid recommendation system 300 provided by the invention includes a database distinguishing unit 310, a Bayesian average algorithm calculating unit 320, a naive Bayes model calculating unit 330, and an algorithm recommending unit 340.

The database distinguishing unit 310 is configured to divide the user data in the database into a single-dimensional user characteristic value and a multi-dimensional user characteristic value.

The Bayesian average algorithm calculating unit 320 is configured to calculate the single-dimensional user feature values using the Bayesian average algorithm to obtain the recommendation algorithm interestingness of the single-dimensional users.

The naive Bayes model calculating unit 330 is configured to calculate the multi-dimensional user feature values using a naive Bayes model, establish and train a multi-dimensional user interest model, and obtain the recommendation algorithm interestingness of the multi-dimensional users.

The algorithm recommending unit 340 is configured to determine a recommendation list according to the recommendation algorithm interestingness of the single-dimensional users and that of the multi-dimensional users, and recommend the list to the users of the database.

The algorithm recommending unit 340 further includes an interestingness obtaining unit 341 and a recommendation list determining unit 342.

The interestingness obtaining unit 341 is configured to fuse, through a hybrid recommendation strategy component, the recommendation algorithm interestingness of the single-dimensional users and that of the multi-dimensional users, to obtain each database user's interestingness in each recommendation algorithm.

The recommendation list determining unit 342 is configured to obtain the proportion of each recommendation algorithm among all recommendation algorithms according to the interestingness of each recommendation algorithm, and determine the recommendation list.

When calculating the single-dimensional user feature values, the Bayesian average algorithm calculating unit 320 computes the recommendation algorithm interestingness of the single-dimensional users with a parallel calculation method; likewise, when calculating the multi-dimensional user feature values, the naive Bayes model calculating unit 330 computes the recommendation algorithm interestingness of the multi-dimensional users with a parallel calculation method.

When the Bayesian average algorithm calculating unit 320 calculates the single-dimensional user feature values using the Bayesian average algorithm,

the Bayesian average algorithm formula is as follows:

$$\bar{X} = \frac{C \times m + \sum_{i=1}^{n} x_i}{n + C}$$

where C is a user-defined constant;

m is the average value of the items in the database;

n is the number of data points for the item;

x_i is the value of each data point.

In the above formula, m acts as the prior, each value x_i acts as an adjustment factor, and as n increases, the value of the item converges toward its true value.
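The Bayesian average formula can be written directly as a short function. This Python sketch assumes, for illustration, that x_i is a user's feedback (for example 1 for a click and 0 otherwise) on items produced by one recommendation algorithm and that m is the average feedback on that algorithm across the database; these mappings and the function name are hypothetical.

```python
from typing import Sequence


def bayesian_average(values: Sequence[float], prior_mean: float, c: float) -> float:
    """X_bar = (C * m + sum(x_i)) / (n + C), with m = prior_mean and C = c."""
    if c < 0:
        raise ValueError("C must be non-negative")
    n = len(values)
    if n + c == 0:
        raise ValueError("need at least one value or a positive C")
    # C acts like C pseudo-observations at the prior mean m.
    return (c * prior_mean + sum(values)) / (n + c)
```

For example, with feedback [1, 0, 1], m = 0.2 and C = 5, the result is (5 × 0.2 + 2) / (3 + 5) = 0.375. With no data the function returns m, and as more feedback arrives the result moves toward the user's own average, matching the convergence behavior described above.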

When the multi-dimensional user feature values are calculated using the naive Bayes model,

the naive Bayes model formula is as follows:

$$P(C \mid X) = \frac{P(X \mid C) \times P(C)}{P(X)}$$

where C is the target event and X is the feature value;

P(C|X) is the probability that C occurs given X;

P(X|C) is the probability that X occurs given C;

P(C) is the probability of C;

P(X) is the probability of X.
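With several features X = (x_1 to x_K), the naive Bayes model assumes the features are conditionally independent given C, so P(C|X) is proportional to P(C) ∏_k P(x_k|C), with P(X) obtained by normalizing over all classes. The Python sketch below estimates these probabilities from counts, treating each class C as a recommendation algorithm; the class name, the add-alpha (Laplace) smoothing, and the example features are assumptions, not details given in the source.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Set, Tuple

# One training sample: (feature values X, target class C such as an algorithm name).
Sample = Tuple[Sequence[Hashable], Hashable]


class NaiveBayes:
    """Categorical naive Bayes; add-alpha smoothing is an illustrative assumption."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        # feature_counts[c][k][v]: how often feature k took value v within class c
        self.feature_counts: Dict[Hashable, List[Counter]] = {}
        # Distinct values seen for each feature, used as the smoothing vocabulary
        self.feature_values: List[Set[Hashable]] = []

    def fit(self, samples: List[Sample]) -> "NaiveBayes":
        for features, c in samples:
            if not self.feature_values:
                self.feature_values = [set() for _ in features]
            self.class_counts[c] += 1
            per_class = self.feature_counts.setdefault(c, [Counter() for _ in features])
            for k, v in enumerate(features):
                per_class[k][v] += 1
                self.feature_values[k].add(v)
        return self

    def posterior(self, features: Sequence[Hashable]) -> Dict[Hashable, float]:
        total = sum(self.class_counts.values())
        joint = {}
        for c, n_c in self.class_counts.items():
            p = n_c / total  # prior P(C)
            for k, v in enumerate(features):
                # Smoothed likelihood P(x_k | C)
                num = self.feature_counts[c][k][v] + self.alpha
                den = n_c + self.alpha * len(self.feature_values[k])
                p *= num / den
            joint[c] = p  # P(C) * prod_k P(x_k | C)
        evidence = sum(joint.values())  # P(X) by total probability over classes
        return {c: p / evidence for c, p in joint.items()}
```

For instance, trained on the hypothetical samples (("young", "mobile"), "itemcf") twice, (("old", "pc"), "hot") and (("young", "pc"), "hot"), the posterior for ("young", "mobile") is 9/11 for "itemcf" and 2/11 for "hot".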

As the above embodiments show, the hybrid recommendation method and system provided by the invention can calculate and evaluate the recommendation algorithm interestingness of different users. Obtaining this interestingness with the Bayesian average algorithm and the naive Bayes model improves the accuracy of predicting users' algorithm interests, and parallel computing shortens the time needed to train the user interest models and to evaluate interest predictions.

The hybrid recommendation method and system proposed according to the present invention are described above by way of example with reference to the accompanying drawings. However, it should be understood by those skilled in the art that various modifications can be made to the hybrid recommendation method and system of the present invention without departing from the scope of the present invention. Therefore, the scope of the present invention should be determined by the contents of the appended claims.

Claims

1. A hybrid recommendation method, comprising:

dividing user data in a database into single-dimensional user feature values and multi-dimensional user feature values, wherein the single-dimensional user feature values are user information from which a user's interest in a recommendation algorithm can be judged directly, and the multi-dimensional user feature values are multi-dimensional, implicit user information from which a user's interest in a recommendation algorithm cannot be judged directly;

calculating the single-dimensional user feature values using a Bayesian average algorithm to obtain the recommendation algorithm interestingness of the single-dimensional users, wherein the recommendation algorithm interestingness of the single-dimensional users is calculated by a parallel calculation method; and

calculating the multi-dimensional user feature values using a naive Bayes model, establishing and training a multi-dimensional user interest model, and obtaining the recommendation algorithm interestingness of the multi-dimensional users, wherein the recommendation algorithm interestingness of the multi-dimensional users is calculated by a parallel calculation method;

2. The hybrid recommendation method of claim 1, wherein determining the recommendation list according to the recommendation algorithm interestingness of the single-dimensional users and that of the multi-dimensional users comprises:

fusing, through a hybrid recommendation strategy component, the recommendation algorithm interestingness of the single-dimensional users and that of the multi-dimensional users to obtain each database user's interestingness in each recommendation algorithm; and

obtaining the proportion of each recommendation algorithm among all recommendation algorithms according to the interestingness of each recommendation algorithm, and determining the recommendation list.

3. The hybrid recommendation method of claim 1, wherein, in calculating the single-dimensional user feature values using the Bayesian average algorithm,

the Bayesian average algorithm formula is as follows:

$$\bar{X} = \frac{C \times m + \sum_{i=1}^{n} x_i}{n + C}$$

where C is a user-defined constant;

m is the average value of the items in the database;

n is the number of data points for the item;

x_i is the value of each data point.

4. The hybrid recommendation method of claim 1, wherein, in calculating the multi-dimensional user feature values using the naive Bayes model,

the naive Bayes model formula is as follows:

$$P(C \mid X) = \frac{P(X \mid C) \times P(C)}{P(X)}$$

where C is the target event and X is the feature value;

P(C|X) is the probability that C occurs given X;

P(X|C) is the probability that X occurs given C;

P(C) is the probability of C;

P(X) is the probability of X.

5. A hybrid recommendation system comprising:

a database distinguishing unit, configured to divide user data in a database into single-dimensional user feature values and multi-dimensional user feature values, wherein the single-dimensional user feature values are user information from which a user's interest in a recommendation algorithm can be judged directly, and the multi-dimensional user feature values are multi-dimensional, implicit user information from which a user's interest in a recommendation algorithm cannot be judged directly;

a Bayesian average algorithm calculating unit, configured to calculate the single-dimensional user feature values using a Bayesian average algorithm to obtain the recommendation algorithm interestingness of the single-dimensional users, wherein the recommendation algorithm interestingness of the single-dimensional users is calculated by a parallel calculation method;

a naive Bayes model calculating unit, configured to calculate the multi-dimensional user feature values using a naive Bayes model, establish and train a multi-dimensional user interest model, and obtain the recommendation algorithm interestingness of the multi-dimensional users, wherein the recommendation algorithm interestingness of the multi-dimensional users is calculated by a parallel calculation method;

6. The hybrid recommendation system of claim 5, wherein the algorithm recommending unit further comprises:

an interestingness obtaining unit, configured to fuse, through a hybrid recommendation strategy component, the recommendation algorithm interestingness of the single-dimensional users and that of the multi-dimensional users to obtain each database user's interestingness in each recommendation algorithm; and

a recommendation list determining unit, configured to obtain the proportion of each recommendation algorithm among all recommendation algorithms according to the interestingness of each recommendation algorithm, and determine the recommendation list.

7. The hybrid recommendation system of claim 5, wherein,

when the Bayesian average algorithm calculating unit calculates the single-dimensional user feature values using the Bayesian average algorithm,

the Bayesian average algorithm formula is as follows:

$$\bar{X} = \frac{C \times m + \sum_{i=1}^{n} x_i}{n + C}$$

where C is a user-defined constant;

m is the average value of the items in the database;

n is the number of data points for the item;

x_i is the value of each data point.

8. The hybrid recommendation system of claim 5, wherein,

when the naive Bayes model calculating unit calculates the multi-dimensional user feature values using the naive Bayes model,

the naive Bayes model formula is as follows:

$$P(C \mid X) = \frac{P(X \mid C) \times P(C)}{P(X)}$$

where C is the target event and X is the feature value;

P(C|X) is the probability that C occurs given X;

P(X|C) is the probability that X occurs given C;

P(C) is the probability of C;

P(X) is the probability of X.