CN109840833B - Bayesian collaborative filtering recommendation method - Google Patents

Bayesian collaborative filtering recommendation method

Info

Publication number: CN109840833B
Application number: CN201910112719.7A
Other versions: CN109840833A (Chinese)
Authority: CN (China)
Prior art keywords: matrix, user, probability, distribution, representing
Legal status: Active
Inventors: 王邦军, 戴欣, 李凡长, 张莉
Original assignee: Suzhou University
Current assignee: Weihai Bohua Medical Equipment Co., Ltd.
Application filed by Suzhou University on 2019-02-13; priority to CN201910112719.7A (2019-02-13)
Publication of CN109840833A: 2019-06-04; grant of CN109840833B: 2020-11-10

Abstract

The invention discloses a Bayesian collaborative filtering recommendation method comprising the following steps. The input of the model is the scoring matrix of the collaborative filtering recommendation system, $R \in \mathbb{R}^{M \times N}$, which is decomposed into two latent matrices $U \in \mathbb{R}_{+}^{M \times K}$ and $V \in \mathbb{R}_{+}^{N \times K}$, where for the $M \times K$ matrix $U$, $U_{ik}$ represents the probability that user $i$ belongs to group $k$, $U_{ik} \in (0,1)$, and for the $N \times K$ matrix $V$, $V_{jk}$ represents the evidence that user group $k$ likes item $j$; the prediction score matrix is $R = UV^{\mathsf T}$. Since the data set $R$ is sparse, the observed entries are represented by the set $\Omega = \{(i,j) \mid R_{ij} \text{ is observed}\}$. A probabilistic approach is taken to this problem: a likelihood function is defined for the observed data and the latent matrices are treated as random variables. Each value of $R$ is assumed to come from the product of $U$ and $V$ with some added Gaussian noise $E_{ij} \sim \mathcal{N}(0, \tau^{-1})$. The beneficial effects of the invention are as follows: users' tastes are diverse and cannot be reflected as consistently as in a small data set; a large amount of data is missing in real data sets, and when evidence is insufficient and values are hard to predict, predicting them as medians or averages loses the significance of the recommendation.

Description

Bayesian collaborative filtering recommendation method
Technical Field
The invention relates to the field of the Internet, in particular to a Bayesian collaborative filtering recommendation method.
Background
With the emergence and popularization of the Internet, a large amount of data can be obtained easily, but this abundance makes it difficult for users to extract effective information when searching, which reduces the utilization rate of the information. A method that effectively solves this information-overload problem with a recommendation system is therefore very important: the recommendation system recommends content according to the user's requests, hobbies, and so on. At present, recommendation systems are widely applied in many fields such as movies, music, shopping, social networks, and books. Among them, the collaborative filtering recommendation algorithm is the most widely applied and one of the most effective personalized recommendation technologies. Collaborative filtering is mainly divided into two categories, memory-based methods and model-based methods:
1. Memory-based methods: these are mainly divided into user-based collaborative filtering and item-based collaborative filtering, and generally make predictions from the similar users (or similar items) of a target user (or target item).
2. Model-based methods: a model is used to predict the user's score, i.e. the model's parameters are first trained; once training is finished, a model-based recommendation system can predict a user's preference very quickly. Therefore, when there are large numbers of users and items, model-based recommendation scales well and predicts quickly. Model-based methods mainly include decision trees, rule-based models, Bayesian methods, and latent factor models.
Although memory-based methods provide reliable suggestions, for keyword information a user has never touched and for new users for whom nothing can yet be recommended, a large number of ratings is needed to make reliable predictions, and real-world data is often very sparse. For this reason, many scholars have studied model-based recommendation systems more intensively, proposed improved methods, and achieved some results. For example, an incremental recommendation algorithm based on probabilistic latent semantic analysis (PLSA) for automatic question recommendation has been used on question-and-answer websites, improving the insufficient real-time performance of the recommendation system; a recommendation algorithm based on the latent Dirichlet allocation (LDA) topic model has been used on blogs, improving the recommendation precision for users; and a recommendation algorithm combining a clustering algorithm with the SVD algorithm has been used in an e-commerce recommendation system, effectively alleviating the data-sparsity problem.
The traditional technology has the following technical problems:
in existing models, matrix factorization techniques achieve high accuracy on sparse matrices. In the classical matrix factorization model, the scores of the user set on the item set are expressed as a score matrix $R_{m \times n}$, where $r_{ij}$ represents user $u_i$'s rating of item $v_j$. The matrix is decomposed into two matrices, a $U_{m \times k}$ associated with the users and a $V_{k \times n}$ associated with the items, so that their product approximates the original matrix: $R_{m \times n} \approx U_{m \times k} \times V_{k \times n}$. The core idea is to connect users and items through implicit characteristics and to fill the missing entries with a dimension-reduction method, which has proved superior to the traditional nearest-neighbor techniques among recommendation algorithms. In this method, each item is not rigidly assigned to a category; instead, the weight of the item in each category is determined by counting user behavior: if the users who like a certain category all like a certain item, the weight of that item in that category will be higher. Applying classical matrix factorization to collaborative filtering handles the problem of excessively sparse data well.
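As a point of reference for this classical factorization (a minimal sketch for background only, not the method of the invention; the latent dimension, learning rate, and regularization are assumed values), the squared error over the observed entries can be minimized by stochastic gradient descent:

```python
import numpy as np

def classical_mf(R, mask, k=3, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Factorize R (m x n) into U (m x k) and V (n x k) so that R ~= U @ V.T,
    fitting only the observed entries indicated by the 0/1 matrix `mask`."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U, V = rng.random((m, k)), rng.random((n, k))
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = R[i, j] - U[i] @ V[j]              # residual on one observed entry
            U[i] += lr * (err * V[j] - reg * U[i])   # gradient step with L2 penalty
            V[j] += lr * (err * U[i] - reg * V[j])
    return U, V
```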
Classical matrix factorization has two major drawbacks. One is that the factor matrices are not constrained to be non-negative, which makes the predictive meaning of each component difficult to understand, so the recommendation system has no interpretability. The other is that general matrix factorization is non-probabilistic: the solution simply minimizes the error between the original matrix and the approximating matrix, i.e.:

$$\min_{U,V}\ \left\| R_{m \times n} - U_{m \times k} V_{k \times n} \right\|_F^2 .$$

This approach is prone to overfitting and neglects uncertainty, resulting in poor recommendations.
Model-based recommendation systems also have inherent disadvantages: when items are new or rarely rated, a single-model approach can hardly provide sufficient evidence, which greatly affects recommendation quality.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a Bayesian collaborative filtering recommendation method. In an unconstrained decomposition matrix, negative entries make the semantics of the prediction result difficult to explain; non-negative matrix factorization solves this problem well by constraining the elements to be non-negative, decomposing the data set into meaningful non-negative matrices. A Bayesian probabilistic approach is incorporated into non-negative matrix factorization; the difference is that the two smaller matrices are treated as random variables, prior distributions are placed on them, and the posterior distribution of their values is found from the observed data, which greatly reduces overfitting and saves convergence time. Therefore, using the hidden association relations among the scoring data, users, and items in the scoring matrix, a non-negative matrix factorization algorithm based on a variational Bayesian probability model is combined with improved naive Bayesian classification, and a hidden Bayesian probability model (HBPM) recommendation algorithm is proposed. Hidden user groups are obtained from variational Bayesian non-negative matrix factorization (BNMF) and an initial prediction of the missing values is made; on this basis, improved naive Bayes is used for further correction to generate the recommendation result. The method fully considers the multiple hidden relations between users, between users and items, and between items, alleviates the item cold-start problem, and improves the accuracy of the prediction results. Experimental results show that the algorithm significantly improves recommendation quality.
In order to solve the technical problem, the invention provides a Bayesian collaborative filtering recommendation method, which comprises the following steps:
the input of the model is the scoring matrix of the collaborative filtering recommendation system, $R \in \mathbb{R}^{M \times N}$, which is decomposed into two latent matrices $U \in \mathbb{R}_{+}^{M \times K}$ and $V \in \mathbb{R}_{+}^{N \times K}$, where for the $M \times K$ matrix $U$, $U_{ik}$ represents the probability that user $i$ belongs to group $k$, $U_{ik} \in (0,1)$; for the $N \times K$ matrix $V$, $V_{jk}$ represents the evidence that user group $k$ likes item $j$, i.e. the prediction score matrix is $R = UV^{\mathsf T}$; since the data set $R$ is sparse, the observed entries are represented by the set $\Omega = \{(i,j) \mid R_{ij} \text{ is observed}\}$; a probabilistic approach is taken to this problem: a likelihood function is defined for the observed data and the latent matrices are treated as random variables; each value of $R$ is assumed to come from the product of $U$ and $V$ with some added Gaussian noise $E$, namely:

$$R = UV^{\mathsf T} + E, \qquad E_{ij} \sim \mathcal{N}\left(0, \tau^{-1}\right),$$
where $U_i$, $V_j$ denote the $i$-th and $j$-th rows of $U$ and $V$, and $R_{ij}$ obeys a Gaussian distribution with precision $\tau$; the parameter set of the model is denoted $\theta = \{U, V, \tau\}$;
according to Bayes' theorem, given the observed dataset $D = \{R_{ij}\}_{(i,j)\in\Omega}$ and a prior, a posterior distribution is found for the parameter $\theta$:
P(θ|D)∝P(D|θ)P(θ),
the posterior P (θ | D) is usually not calculated accurately, but a good approximation can be obtained by choosing a suitable a priori; in order to make the decomposed matrix values have interpretable meanings, U, V are constrained to be non-negative; the users and the users, and the commodities are independent from each other, so the indexes are selected from U and V in advance, so that each element in U and V is assumed to be independent index distribution and the speed parameter
Figure BDA0001968828400000041
Can be constrained to be non-negative at the same time; namely:
Figure BDA0001968828400000042
for the precision $\tau$, a Gamma prior with hyperparameters $\alpha_\tau, \beta_\tau > 0$ is used, i.e.:

$$\tau \sim \operatorname{Gamma}\left(\alpha_\tau, \beta_\tau\right);$$
the posterior $P(\theta \mid D)$ is approximated by a distribution $q(\theta)$ in variational Bayes; following the mean-field principle, the variational distribution $q(\theta)$ is assumed to factorize fully, so all variables are independent in the approximate posterior, i.e.:

$$q(\theta) = q(\tau)\prod_{i=1}^{M}\prod_{k=1}^{K} q\left(U_{ik}\right)\prod_{j=1}^{N}\prod_{k=1}^{K} q\left(V_{jk}\right);$$
the following conditional distributions are obtained using Bayes' theorem:

$$p(\tau \mid D, U, V) = \operatorname{Gamma}\!\left(\tau \,\Big|\, \alpha_\tau + \tfrac{1}{2}\lvert\Omega\rvert,\; \beta_\tau + \tfrac{1}{2}\sum_{(i,j)\in\Omega}\left(R_{ij} - U_i V_j^{\mathsf T}\right)^2\right),$$

$$p\left(U_{ik} \mid D, \theta_{\setminus U_{ik}}\right) = \mathcal{TN}\left(U_{ik} \,\middle|\, \mu^U_{ik}, \left(\tau^U_{ik}\right)^{-1}\right), \qquad p\left(V_{jk} \mid D, \theta_{\setminus V_{jk}}\right) = \mathcal{TN}\left(V_{jk} \,\middle|\, \mu^V_{jk}, \left(\tau^V_{jk}\right)^{-1}\right),$$

where $\mathcal{TN}(x \mid \mu, \sigma^2)$ denotes the normal distribution with mean $\mu$ and variance $\sigma^2$ truncated to $x \ge 0$;
the approximating factors $q(\theta_i)$ obey the same families of distributions:

$$q(\tau) = \operatorname{Gamma}\left(\tau \mid \alpha^{*}, \beta^{*}\right), \qquad q\left(U_{ik}\right) = \mathcal{TN}\left(U_{ik} \mid \mu^U_{ik}, \left(\tau^U_{ik}\right)^{-1}\right), \qquad q\left(V_{jk}\right) = \mathcal{TN}\left(V_{jk} \mid \mu^V_{jk}, \left(\tau^V_{jk}\right)^{-1}\right);$$
by minimizing the KL divergence, the approximation $q(\theta)$ is fitted to the posterior $P(\theta \mid D)$:

$$\operatorname{KL}\left(q(\theta) \,\middle\|\, P(\theta \mid D)\right) = \mathbb{E}_{q}\left[\log q(\theta) - \log P(\theta \mid D)\right],$$

$$\log P(D) = \mathcal{L}(q) + \operatorname{KL}\left(q \,\middle\|\, P(\theta \mid D)\right), \qquad \mathcal{L}(q) = \mathbb{E}_{q}\left[\log P(D, \theta) - \log q(\theta)\right];$$
to minimize the KL divergence, only the evidence lower bound (ELBO) $\mathcal{L}(q)$ needs to be maximized, so that an approximate solution of the posterior $P(\theta \mid D)$ can be obtained; i.e. the optimal $i$-th factor $q^{*}(\theta_i)$ can be found (up to a certain constant $C$), the other $\theta_i$ are then updated in turn, and the mutual iteration finally stabilizes, so that the optimal update of the variational parameters is found; this algorithm guarantees maximization of the evidence lower bound (ELBO):

$$\log q^{*}\left(\theta_i\right) = \mathbb{E}_{q\left(\theta_{\setminus i}\right)}\left[\log P(D, \theta)\right] + C;$$
an automatic relevance determination (ARD) method is added: instead of selecting the correct $K$, only an upper bound is given, and the model automatically determines the number of factors to use; each parameter of the prior of the decomposition matrices is replaced by one shared by all entries in the same column, i.e. shared per factor, and a Gamma prior is placed on each $\lambda_k$; the prior distributions become:

$$U_{ik} \sim \operatorname{Exp}\left(\lambda_k\right), \qquad V_{jk} \sim \operatorname{Exp}\left(\lambda_k\right), \qquad \lambda_k \sim \operatorname{Gamma}\left(\alpha_0, \beta_0\right);$$
naive Bayes classification:
assume $D$ is a sample data set; each sample $X$ in $D$ has $n$ attributes $A_1, A_2, \ldots, A_n$ and is expressed as an $n$-dimensional feature vector $X = [x_1, x_2, \ldots, x_n]$; suppose the samples have $m$ classes (e.g. with a full score of 5 there are 5 classes), denoted $C_1, C_2, \ldots, C_m$;
according to Bayes' theorem,

$$p\left(C_i \mid X\right) = \frac{p\left(X \mid C_i\right) p\left(C_i\right)}{p(X)};$$
For a sample $X$ to be classified, the probability of each class $C_i$ in $D$ given that $X$ occurs can be derived; the posterior probabilities of the classes are compared and the class with the greatest probability is selected; since $p(X)$ is constant for all classes, the posterior probability $p(C_i \mid X)$ is maximal if and only if $p(X \mid C_i)\,p(C_i)$ is maximal; in order to reduce the overhead and obtain an effective estimate, the attributes are assumed to be mutually independent given the class, that is, only the following needs to be considered:

$$p\left(X \mid C_i\right) = \prod_{k=1}^{n} p\left(x_k \mid C_i\right),$$

$$c(X) = \arg\max_{C_i}\, p\left(C_i\right)\prod_{k=1}^{n} p\left(x_k \mid C_i\right);$$
suppose that $\lvert D_{C_i}\rvert$ denotes the number of samples of class $C_i$ in the training set $D$; the class prior probability can then be obtained from the collection of class samples:

$$p\left(C_i\right) = \frac{\lvert D_{C_i}\rvert}{\lvert D\rvert};$$
for discrete attributes, let $\lvert D_{C_i, x_k}\rvert$ denote the number of samples in $D_{C_i}$ whose value on attribute $A_k$ is $x_k$; the conditional probability is then:

$$p\left(x_k \mid C_i\right) = \frac{\lvert D_{C_i, x_k}\rvert}{\lvert D_{C_i}\rvert};$$
for continuous attributes, a probability density function can be considered, or the continuous attribute can be discretized;
corresponding influence factors are adopted for different users and attributes according to their importance, improving the weighted naive Bayes model:

$$c(X) = \arg\max_{C_i}\, p\left(C_i\right)\prod_{k=1}^{n} p\left(x_k \mid C_i\right)^{\rho_i\,\omega_k},$$
where $\rho_i$ represents the weight of user $u_i$ and $\omega_k$ represents the weight of attribute $A_k$; a larger weight value means a larger influence, and the weight values are calculated using the information entropy;
"hidden" in HBPM is embodied in a hidden user group K obtained from the U matrix in BNMF; multiplying the U matrix V matrix to obtain a prediction scoring matrix, and obtaining a part of hidden but reliable prediction scoring from the prediction scoring matrix; and finally, correcting by using improved naive Bayes in combination with the attributes to obtain a final prediction result.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods when executing the program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of any of the methods.
A processor for running a program, wherein the program when running performs any of the methods.
The invention has the beneficial effects that:
In reality, users' tastes are diverse and cannot be reflected as consistently as in a small data set. A large amount of data is missing in real data sets; when the evidence is insufficient and the values are difficult to predict, predicting them as medians or averages loses the significance of the recommendation. Tests have shown that for items in the data set that have never been scored by a user, or that very few users have scored, the matrix factorization method cannot find enough evidence to predict a preference or dislike, which leads to the item cold-start problem. Therefore, the potential and apparent relations between users and items are fully considered, the item attributes are combined with the scoring matrix, and the prediction scoring matrix is corrected to a certain extent, which saves time and improves the accuracy of the prediction results.
Drawings
FIG. 1 is a schematic diagram of a BNMF probability model in the Bayesian collaborative filtering recommendation method of the present invention.
FIG. 2 is a schematic diagram of an HBPM probability model in the Bayesian collaborative filtering recommendation method of the invention.
Detailed Description
The present invention is further described below in conjunction with the drawings and specific examples, so that those skilled in the art can better understand and practice it; the examples are not intended to limit the present invention.
Our model is mainly composed of two parts. The first part acquires hidden information through BNMF, and the second part combines the hidden information with the explicit information using an improved naive Bayes classifier.
Variational Bayesian nonnegative matrix factorization
The input of the model is the scoring matrix of the collaborative filtering recommendation system, $R \in \mathbb{R}^{M \times N}$, which is decomposed into two latent matrices $U \in \mathbb{R}_{+}^{M \times K}$ and $V \in \mathbb{R}_{+}^{N \times K}$, where for the $M \times K$ matrix $U$, $U_{ik}$ represents the probability that user $i$ belongs to group $k$, $U_{ik} \in (0,1)$; for the $N \times K$ matrix $V$, $V_{jk}$ represents the evidence that user group $k$ likes item $j$, i.e. the prediction score matrix is $R = UV^{\mathsf T}$. Since the data set $R$ is sparse, the observed entries can be represented by the set $\Omega = \{(i,j) \mid R_{ij} \text{ is observed}\}$. We take a probabilistic approach to this problem: we define a likelihood function for the observed data and treat the latent matrices as random variables. Each value of $R$ is assumed to come from the product of $U$ and $V$ with some added Gaussian noise $E$, namely:

$$R = UV^{\mathsf T} + E, \qquad E_{ij} \sim \mathcal{N}\left(0, \tau^{-1}\right),$$
where $U_i$, $V_j$ denote the $i$-th and $j$-th rows of $U$ and $V$, and $R_{ij}$ obeys a Gaussian distribution with precision $\tau$. The parameter set of our model is denoted $\theta = \{U, V, \tau\}$.
According to Bayes' theorem, given the observed dataset $D = \{R_{ij}\}_{(i,j)\in\Omega}$ and a prior, a posterior distribution is found for the parameter $\theta$:
P(θ|D)∝P(D|θ)P(θ),
the a posteriori P (θ | D) is usually not calculated accurately, but a good approximation can be obtained by choosing a suitable a priori. In order to make the decomposed matrix values interpretable, U, V are constrained to be non-negative. The users and the users, and the commodities are independent from each other, so the indexes are selected from U and V in advance, so that each element in U and V is assumed to be independent index distribution and the speed parameter
Figure BDA0001968828400000083
And can also be constrained to be non-negative. Namely:
Figure BDA0001968828400000084
for precision τ we use αττGamma distribution > 0, i.e.:
Figure BDA0001968828400000085
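A minimal sketch of sampling from this generative model (a sketch only; the hyperparameter values lam, alpha_tau, beta_tau and the matrix sizes are illustrative assumptions, not values fixed by the patent):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 11, 15, 3                        # users, items, latent groups (sizes assumed)
lam, alpha_tau, beta_tau = 1.0, 1.0, 1.0   # assumed hyperparameters

U = rng.exponential(scale=1.0 / lam, size=(M, K))       # U_ik ~ Exp(lambda)
V = rng.exponential(scale=1.0 / lam, size=(N, K))       # V_jk ~ Exp(lambda)
tau = rng.gamma(shape=alpha_tau, scale=1.0 / beta_tau)  # tau ~ Gamma(alpha_tau, beta_tau)

E = rng.normal(0.0, tau ** -0.5, size=(M, N))           # E_ij ~ N(0, 1/tau)
R = U @ V.T + E                                         # R = U V^T + E
```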
The posterior $P(\theta \mid D)$ is approximated by a distribution $q(\theta)$ in variational Bayes. Following the mean-field principle, we assume that the variational distribution $q(\theta)$ factorizes fully, so all variables are independent in the approximate posterior, i.e.:

$$q(\theta) = q(\tau)\prod_{i=1}^{M}\prod_{k=1}^{K} q\left(U_{ik}\right)\prod_{j=1}^{N}\prod_{k=1}^{K} q\left(V_{jk}\right).$$
Using Bayes' theorem we obtain the following conditional distributions:

$$p(\tau \mid D, U, V) = \operatorname{Gamma}\!\left(\tau \,\Big|\, \alpha_\tau + \tfrac{1}{2}\lvert\Omega\rvert,\; \beta_\tau + \tfrac{1}{2}\sum_{(i,j)\in\Omega}\left(R_{ij} - U_i V_j^{\mathsf T}\right)^2\right),$$

$$p\left(U_{ik} \mid D, \theta_{\setminus U_{ik}}\right) = \mathcal{TN}\left(U_{ik} \,\middle|\, \mu^U_{ik}, \left(\tau^U_{ik}\right)^{-1}\right), \qquad p\left(V_{jk} \mid D, \theta_{\setminus V_{jk}}\right) = \mathcal{TN}\left(V_{jk} \,\middle|\, \mu^V_{jk}, \left(\tau^V_{jk}\right)^{-1}\right),$$

where $\mathcal{TN}(x \mid \mu, \sigma^2)$ denotes the normal distribution with mean $\mu$ and variance $\sigma^2$ truncated to $x \ge 0$.
The approximating factors $q(\theta_i)$ obey the same families of distributions:

$$q(\tau) = \operatorname{Gamma}\left(\tau \mid \alpha^{*}, \beta^{*}\right), \qquad q\left(U_{ik}\right) = \mathcal{TN}\left(U_{ik} \mid \mu^U_{ik}, \left(\tau^U_{ik}\right)^{-1}\right), \qquad q\left(V_{jk}\right) = \mathcal{TN}\left(V_{jk} \mid \mu^V_{jk}, \left(\tau^V_{jk}\right)^{-1}\right).$$
By minimizing the KL divergence, the approximation $q(\theta)$ is fitted to the posterior $P(\theta \mid D)$:

$$\operatorname{KL}\left(q(\theta) \,\middle\|\, P(\theta \mid D)\right) = \mathbb{E}_{q}\left[\log q(\theta) - \log P(\theta \mid D)\right],$$

$$\log P(D) = \mathcal{L}(q) + \operatorname{KL}\left(q \,\middle\|\, P(\theta \mid D)\right), \qquad \mathcal{L}(q) = \mathbb{E}_{q}\left[\log P(D, \theta) - \log q(\theta)\right].$$
To minimize the KL divergence, only the evidence lower bound (ELBO) $\mathcal{L}(q)$ needs to be maximized, so that an approximate solution of the posterior $P(\theta \mid D)$ can be obtained. That is, the optimal $i$-th factor $q^{*}(\theta_i)$ can be found (up to a certain constant $C$), the other $\theta_i$ are then updated in turn, and the mutual iteration finally stabilizes, so that the optimal update of the variational parameters is found; this algorithm guarantees maximization of the evidence lower bound (ELBO):

$$\log q^{*}\left(\theta_i\right) = \mathbb{E}_{q\left(\theta_{\setminus i}\right)}\left[\log P(D, \theta)\right] + C.$$
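The update cycle can be sketched as follows (a minimal illustration, assuming truncated-normal variational factors as in standard variational Bayesian NMF and a fixed rate lam rather than the ARD prior introduced below; hyperparameter values, helper names, and the SciPy dependency are our assumptions, not the patent's):

```python
import numpy as np
from scipy.stats import norm

def tn_moments(mu, sigma):
    """First and second moments of N(mu, sigma^2) truncated to [0, inf)."""
    alpha = -mu / sigma
    lam = norm.pdf(alpha) / np.clip(norm.sf(alpha), 1e-12, None)
    mean = mu + sigma * lam
    var = sigma ** 2 * (1.0 - lam * (lam - alpha))
    return mean, var + mean ** 2

def vb_nmf(R, mask, K=3, lam=1.0, a_tau=1.0, b_tau=1.0, iters=100, seed=0):
    """Coordinate-ascent variational inference for the model above.
    mask is a 0/1 matrix marking the observed entries Omega."""
    rng = np.random.default_rng(seed)
    M, N = R.shape
    EU = rng.exponential(1.0 / lam, (M, K)); EU2 = EU ** 2
    EV = rng.exponential(1.0 / lam, (N, K)); EV2 = EV ** 2
    n_obs = mask.sum()
    for _ in range(iters):
        # q(tau) = Gamma(a*, b*): b* needs E[(R_ij - U_i.V_j)^2] on observed entries
        resid = (R - EU @ EV.T) ** 2
        extra = EU2 @ EV2.T - (EU ** 2) @ (EV ** 2).T   # variance correction term
        E_tau = (a_tau + 0.5 * n_obs) / (b_tau + 0.5 * np.sum(mask * (resid + extra)))
        # q(U_ik) = TN(mu, 1/prec), updated one factor column at a time
        for k in range(K):
            prec = np.clip(E_tau * (mask @ EV2[:, k]), 1e-12, None)
            rest = EU @ EV.T - np.outer(EU[:, k], EV[:, k])   # sum over k' != k
            coef = E_tau * ((mask * (R - rest)) @ EV[:, k]) - lam
            EU[:, k], EU2[:, k] = tn_moments(coef / prec, prec ** -0.5)
        # q(V_jk): symmetric update with the roles of U and V exchanged
        for k in range(K):
            prec = np.clip(E_tau * (mask.T @ EU2[:, k]), 1e-12, None)
            rest = EV @ EU.T - np.outer(EV[:, k], EU[:, k])
            coef = E_tau * ((mask.T * (R.T - rest)) @ EU[:, k]) - lam
            EV[:, k], EV2[:, k] = tn_moments(coef / prec, prec ** -0.5)
    return EU, EV   # posterior means; prediction matrix is EU @ EV.T
```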
the selection of potential factors K in matrix decomposition also has great influence on the prediction of the result, and an automatic correlation determination (ARD) method is added, so that the model automatically determines the number of the factors to be used without selecting the correct K but giving an upper limit. Each parameter of the prior of the decomposition matrix is replaced by one shared by all items in the same column, namely each factor is shared, and the prior of the decomposition matrix is divided into a plurality of items at lambdakA gamma prior is placed on it. The prior distribution becomes:
Figure BDA0001968828400000101
The probability model of BNMF is shown in FIG. 1.
naive Bayes classification
Assume $D$ is a sample data set; each sample $X$ in $D$ has $n$ attributes $A_1, A_2, \ldots, A_n$ and is expressed as an $n$-dimensional feature vector $X = [x_1, x_2, \ldots, x_n]$. Suppose the samples have $m$ classes (e.g. with a full score of 5 there are 5 classes), denoted $C_1, C_2, \ldots, C_m$.
According to Bayes' theorem,

$$p\left(C_i \mid X\right) = \frac{p\left(X \mid C_i\right) p\left(C_i\right)}{p(X)}.$$
For a sample $X$ to be classified, the probability of each class $C_i$ in $D$ given that $X$ occurs can be derived; the posterior probabilities of the classes are compared and the class with the greatest probability is selected. Since $p(X)$ is constant for all classes, the posterior probability $p(C_i \mid X)$ is maximal if and only if $p(X \mid C_i)\,p(C_i)$ is maximal. In order to reduce the overhead and obtain an effective estimate, the attributes are assumed to be mutually independent given the class, that is, only the following needs to be considered:

$$p\left(X \mid C_i\right) = \prod_{k=1}^{n} p\left(x_k \mid C_i\right),$$

$$c(X) = \arg\max_{C_i}\, p\left(C_i\right)\prod_{k=1}^{n} p\left(x_k \mid C_i\right).$$
Suppose that $\lvert D_{C_i}\rvert$ denotes the number of samples of class $C_i$ in the training set $D$; the class prior probability can then be obtained from the collection of class samples:

$$p\left(C_i\right) = \frac{\lvert D_{C_i}\rvert}{\lvert D\rvert}.$$
For discrete attributes, let $\lvert D_{C_i, x_k}\rvert$ denote the number of samples in $D_{C_i}$ whose value on attribute $A_k$ is $x_k$; the conditional probability is then:

$$p\left(x_k \mid C_i\right) = \frac{\lvert D_{C_i, x_k}\rvert}{\lvert D_{C_i}\rvert}.$$
For continuous attributes, a probability density function can be considered, or the continuous attribute can be discretized.
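A minimal count-based sketch of these estimates (the Laplace smoothing term alpha and the rating alphabet size n_values=5 are our assumptions, added so the sketch handles unseen values):

```python
import numpy as np
from collections import Counter, defaultdict

def nb_train(X, y, n_classes):
    """Counts for p(C_i) = |D_Ci| / |D| and for the per-attribute
    conditionals p(x_k | C_i) = |D_Ci,xk| / |D_Ci|."""
    y = np.asarray(y)
    prior = np.array([((y == c).sum() + 1) / (len(y) + n_classes)   # smoothed class prior
                      for c in range(n_classes)])
    counts = [defaultdict(Counter) for _ in range(n_classes)]       # counts[c][k][value]
    for xi, yi in zip(X, y):
        for k, v in enumerate(xi):
            counts[yi][k][v] += 1
    return prior, counts

def nb_predict(x, prior, counts, n_classes, alpha=1.0, n_values=5):
    """argmax_c p(C_c) * prod_k p(x_k | C_c), computed in log space."""
    scores = np.log(prior)
    for c in range(n_classes):
        for k, v in enumerate(x):
            num = counts[c][k][v] + alpha                  # smoothed count of value v
            den = sum(counts[c][k].values()) + alpha * n_values
            scores[c] += np.log(num / den)
    return int(np.argmax(scores))
```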
If naive Bayes is used directly for classification prediction, the effect is not ideal. First, the original data is too sparse and reliable data is insufficient; second, the amount of computation is large and the memory requirement is high; third, the data is noisy and the method is not robust. Ordinary naive Bayes considers that all data and condition attributes have the same influence on the classification; in fact, the preferences of other people or of similar users are less important for the classification than the user's own preferences, and different user attributes and item attributes influence the classification differently. To differentiate the influence of different users and attributes on the classification, corresponding influence factors can be adopted for different users and attributes according to their importance, improving the weighted naive Bayes model:

$$c(X) = \arg\max_{C_i}\, p\left(C_i\right)\prod_{k=1}^{n} p\left(x_k \mid C_i\right)^{\rho_i\,\omega_k},$$
where $\rho_i$ represents the weight of user $u_i$ and $\omega_k$ represents the weight of attribute $A_k$. A larger weight value means a larger influence, and the weight values are calculated using the information entropy.
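The patent computes $\rho_i$ and $\omega_k$ from information entropy without spelling out the formula in the text above; purely as an assumption, one common choice derives a weight from the normalized Shannon entropy of an observed value distribution (more peaked, hence more informative, distributions receive larger weights) and plugs the weights into the classifier as exponents, reusing the counts from nb_train above:

```python
import numpy as np

def entropy_weight(values, n_values=5):
    """Hypothetical entropy-based weight in [0, 1]: 1 - H(p) / H_max for the
    empirical distribution of one attribute's (or one user's) observed values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    h = -(p * np.log2(p)).sum()          # Shannon entropy of the value distribution
    return 1.0 - h / np.log2(n_values)   # assumes n_values > 1

def weighted_nb_predict(x, prior, counts, n_classes, w, rho=1.0,
                        alpha=1.0, n_values=5):
    """argmax_c p(C_c) * prod_k p(x_k | C_c)^(rho * w_k), in log space."""
    scores = np.log(prior)
    for c in range(n_classes):
        for k, v in enumerate(x):
            num = counts[c][k][v] + alpha
            den = sum(counts[c][k].values()) + alpha * n_values
            scores[c] += rho * w[k] * np.log(num / den)
    return int(np.argmax(scores))
```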
Our model HBPM is shown in FIG. 2. The "hidden" in HBPM is embodied in the hidden user groups $K$ obtained from the $U$ matrix in BNMF; the $U$ and $V$ matrices are multiplied to obtain a prediction scoring matrix, from which a portion of hidden but reliable prediction scores is obtained. Finally, improved naive Bayes combined with the attributes is used for correction to obtain the final prediction result. The algorithm flow of HBPM is shown in Algorithm 1.
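Putting the two stages together, a hypothetical end-to-end flow could look as follows (function names reuse the sketches above; predict_from_attrs, the weak-evidence rule, and the 0.5 threshold are illustrative assumptions, not Algorithm 1 itself):

```python
import numpy as np

def hbpm_predict(R, mask, item_attrs, K_upper=10):
    """Sketch of the HBPM flow: BNMF yields hidden user groups and initial
    scores; entries with weak evidence are corrected via weighted naive Bayes."""
    EU, EV = vb_nmf(R, mask, K=K_upper)            # stage 1: variational BNMF
    R_hat = EU @ EV.T                              # initial prediction score matrix
    groups = EU.argmax(axis=1)                     # hidden user group of each user
    # stage 2: where the evidence is weak (unobserved entry predicted near the
    # midpoint 3), re-predict from the item attributes (illustrative rule only)
    weak = (mask == 0) & (np.abs(R_hat - 3.0) < 0.5)
    for i, j in zip(*np.nonzero(weak)):
        # predict_from_attrs is a hypothetical helper built on weighted_nb_predict
        R_hat[i, j] = predict_from_attrs(i, j, R, mask, item_attrs, groups)
    return R_hat
```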
A specific application scenario of the present invention is described below:
we present our model in a simple and intuitive way. As shown in table 1, is a small dataset rating matrix with 11 users and 15 items, M11 and N15. The numbers indicate the user's rating of the item, with higher ratings indicating a greater preference, and prime' indicating missing data that has not been scored. From the figure we can clearly see that the user group with the same preference is { U }1、U2、U3、U10}、{U4、U5、U6、U11}、{U7、U8、U9}. From the rating matrix, we can visually observe the user U5May like item I8But cannot directly observe the user U5To I4The attitude of (c). Table 2 shows the U matrix (with potential factor k being 3) decomposed by the BNMF method, which has certain interpretable significance, and each item U of the matrixikThe evidence that user i belongs to user group k is shown, and the user groups can be divided into 3 types through decomposed sub-matrixes: { U1、U2、U3、U10}、{U4、U5、U6、U11}、{U7、U8、U9I.e. using sublotsHidden factors in the array are clustered, and users with similar preferences are divided into a group.
TABLE 1 User rating matrix

(The rating matrix itself appears only as an image in the original publication.)
Table 3 shows the $V$ matrix obtained by the BNMF decomposition; each entry $V_{jk}$ of the matrix gives the evidence that the users in group $k$ like the respective item. It can be seen that the users in group 1 like items $\{I_1, I_2, I_3, I_4, I_5\}$, the users in group 2 like items $\{I_6, I_7, I_8, I_9, I_{10}\}$, and the users in group 3 like items $\{I_7, I_{11}, I_{12}, I_{13}, I_{14}, I_{15}\}$.
TABLE 2 U matrix after BNMF decomposition

        F1     F2     F3   cluster
U1    0.82   0.07   0.06      1
U2    0.80   0.09   0.17      1
U3    0.72   0.18   0.35      1
U4    0.11   0.84   0.08      2
U5    0.26   0.76   0.23      2
U6    0.10   0.82   0.18      2
U7    0.44   0.11   0.67      3
U8    0.10   0.11   0.72      3
U9    0.32   0.36   0.60      3
U10   0.75   0.12   0.35      1
U11   0.09   0.81   0.21      2
TABLE 3 V matrix after BNMF decomposition

       I1     I2     I3     I4     I5     I6     I7     I8     I9    I10    I11    I12    I13    I14    I15
G1   5.61   6.04   5.73   5.58   5.82   0.64   1.77   0.73   0.59   4.50   0.74   0.96   1.35   1.39   0.91
G2   2.31   0.73   0.71   2.26   0.75   5.78   4.80   5.58   5.86   4.90   2.27   0.70   4.21   2.30   2.31
G3   1.57   0.84   1.86   1.70   1.43   1.04   4.89   1.72   0.73   2.21   6.40   5.82   5.68   5.59   6.31
Compared with the prediction results of classical matrix factorization, this method not only constrains the data to be non-negative, making the prediction more reasonable, but also predicts most of the missing data well (known data are shown in bold). For example, in classical matrix factorization $U_9$ appears to like all of the items $\{I_1, I_2, I_3, I_4, I_5\}$, even though there is not enough evidence of such a preference, whereas the BNMF prediction of 3 points (the middle value) is relatively more reasonable. However, if items with insufficient evidence were simply assigned the mean or median value in order to improve the MAE, the recommendation would lose its meaning. On this basis, to predict $U_9$'s preferences for items $\{I_1, I_2, I_3, I_4, I_5\}$ more accurately, the potential connections between the items must be examined. By combining $U_9$'s ratings with the item attributes, the distinctions and connections between likes and dislikes among the scored items are extended to likes or dislikes of the attributes. Here, the improved naive Bayesian classification is used to predict the user's preferences for the items more accurately.
Our experiments were also performed on real data sets. We used MovieLens 100K and MovieLens 1M as the experimental datasets; both consist of users' ratings of items (movies), with scores being integers from 1 to 5, and each registered user has rated at least 20 movies. The data sets also provide a user profile including gender, age, occupation, and zip code, and a movie attribute file giving the type of each movie; a movie has at least one attribute and at most 6. In the experiments the user-item score file and the movie attribute file are used to evaluate the technique. The experimental data set is divided into two groups, 80% as the training set (used by HBPM) and 20% as the test set; each group of experiments is repeated 30 times, and the average of each performance index is used to assess the accuracy of the prediction results.
For prediction accuracy, the mean absolute error (MAE) and the root mean square error (RMSE) are used as the performance indexes, i.e. the accuracy is measured by the deviation between the user scores predicted by the model and the actual scores. Let the actual user score in the test set be $T_{ij}$, the corresponding predicted user score $P_{ij}$, and $n$ the size of the test set:

$$\mathrm{MAE} = \frac{1}{n}\sum_{(i,j)}\left|T_{ij} - P_{ij}\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{(i,j)}\left(T_{ij} - P_{ij}\right)^{2}}.$$
The recommendation accuracy on the test set is also verified. When the predicted item score $P_{ij} \ge 4$, the item is considered recommendable to the user. $P_R = \{(i,j) \mid P_{ij} \ge 4\}$ denotes the set of predictions recommendable to the user, $T_R = \{(i,j) \mid T_{ij} \ge 4\}$ the set of user-preferred items in the test set, $P_d = \{(i,j) \mid P_{ij} \le 3\}$ the non-recommendable set among the predictions, and $T_d = \{(i,j) \mid T_{ij} \le 3\}$ the non-recommendable set in the test set:

$$\mathrm{Precision} = \frac{\lvert P_R \cap T_R\rvert}{\lvert P_R\rvert}, \qquad \mathrm{Recall} = \frac{\lvert P_R \cap T_R\rvert}{\lvert T_R\rvert}.$$
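A minimal sketch of these four indexes, assuming the standard set-intersection definitions of precision and recall over $P_R$ and $T_R$:

```python
import numpy as np

def evaluate(T, P):
    """MAE and RMSE over the n test ratings, plus recommendation precision and
    recall using the 'recommend if score >= 4' rule from the text."""
    T, P = np.asarray(T, dtype=float), np.asarray(P, dtype=float)
    mae = np.mean(np.abs(T - P))
    rmse = np.sqrt(np.mean((T - P) ** 2))
    rec_pred, rec_true = P >= 4, T >= 4            # P_R and T_R as boolean masks
    hits = np.sum(rec_pred & rec_true)             # |P_R intersect T_R|
    precision = hits / max(rec_pred.sum(), 1)      # |P_R ∩ T_R| / |P_R|
    recall = hits / max(rec_true.sum(), 1)         # |P_R ∩ T_R| / |T_R|
    return mae, rmse, precision, recall
```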
In Table 4 we compare our experiments with other methods. Different similarity measures have a large impact on the results of the K-nearest-neighbor recommendation system (KNN); the Pearson correlation coefficient is used here. It can be seen that, compared with KNN, which uses only the scoring information, and with the classical matrix factorization recommendation system (Classical MF), the BNMF algorithm is significantly improved in MAE and accuracy. The multi-level hybrid similarity recommendation system (usicf), the improved naive Bayes recommendation system (INB-CF), and the hybrid multi-label recommendation system (DRA-HMLF) all add user information or item information on top of the scoring information. usicf improves the similarity algorithm and uses the movie attributes to predict user interests; while more effective than KNN, it ignores the potential connections between users. The INB-CF algorithm combines the user and movie attributes with naive Bayesian classification, but because the data is too sparse and the evidence insufficient, its classification effect is poor. The DRA-HMLF algorithm combines a similarity algorithm with matrix factorization, but clusters only by user attributes and ignores user scoring behavior. Our method integrates variational Bayes to improve classical matrix factorization, so that the initial prediction of the scoring matrix is more stable and accurate, and integrates the item information into the parts that matrix factorization cannot predict accurately due to insufficient evidence, thereby improving the recommendation accuracy.
Table 4 Comparison of our experiments with other methods

(The comparison table appears only as an image in the original publication.)
The above-mentioned embodiments are merely preferred embodiments used to fully illustrate the present invention, and the protection scope of the present invention is not limited thereto. Equivalent substitutions or changes made by those skilled in the art on the basis of the present invention all fall within the protection scope of the present invention. The protection scope of the invention is defined by the claims.

Claims (4)

1. A Bayesian collaborative filtering recommendation method is characterized by comprising the following steps:
the input of the model is the scoring matrix of the collaborative filtering recommendation system, $R \in \mathbb{R}^{M \times N}$, which is decomposed into two latent matrices $U \in \mathbb{R}_{+}^{M \times K}$ and $V \in \mathbb{R}_{+}^{N \times K}$, wherein for the $M \times K$ matrix $U$, $U_{ik}$ represents the probability that user $i$ belongs to group $k$, $U_{ik} \in (0,1)$; for the $N \times K$ matrix $V$, $V_{jk}$ represents the evidence that user group $k$ likes item $j$, i.e. the prediction score matrix is $R = UV^{\mathsf T}$; since the data set $R$ is sparse, the observed entries are represented by the set $\Omega = \{(i,j) \mid R_{ij} \text{ is observed}\}$; a probabilistic approach is taken to this problem: a likelihood function is defined for the observed data and the latent matrices are treated as random variables; each value of $R$ is assumed to come from the product of $U$ and $V$ with some added Gaussian noise $E$, namely:

$$R = UV^{\mathsf T} + E, \qquad E_{ij} \sim \mathcal{N}\left(0, \tau^{-1}\right),$$
wherein $U_i$, $V_j$ denote the $i$-th and $j$-th rows of $U$ and $V$, and $R_{ij}$ obeys a Gaussian distribution with precision $\tau$; the parameter set of the model is denoted $\theta = \{U, V, \tau\}$;
according to Bayes' theorem, given the observed dataset $D = \{R_{ij}\}_{(i,j)\in\Omega}$ and a prior, a posterior distribution is found for the parameter $\theta$:
P(θ|D)∝P(D|θ)P(θ),
the posterior P (θ | D) is usually not calculated accurately, but a good approximation can be obtained by choosing a suitable a priori; in order to make the decomposed matrix values have interpretable meanings, U, V are constrained to be non-negative; the users and the users, and the commodities are independent from each other, so the indexes are selected from U and V in advance, so that each element in U and V is assumed to be independent index distribution and the speed parameter
Figure FDA0002625422780000015
Can be constrained to be non-negative at the same time; namely:
Figure FDA0002625422780000016
for the precision $\tau$, a Gamma prior with hyperparameters $\alpha_\tau, \beta_\tau > 0$ is used, i.e.:

$$\tau \sim \operatorname{Gamma}\left(\alpha_\tau, \beta_\tau\right);$$
the posterior $P(\theta \mid D)$ is approximated by a distribution $q(\theta)$ in variational Bayes; according to mean-field theory, the variational distribution $q(\theta)$ is assumed to factorize fully, so all variables are independent in the approximate posterior, i.e.:

$$q(\theta) = q(\tau)\prod_{i=1}^{M}\prod_{k=1}^{K} q\left(U_{ik}\right)\prod_{j=1}^{N}\prod_{k=1}^{K} q\left(V_{jk}\right);$$
the following conditional distributions are obtained using Bayes' theorem:

$$p(\tau \mid D, U, V) = \operatorname{Gamma}\!\left(\tau \,\Big|\, \alpha_\tau + \tfrac{1}{2}\lvert\Omega\rvert,\; \beta_\tau + \tfrac{1}{2}\sum_{(i,j)\in\Omega}\left(R_{ij} - U_i V_j^{\mathsf T}\right)^2\right),$$

$$p\left(U_{ik} \mid D, \theta_{\setminus U_{ik}}\right) = \mathcal{TN}\left(U_{ik} \,\middle|\, \mu^U_{ik}, \left(\tau^U_{ik}\right)^{-1}\right), \qquad p\left(V_{jk} \mid D, \theta_{\setminus V_{jk}}\right) = \mathcal{TN}\left(V_{jk} \,\middle|\, \mu^V_{jk}, \left(\tau^V_{jk}\right)^{-1}\right),$$

wherein $\mathcal{TN}(x \mid \mu, \sigma^2)$ denotes the normal distribution with mean $\mu$ and variance $\sigma^2$ truncated to $x \ge 0$;
the approximating factors $q(\theta_i)$ obey the same families of distributions:

$$q(\tau) = \operatorname{Gamma}\left(\tau \mid \alpha^{*}, \beta^{*}\right), \qquad q\left(U_{ik}\right) = \mathcal{TN}\left(U_{ik} \mid \mu^U_{ik}, \left(\tau^U_{ik}\right)^{-1}\right), \qquad q\left(V_{jk}\right) = \mathcal{TN}\left(V_{jk} \mid \mu^V_{jk}, \left(\tau^V_{jk}\right)^{-1}\right);$$
by minimizing the KL divergence, the approximation $q(\theta)$ is fitted to the posterior $P(\theta \mid D)$:

$$\operatorname{KL}\left(q(\theta) \,\middle\|\, P(\theta \mid D)\right) = \mathbb{E}_{q}\left[\log q(\theta) - \log P(\theta \mid D)\right],$$

$$\log P(D) = \mathcal{L}(q) + \operatorname{KL}\left(q \,\middle\|\, P(\theta \mid D)\right), \qquad \mathcal{L}(q) = \mathbb{E}_{q}\left[\log P(D, \theta) - \log q(\theta)\right];$$
to minimize the KL divergence, only the evidence lower bound $\mathcal{L}(q)$ needs to be maximized, so that an approximate solution of the posterior $P(\theta \mid D)$ can be obtained; i.e. the optimal $i$-th factor $q^{*}(\theta_i)$ can be found, the other $\theta_i$ are then updated in turn, and the mutual iteration finally stabilizes, so that the optimal update of the variational parameters is found; the algorithm guarantees maximization of the evidence lower bound:

$$\log q^{*}\left(\theta_i\right) = \mathbb{E}_{q\left(\theta_{\setminus i}\right)}\left[\log P(D, \theta)\right] + C,$$

wherein $C$ is a constant;
an automatic relevance determination method is added: instead of selecting the correct $K$, only an upper bound is given, and the model automatically determines the number of factors to use; each parameter of the prior of the decomposition matrices is replaced by one shared by all entries in the same column, i.e. shared per factor, and a Gamma prior is placed on each $\lambda_k$; the prior distributions become:

$$U_{ik} \sim \operatorname{Exp}\left(\lambda_k\right), \qquad V_{jk} \sim \operatorname{Exp}\left(\lambda_k\right), \qquad \lambda_k \sim \operatorname{Gamma}\left(\alpha_0, \beta_0\right);$$
naive Bayes classification:
assume $D$ is a sample data set; each sample $X$ in $D$ has $n$ attributes $A_1, A_2, \ldots, A_n$ and is expressed as an $n$-dimensional feature vector $X = [x_1, x_2, \ldots, x_n]$; suppose the samples have $m$ classes, denoted $C_1, C_2, \ldots, C_m$;
according to Bayes' theorem,

$$p\left(C_i \mid X\right) = \frac{p\left(X \mid C_i\right) p\left(C_i\right)}{p(X)};$$
For a sample $X$ to be classified, the probability of each class $C_i$ in $D$ given that $X$ occurs can be derived; the posterior probabilities of the classes are compared and the class with the greatest probability is selected; since $p(X)$ is constant for all classes, the posterior probability $p(C_i \mid X)$ is maximal if and only if $p(X \mid C_i)\,p(C_i)$ is maximal; in order to reduce the overhead and obtain an effective estimate, the attributes are assumed to be mutually independent given the class, that is, only the following needs to be considered:

$$p\left(X \mid C_i\right) = \prod_{k=1}^{n} p\left(x_k \mid C_i\right),$$

$$c(X) = \arg\max_{C_i}\, p\left(C_i\right)\prod_{k=1}^{n} p\left(x_k \mid C_i\right);$$
suppose that $\lvert D_{C_i}\rvert$ denotes the number of samples of class $C_i$ in the training set $D$; the class prior probability can then be obtained from the collection of class samples:

$$p\left(C_i\right) = \frac{\lvert D_{C_i}\rvert}{\lvert D\rvert};$$
for discrete attributes, let $\lvert D_{C_i, x_k}\rvert$ denote the number of samples in $D_{C_i}$ whose value on attribute $A_k$ is $x_k$; the conditional probability is then:

$$p\left(x_k \mid C_i\right) = \frac{\lvert D_{C_i, x_k}\rvert}{\lvert D_{C_i}\rvert};$$
for continuous attributes, a probability density function can be considered, or the continuous attribute can be discretized;
corresponding influence factors are adopted for different users and attributes according to their importance, improving the weighted naive Bayes model:

$$c(X) = \arg\max_{C_i}\, p\left(C_i\right)\prod_{k=1}^{n} p\left(x_k \mid C_i\right)^{\rho_i\,\omega_k},$$
wherein $\rho_i$ represents the weight of user $u_i$ and $\omega_k$ represents the weight of attribute $A_k$; a larger weight value means a larger influence, and the weight values are calculated using the information entropy;
"hidden" in HBPM is embodied in a hidden user group K obtained from the U matrix in BNMF; multiplying the U matrix V matrix to obtain a prediction scoring matrix, and obtaining a part of hidden but reliable prediction scoring from the prediction scoring matrix; and finally, correcting by using improved naive Bayes in combination with the attributes to obtain a final prediction result.
2. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of claim 1 are performed when the program is executed by the processor.
3. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as claimed in claim 1.
4. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of claim 1.
Similar Documents

Publication Publication Date Title
CN109840833B (en) Bayesian collaborative filtering recommendation method
Bansal et al. Ask the gru: Multi-task learning for deep text recommendations
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
kumar Bokde et al. Role of matrix factorization model in collaborative filtering algorithm: A survey
US20020107853A1 (en) System and method for personalized search, information filtering, and for generating recommendations utilizing statistical latent class models
Ortega et al. Providing reliability in recommender systems through Bernoulli matrix factorization
EP2860672A2 (en) Scalable cross domain recommendation system
Deodhar et al. A framework for simultaneous co-clustering and learning from complex data
Zhang et al. Aggregated recommendation through random forests
Pessiot et al. Learning to Rank for Collaborative Filtering.
Bhavana et al. Block based singular value decomposition approach to matrix factorization for recommender systems
Zhang et al. Hybrid recommender system using semi-supervised clustering based on Gaussian mixture model
Duan et al. A hybrid intelligent service recommendation by latent semantics and explicit ratings
Liphoto et al. A survey on recommender systems
Chen et al. A fuzzy matrix factor recommendation method with forgetting function and user features
Xu et al. A hybrid approach to three-way conversational recommendation
Zheng et al. Hierarchical collaborative embedding for context-aware recommendations
Guimarães et al. Guard: A genetic unified approach for recommendation
Zhang et al. Probabilistic matrix factorization recommendation of self-attention mechanism convolutional neural networks with item auxiliary information
Rafailidis A Multi-Latent Transition model for evolving preferences in recommender systems
Wang et al. Multi‐feedback Pairwise Ranking via Adversarial Training for Recommender
Molina et al. Recommendation system for netflix
Rashidi et al. Entropy-based ranking approach for enhancing diversity in tag-based community recommendation
Feng et al. Forest-based deep recommender
Jun-Yao et al. Solutions to cold-start problems for latent factor models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 2022-11-30
Address after: No. 532-1, Gushan Road, Economic and Technological District, Weihai City, Shandong Province, 264200
Patentee after: Weihai Bohua Medical Equipment Co., Ltd.
Address before: 215000 No. 8 Ji Xue Road, Xiangcheng District, Suzhou, Jiangsu
Patentee before: SOOCHOW University