CN103106535B - Method for solving collaborative filtering recommendation data sparsity based on neural network - Google Patents


Info

Publication number
CN103106535B
Authority
CN
China
Prior art keywords
matrix
user
value
project
input
Prior art date
Legal status
Expired - Fee Related
Application number
CN201310055267.6A
Other languages
Chinese (zh)
Other versions
CN103106535A (en
Inventor
孙健
王晓丽
徐杰
隆克平
张毅
梁雪芬
李乾坤
姚洪哲
陈旭
陈小英
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201310055267.6A
Publication of CN103106535A
Application granted
Publication of CN103106535B
Anticipated expiration


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method for solving data sparsity in collaborative filtering recommendation based on a neural network. The method adopts a generalized regression neural network (GRNN): a network model is trained and used for score prediction so that the sparse data can be completely filled in. Before GRNN training, the input variables of the neural network are screened using the mean impact value (MIV), selecting the feature values with a large impact on the output as the effective input variables. The effective input variables are used to construct the input matrix of the GRNN; a K-fold cross-validation loop finds the optimal spread value of the GRNN; the optimal spread value and the corresponding input and output matrices are used to train the GRNN; the trained GRNN predicts scores for the sparse rating matrix; and the unrated entries of the sparse rating matrix are replaced with the predicted scores. The method can completely fill sparse recommendation data, alleviates the severe data-sparsity problem that is most prominent in existing collaborative filtering technology, and makes the recommendation results more accurate.

Description

A method for solving data sparsity in collaborative filtering recommendation based on a neural network
Technical field
The invention belongs to the fields of artificial neural networks and personalized recommendation technology, and specifically relates to a method for solving data sparsity in collaborative filtering recommendation based on a neural network.
Background technology
In the advanced information society, every industry accumulates massive amounts of information data over time, and the question of how to effectively extract useful information from such mass data has started a research boom in personalized recommendation technology. Collaborative filtering, as the main recommendation technique, has attracted wide attention and has been successfully applied in various recommender systems. However, as the number of resource categories and users keeps growing, the rating matrix used for recommendation becomes increasingly sparse, which severely degrades recommendation quality.
A neural network is a mathematical model for distributed parallel information processing that imitates the behavioral characteristics of biological neural networks. Its processing units generally fall into three classes: input units, hidden units, and output units. The input units connect the network to the outside world, the hidden layer performs the nonlinear transformation from the input space to the hidden space, and the output units produce the final network output. Common neural networks include back-propagation networks, self-organizing networks, recurrent networks, and radial basis function networks.
Compared with other neural networks, the generalized regression neural network (GRNN) has a simpler training process: only the training samples need to be determined, the corresponding network structure and the connection weights between neurons are then determined automatically, and training essentially reduces to determining the smoothing factor. The GRNN offers strong approximation ability, fast learning, robustness, fault tolerance, and nonlinear mapping capability, and is widely used in decision and control systems, structural analysis, education, signal analysis, and other fields.
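A GRNN's prediction step is essentially Nadaraya-Watson kernel regression, so it can be sketched in a few lines. The following is a minimal illustration only, not the patent's implementation; the function name and toy data are invented for the example:

```python
import numpy as np

def grnn_predict(X_train, Y_train, X_query, spread):
    """Minimal GRNN prediction sketch (Nadaraya-Watson kernel regression).

    X_train: (n_samples, n_features); Y_train: (n_samples,);
    X_query: (n_queries, n_features); spread: the smoothing factor sigma.
    """
    # Squared Euclidean distance between every query and every training sample.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * spread ** 2))   # pattern-layer activations
    return (w @ Y_train) / w.sum(axis=1)    # summation layer: weighted mean

# With a very small spread each query effectively copies the nearest
# training target, so prediction at the training points recovers Y_train.
X = np.array([[0.0], [1.0], [2.0]])
Y = np.array([1.0, 3.0, 5.0])
recovered = grnn_predict(X, Y, X, spread=0.05)
```

Note how the spread value controls the behavior of the whole network: a small spread interpolates the training data, while a large spread averages over many samples, which is exactly why the method below searches for an optimal spread.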
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a method for solving data sparsity in collaborative filtering recommendation based on a neural network: a generalized regression neural network (GRNN) is trained and used for score prediction so that the sparse data are completely filled in, alleviating the severe data-sparsity problem of collaborative filtering.
To achieve this object, the method of the invention is characterized in that it comprises the following steps:
Step 1: For a sparse rating matrix A representing the scores of M users on N items, compute the sparsity of each user's scores over all items and the sparsity of all users' scores on each item, where the score of an item a user has not rated is uniformly replaced with 0 in A. (Throughout, the sparsity of a user or item is the fraction of its entries that are actually rated.)
Set a user sparsity threshold and an item sparsity threshold: if a user's sparsity is below the user threshold, delete that user; if an item's sparsity is below the item threshold, delete that item. Denote the resulting number of users by m and the number of items by n, and build the original rating matrix T from the scores of the m users U_i (1 ≤ i ≤ m) on the n items P_j (1 ≤ j ≤ n):
$$T = \begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ t_{m1} & t_{m2} & \cdots & t_{mn} \end{pmatrix}$$
where t_ij (1 ≤ i ≤ m, 1 ≤ j ≤ n) is the score of user U_i on item P_j;
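As a concrete sketch of the filtering in step 1, the density computation and thresholding can be written as follows (a minimal illustration: the function name and the tiny demo matrix are invented; 0 marks an unrated entry, as in the text):

```python
import numpy as np

def filter_sparse(A, user_thresh, item_thresh):
    """Delete users, then items, whose sparsity (fraction of rated
    entries, with 0 marking 'not rated') falls below its threshold."""
    keep_users = (A > 0).mean(axis=1) >= user_thresh   # per-user density
    A = A[keep_users]
    keep_items = (A > 0).mean(axis=0) >= item_thresh   # per-item density
    return A[:, keep_items]

# Tiny demo: user 2 (density 0.25) and the last item (density 0) are dropped.
A = np.array([[5, 4, 0, 0],
              [3, 0, 0, 0],
              [4, 4, 5, 0]])
T = filter_sparse(A, user_thresh=0.5, item_thresh=0.5)
```

Note that item densities are recomputed after the user rows are removed, matching the order user deletion then item deletion described above.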
Step 2: Select f feature values according to the actual situation of the original rating matrix T, and compute f user feature values and f item feature values, where each user feature value is computed from that user's scores on all items, and each item feature value from all users' scores on that item; the f feature values of each user form a user feature vector, and the f feature values of each item form an item feature vector;
Construct the original input matrix I: each user feature vector is combined in turn with each of the n item feature vectors to form the columns of I, m*n columns in total, the column for user U_i and item P_j being (u_{i1}, …, u_{if}, p_{j1}, …, p_{jf})^T;
where u_ik (1 ≤ i ≤ m, 1 ≤ k ≤ f) is the k-th feature value of user U_i, and p_jk (1 ≤ j ≤ n, 1 ≤ k ≤ f) is the k-th feature value of item P_j;
Train a GRNN with the original input matrix I as the input matrix and the original rating matrix T as the output matrix, the smoothing factor of the GRNN being spread = 1, obtaining a trained GRNN;
Step 3: For the d-th input variable (1 ≤ d ≤ 2f) of the original input matrix I, i.e. its d-th row, increase or decrease the row's data by 10% of the original values while keeping the other input variables unchanged, obtaining two new input matrices I_increase_d and I_decrease_d; applying the same treatment to every input variable gives 4f input matrices in total;
Feed these 4f input matrices into the GRNN trained in step 2 for simulation prediction, obtaining 4f prediction output matrices R_increase_d and R_decrease_d; each prediction output matrix is an m*n matrix whose element r_ij (1 ≤ i ≤ m, 1 ≤ j ≤ n) is the predicted score of user U_i on item P_j;
Step 4: For the prediction output matrices R_increase_d and R_decrease_d corresponding to the d-th input variable obtained in step 3, compute the mean impact value MIV_d:
$$\mathrm{MIV}_d = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} \left( r_{ij\_increase\_d} - r_{ij\_decrease\_d} \right)}{m \times n}$$
where r_ij_increase_d is the score of user U_i on item P_j in the prediction matrix R_increase_d obtained after increasing the d-th input variable by 10%, and r_ij_decrease_d the corresponding score in R_decrease_d obtained after decreasing it by 10%;
Compute the mean impact value MIV_d of each of the 2f input variables, find the maximum max(MIV_d) among them, and compute the threshold Q = max(MIV_d) × 10%; select the input variables whose mean impact value exceeds Q in absolute value (|MIV_d| > Q) as the effective input variables, denoting their number by F; in the original input matrix I, keep the rows of the effective input variables and delete the rows of the others, generating the new input matrix I_w;
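Steps 3 and 4 together amount to a sensitivity analysis of the trained network. A compact sketch (illustrative only: `predict` stands in for the trained GRNN, and the toy model and data below are invented):

```python
import numpy as np

def miv_screen(I, predict, delta=0.10, q_ratio=0.10):
    """MIV screening: perturb each input variable (row of I) by +/-10%,
    average the change in the model output, and keep the variables whose
    mean impact value exceeds 10% of the largest MIV in magnitude.
    Returns a boolean mask over the input variables."""
    miv = np.empty(I.shape[0])
    for d in range(I.shape[0]):
        inc, dec = I.copy(), I.copy()
        inc[d] *= 1.0 + delta          # I_increase_d
        dec[d] *= 1.0 - delta          # I_decrease_d
        miv[d] = np.mean(predict(inc) - predict(dec))
    Q = np.max(miv) * q_ratio          # threshold Q = max(MIV_d) x 10%
    return np.abs(miv) > Q             # effective input variables

# Toy model whose output depends only on input variable 0:
mask = miv_screen(np.ones((3, 4)), lambda X: 2.0 * X[0])
```

The toy model responds only to variable 0, so only that variable's MIV is non-zero and only it survives the screening.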
Step 5: With the input matrix I_w generated in step 4 as the GRNN input matrix and the original rating matrix T as the GRNN output matrix, set the search step and search range of the spread value and run K-fold cross-validation of the GRNN; the spread value with the minimum error is the optimal spread value, whose corresponding input matrix is denoted I_s and output matrix T_s; retrain the GRNN with the optimal spread value, using I_s as the input matrix and T_s as the output matrix;
Step 6: Use the GRNN trained in step 5 to predict scores for the sparse rating matrix A: compute the feature vectors of the M users and N items of A; the GRNN input matrix has F rows, one per effective input variable, and M*N columns, each column combining the feature vector of one of the M users with the feature vector of one of the N items; prediction yields the predicted rating matrix of all M users on all N items, and the entries of A marked with the special symbol are replaced with the corresponding predicted scores.
The feature values in step 2 comprise the mean, standard deviation, range, sparsity, maximum, and minimum.
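These six feature values can be computed per user row (or, transposed, per item column) with a short helper. A sketch under the stated conventions (0 = unrated; sparsity = fraction of rated entries; the function name and demo data are invented):

```python
import numpy as np

def feature_vector(ratings, total_count):
    """Mean, standard deviation, range, sparsity, maximum and minimum of
    one user's (or item's) scores; 0 marks an unrated entry, and
    total_count is N (for a user) or M (for an item)."""
    rated = ratings[ratings > 0]
    return np.array([
        rated.mean(),                  # mean value
        rated.std(),                   # standard deviation (population form)
        rated.max() - rated.min(),     # range = max - min
        len(rated) / total_count,      # sparsity: fraction of rated entries
        rated.max(),                   # maximum
        rated.min(),                   # minimum
    ])

fv = feature_vector(np.array([5, 4, 0, 2, 0]), total_count=5)  # 3 of 5 rated
```

Whether the standard deviation uses the population or sample form is not specified in the text; the population form is assumed here.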
The object of the invention is achieved as follows:
In the method of the invention, the input variables of the network are screened: the feature values with a larger influence on the output are selected as effective input variables, while feature values with an insignificant effect are excluded, reducing the impact of secondary variables on the precision of the results. The effective input variables are used to construct the GRNN input matrix, with the original rating matrix as the output matrix; the search step and range of the spread value are set, and a K-fold cross-validation loop finds the optimal spread value of the GRNN. The GRNN is trained with the optimal spread value and the corresponding input and output matrices, the trained network predicts scores for the sparse rating matrix, and the unrated entries are replaced with the predicted scores.
With the method of the invention, sparse recommendation data can be completely filled in, the severe data-sparsity problem most prominent in existing collaborative filtering is alleviated, and the recommendation results become more accurate.
Accompanying drawing explanation
Fig. 1 is a flow chart of one embodiment of the method of the invention for solving data sparsity in collaborative filtering recommendation based on a neural network.
Embodiment
Specific embodiments of the invention are described below with reference to the accompanying drawing so that those skilled in the art can better understand the invention. Note that in the following description, detailed descriptions of known functions and designs are omitted where they would dilute the main content of the invention.
Embodiment
Fig. 1 is a flow chart of one embodiment of the method of the invention. As shown in Fig. 1, in this embodiment the device implementing the method comprises two main functional modules, a variable-screening module and a score-prediction module, and the embodiment comprises the following steps:
S101: data acquisition and data prediction.
For a sparse rating matrix A representing the scores of M users on N items, the score of an item a user has not rated is uniformly replaced with a special symbol. Compute the sparsity of each user's scores over all items and of all users' scores on each item, set an item sparsity threshold and a user sparsity threshold, and delete the data of any user or item whose sparsity is below its threshold, retaining the other users and items. Denote the resulting number of users by m and the number of items by n, and build the original rating matrix T from the scores of the m users U_i (1 ≤ i ≤ m) on the n items P_j (1 ≤ j ≤ n):
$$T = \begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ t_{m1} & t_{m2} & \cdots & t_{mn} \end{pmatrix}$$
where t_ij (1 ≤ i ≤ m, 1 ≤ j ≤ n) is the score of user U_i on item P_j.
In this embodiment, the MovieLens data set is used to illustrate the flow of the invention; the special symbol replacing the score of an unrated item in the sparse rating matrix A is 0.
Compute the sparsity of each user's scores over all items and of all users' scores on each item. In this embodiment the item sparsity threshold is set to 0.25 and the user sparsity threshold to 0.7; screening yields a rating matrix of 60 users on 39 items. For ease of description, all matrices in this embodiment are shown in tabular form.
Table 1 is the original rating matrix T obtained from the MovieLens data set in this embodiment.
|      | P1 | P2 | P3 | … | P39 |
| U1   | 5  | 4  | 5  | … | 4   |
| U2   | 3  | 2  | 0  | … | 0   |
| U3   | 5  | 4  | 4  | … | 4   |
| U4   | 4  | 4  | 0  | … | 4   |
| U5   | 3  | 0  | 0  | … | 4   |
| …    | …  | …  | …  | … | …   |
| U60  | 4  | 3  | 5  | … | 1   |

Table 1
The original rating matrix T is fed into the variable-screening module, which processes T by training a GRNN and screening the variables with the mean impact value (MIV), finding the input variables that have a significant influence on the network output. This comprises the following concrete steps:
S102: Compute the GRNN input variables.
Select f feature values according to the actual situation of the rating matrix T, take the user feature values and the item feature values as input variables (2f input variables in total), and compute the feature vectors of the m users and n items.
In this embodiment, 6 feature values are chosen: mean, standard deviation, range, sparsity, maximum, and minimum, computed for each of the m users and n items. For the user feature values: if user U_i has rated n_i of the n items, compute the mean of those n_i scores, the standard deviation about that mean, and the maximum and minimum of the n_i scores; the range is the difference between the maximum and the minimum, and the sparsity is the percentage that n_i makes up of the number of items N in the sparse rating matrix A. For the item feature values: if m_j users have rated item P_j, compute the mean, standard deviation, range, sparsity, maximum, and minimum of the m_j scores. The feature vector of each user and each item is thus obtained.
Table 2 is the user feature vector matrix S_U.

|      | u_mean | u_sd   | u_range | u_sparsity | u_max | u_min |
| U1   | 4.5161 | 0.7980 | 3       | 0.7947     | 5     | 2     |
| U2   | 3.9286 | 0.7986 | 3       | 0.7179     | 5     | 2     |
| U3   | 4      | 0.9535 | 3       | 0.8462     | 5     | 2     |
| U4   | 4.1944 | 0.6591 | 2       | 0.9231     | 5     | 3     |
| …    | …      | …      | …       | …          | …     | …     |
| U60  | 3.7500 | 1.0564 | 4       | 0.7179     | 5     | 1     |

Table 2
Here u_mean is the mean of user U_i's scores on the n_i items rated, u_sd their standard deviation, u_range their range, u_sparsity the sparsity of U_i's ratings, u_max their maximum, and u_min their minimum.
Table 3 is the item feature vector matrix S_P.

|      | p_mean | p_sd   | p_range | p_sparsity | p_max | p_min |
| P1   | 3.9091 | 0.9586 | 3       | 0.9167     | 5     | 2     |
| P2   | 3.8654 | 0.8555 | 3       | 0.8667     | 5     | 2     |
| P3   | 3.9063 | 1.1280 | 4       | 0.5333     | 5     | 1     |
| P4   | 4.4746 | 0.7215 | 3       | 0.9833     | 5     | 2     |
| …    | …      | …      | …       | …          | …     | …     |
| P39  | 3.2973 | 1.0871 | 4       | 0.6167     | 5     | 1     |

Table 3
Here p_mean is the mean of the m_j users' scores on item P_j, p_sd their standard deviation, p_range their range, p_sparsity the sparsity of P_j's ratings, p_max their maximum, and p_min their minimum.
S103: Train the GRNN with the raw data.
Determine the original input matrix and original output matrix of the GRNN. The original input matrix I is built as follows: each row of I represents one input variable, 2f rows in total, and each column is the combination of the feature vector of one of the m users with the feature vector of one of the n items, m*n columns in total.
Here u_ik (1 ≤ i ≤ m, 1 ≤ k ≤ f) is the k-th feature value of user U_i, and p_jk (1 ≤ j ≤ n, 1 ≤ k ≤ f) is the k-th feature value of item P_j.
In this embodiment the original input matrix I comprises 12 rows, i.e. 12 input variables: 6 characterizing the user features and 6 characterizing the item features. Since the rating matrix T comprises 60 users and 39 items, I has 2340 columns.
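The column layout described above (each user feature vector stacked over each item feature vector, users varying slowest) can be sketched with numpy; the function name and the tiny feature matrices are invented for illustration:

```python
import numpy as np

def build_input_matrix(S_U, S_P):
    """Build the 2f x (m*n) original input matrix I.

    S_U: (m, f) user feature vectors; S_P: (n, f) item feature vectors.
    Column i*n + j stacks user i's features over item j's features,
    matching the column order of Table 4."""
    m, f = S_U.shape
    n = S_P.shape[0]
    users = np.repeat(S_U, n, axis=0)   # each user row repeated n times
    items = np.tile(S_P, (m, 1))        # the item rows cycled m times
    return np.hstack([users, items]).T  # shape (2f, m*n)

# m = 2 users with f = 2 features, n = 3 items -> I is 4 x 6.
I = build_input_matrix(np.arange(4).reshape(2, 2),
                       np.arange(6).reshape(3, 2))
```

In the embodiment the same construction with m = 60, n = 39 and f = 6 gives the 12 x 2340 matrix of Table 4.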
Table 4 shows the original input matrix I formed in this embodiment.

|            | 1st col | 2nd col | 3rd col | … | 2340th col |
| u_mean     | 4.5161  | 4.5161  | 4.5161  | … | 3.7500     |
| u_sd       | 0.7980  | 0.7980  | 0.7980  | … | 1.0564     |
| u_range    | 3       | 3       | 3       | … | 4          |
| u_sparsity | 0.7947  | 0.7947  | 0.7947  | … | 0.7179     |
| u_max      | 5       | 5       | 5       | … | 5          |
| u_min      | 2       | 2       | 2       | … | 1          |
| p_mean     | 3.9091  | 3.8654  | 3.9063  | … | 3.2973     |
| p_sd       | 0.9586  | 0.8555  | 1.1280  | … | 1.0871     |
| p_range    | 3       | 3       | 4       | … | 4          |
| p_sparsity | 0.9167  | 0.8667  | 0.5333  | … | 0.6167     |
| p_max      | 5       | 5       | 5       | … | 5          |
| p_min      | 2       | 2       | 1       | … | 1          |

Table 4
Train the GRNN with the original input matrix I as the input matrix and the rating matrix T as the output matrix; the smoothing factor of the GRNN is left at its default value, spread = 1, giving a trained GRNN.
S104: Simulate with increased and decreased input variables.
For the d-th input variable (1 ≤ d ≤ 2f) of the original input matrix I, increase or decrease the data of row d by 10% of the original values while keeping the other input variables unchanged, obtaining two new input matrices I_increase_d and I_decrease_d. Treating every input variable the same way gives 4f input matrices. Feed these 4f matrices into the GRNN trained in step S103 for simulation prediction, obtaining 4f prediction output matrices R_increase_d and R_decrease_d. Like the rating matrix T, each prediction output matrix is an m*n matrix whose element r_ij (1 ≤ i ≤ m, 1 ≤ j ≤ n) is the predicted score of user U_i on item P_j. In this embodiment, 24 prediction output matrices (one increase/decrease pair per input variable) are obtained in total.
S105: Compute the MIV of the prediction output matrices.
For the prediction output matrices R_increase_d and R_decrease_d corresponding to the d-th input variable from step S104, compute the element-wise differences and average them over all elements to obtain the mean impact value MIV_d:
$$\mathrm{MIV}_d = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} \left( r_{ij\_increase\_d} - r_{ij\_decrease\_d} \right)}{m \times n}$$
where r_ij_increase_d is the score of user U_i on item P_j in the prediction matrix obtained after increasing the d-th input variable by 10%, and r_ij_decrease_d the corresponding score after decreasing it by 10%. This gives 2f mean impact values MIV_d in total; in this embodiment, 12.
Table 5 shows the mean impact value MIV_d of each input variable in this embodiment.

| u_mean | u_sd    | u_range | u_sparsity | u_max  | u_min  |
| 0.1158 | -0.0031 | -0.0328 | 0.0041     | 0.0055 | 0.0232 |

| p_mean | p_sd    | p_range | p_sparsity | p_max | p_min  |
| 0.1552 | -0.0076 | -0.0250 | 0.0263     | 0     | 0.0049 |

Table 5
S106: Variable screening.
The mean impact value MIV_d represents the weight of the d-th input variable: the larger its magnitude, the greater the influence of that input variable on the output result, and the smaller its magnitude, the smaller the influence. The input variables are screened accordingly: find the maximum max(MIV_d) of the 2f mean impact values, take 10% of this maximum as the selection threshold Q = max(MIV_d) × 10%, and select the input variables whose mean impact value exceeds Q in absolute value (|MIV_d| > Q) as the effective input variables, denoting their number by F.
In this embodiment, Table 5 shows that the largest of the 12 mean impact values is that of p_mean, 0.1552, so the threshold Q is 0.01552; selecting the input variables with |MIV_d| > Q yields 6 effective input variables: u_mean, u_range, u_min, p_mean, p_range, and p_sparsity.
After the variable-screening module has selected the effective input variables, the score-prediction module uses the GRNN to perform collaborative filtering prediction, comprising the following concrete steps:
S107: Construct the GRNN input matrix.
In the original input matrix I, keep the rows of the effective input variables screened out in step S106 and delete the rows of the other input variables, generating the new input matrix I_w.
Table 6 is the new input matrix I_w generated in this embodiment.

|            | 1st col | 2nd col | 3rd col | … | 2340th col |
| u_mean     | 4.5161  | 4.5161  | 4.5161  | … | 3.7500     |
| u_range    | 3       | 3       | 3       | … | 4          |
| u_min      | 2       | 2       | 2       | … | 1          |
| p_mean     | 3.9091  | 3.8654  | 3.9063  | … | 3.2973     |
| p_range    | 3       | 3       | 4       | … | 4          |
| p_sparsity | 0.9167  | 0.8667  | 0.5333  | … | 0.6167     |

Table 6
S108: Train the network and find the optimal spread value.
With the input matrix I_w generated in step S107 as the GRNN input matrix and the rating matrix T as the GRNN output matrix, set the search step and range of the spread value and run K-fold cross-validation of the GRNN; the spread value with the minimum error is the optimal spread value, whose corresponding input matrix is denoted I_s and output matrix T_s.
In this embodiment, the input matrix I_w is split into an input training set and an input test set: the first 80% of the columns form the training set I_w_train and the remaining 20% the test set I_w_test; likewise, the first 80% of the columns of the rating matrix T form the output training set T_train and the remaining 20% the output test set T_test. With I_w_train as the input matrix and T_train as the output matrix, run K-fold cross-validation of the GRNN to find the optimal spread value: the search range of spread is 0 to 2, each loop increases spread by the step 0.1, and in each loop the mean squared error (MSE) between the network output and T_train is computed; after the loop ends, the spread value with the minimum MSE is the optimal spread value. In this embodiment the optimal spread value obtained is 0.5, with corresponding input matrix I_s and output matrix T_s.
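The spread search of step S108 reduces to a small grid search over candidate spread values. A self-contained sketch (illustrative only: a plain held-out split rather than full K-fold, an inline kernel-regression GRNN, and invented toy data):

```python
import numpy as np

def best_spread(X_train, Y_train, X_val, Y_val, grid):
    """Return the spread value from `grid` giving the smallest validation
    MSE for a GRNN trained on (X_train, Y_train)."""
    def grnn(Xq, spread):
        # Inline GRNN / kernel-regression prediction.
        d2 = ((Xq[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2 / (2.0 * spread ** 2))
        return (w @ Y_train) / w.sum(axis=1)
    errors = [np.mean((grnn(X_val, s) - Y_val) ** 2) for s in grid]
    return grid[int(np.argmin(errors))]

# Search range 0-2 with step 0.1, as in the embodiment.
grid = np.round(np.arange(0.1, 2.01, 0.1), 1)
X_tr = np.array([[0.0], [1.0], [2.0], [3.0]])
Y_tr = np.array([0.0, 1.0, 2.0, 3.0])
s_opt = best_spread(X_tr, Y_tr,
                    np.array([[0.5], [1.5]]), np.array([0.5, 1.5]), grid)
```

For this linear toy data a small spread interpolates best; on real rating data a moderate spread (0.5 in the embodiment) wins because it smooths noise.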
S109: Train the network with the optimal spread value.
Retrain the GRNN with the optimal spread value obtained in step S108, using the input matrix I_s obtained in step S108 as the input matrix and T_s as the output matrix.
S110: Score prediction.
Use the GRNN trained in step S109 to predict scores for the sparse rating matrix A that is to be filled: compute the feature vectors of the M users and N items of A; the input matrix for GRNN score prediction has F rows, one per effective input variable, and M*N columns, each column combining the feature vector of one of the M users with the feature vector of one of the N items; prediction yields the predicted rating matrix of all M users on all N items, and the entries of A marked with the special symbol are replaced with the corresponding predicted scores, realizing score prediction and complete filling of the sparse rating matrix A.
In this embodiment, the trained GRNN predicts scores for the input test set I_w_test, yielding the score prediction matrix shown in Table 7.

|      | P1 | P2 | P3 | P4 | P5 | … | P39 |
| U49  | 3  | 3  | 2  | 3  | 3  | … | 2   |
| U50  | 4  | 4  | 2  | 5  | 4  | … | 3   |
| U51  | 4  | 4  | 3  | 4  | 4  | … | 3   |
| U52  | 3  | 3  | 2  | 3  | 4  | … | 2   |
| U53  | 4  | 4  | 2  | 5  | 4  | … | 3   |
| U54  | 4  | 4  | 2  | 4  | 4  | … | 3   |
| U55  | 4  | 3  | 2  | 4  | 4  | … | 3   |
| U56  | 4  | 4  | 2  | 5  | 4  | … | 3   |
| U57  | 3  | 3  | 2  | 4  | 4  | … | 2   |
| U58  | 3  | 3  | 2  | 3  | 4  | … | 2   |
| U59  | 3  | 3  | 2  | 4  | 4  | … | 2   |
| U60  | 3  | 3  | 2  | 3  | 4  | … | 2   |

Table 7
Table 8 is the real rating matrix T_test.

|      | P1 | P2 | P3 | P4 | P5 | … | P39 |
| U49  | 3  | 4  | 5  | 4  | 5  | … | 0   |
| U50  | 5  | 5  | 5  | 5  | 5  | … | 0   |
| U51  | 4  | 3  | 0  | 5  | 5  | … | 4   |
| U52  | 4  | 4  | 3  | 3  | 1  | … | 3   |
| U53  | 5  | 4  | 0  | 5  | 4  | … | 5   |
| U54  | 3  | 5  | 4  | 4  | 5  | … | 5   |
| U55  | 3  | 3  | 4  | 4  | 5  | … | 0   |
| U56  | 5  | 4  | 0  | 5  | 4  | … | 0   |
| U57  | 4  | 4  | 4  | 5  | 0  | … | 0   |
| U58  | 2  | 5  | 5  | 4  | 5  | … | 0   |
| U59  | 4  | 4  | 5  | 5  | 5  | … | 2   |
| U60  | 4  | 3  | 5  | 3  | 0  | … | 1   |

Table 8
In this embodiment, the mean absolute percentage error (MAPE) is used to measure the accuracy of the predicted scores; its formula is:

$$\mathrm{MAPE} = \frac{1}{num} \sum_{x=1}^{num} \left| \frac{observed_x - predicted_x}{observed_x} \right| \times 100\%$$

where num is the number of non-zero entries in the real rating matrix T_test, i.e. the number of items the users actually rated, observed_x is the actual score in T_test, and predicted_x is the corresponding score in the prediction matrix produced by the network for the input test set I_w_test.
The computed MAPE in this embodiment is 24.86%, which shows that score prediction with the method of the invention achieves good precision; the method can effectively fill a sparse matrix and thus address the data-sparsity problem in collaborative filtering.
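The MAPE computation over the rated (non-zero) entries of T_test can be sketched as follows (illustrative only; the tiny matrices are invented, not the embodiment's data):

```python
import numpy as np

def mape(T_test, R_pred):
    """Mean absolute percentage error over rated entries only
    (0 marks 'not rated' in the real rating matrix T_test)."""
    mask = T_test > 0
    obs, pred = T_test[mask], R_pred[mask]
    return float(np.mean(np.abs(obs - pred) / obs) * 100.0)

err = mape(np.array([[4.0, 0.0], [2.0, 5.0]]),
           np.array([[3.0, 9.9], [2.0, 4.0]]))  # the unrated entry is ignored
```

Masking out the zero entries both matches the definition of num above and avoids division by zero.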
Although illustrative embodiments of the invention have been described above so that those skilled in the art can understand it, it should be clear that the invention is not restricted to the scope of those embodiments. To those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as determined and limited by the appended claims, and all innovations and creations using the concept of the invention fall within the scope of protection.

Claims (2)

1. A method for solving data sparsity in collaborative filtering recommendation based on a neural network, characterized in that it comprises the following steps:
Step 1: For a sparse rating matrix A representing the scores of M users on N items, compute the sparsity of each user's scores over all items and the sparsity of all users' scores on each item, where the score of an item a user has not rated is uniformly replaced with 0 in A;
Set a user sparsity threshold and an item sparsity threshold: if a user's sparsity is below the user threshold, delete that user; if an item's sparsity is below the item threshold, delete that item; denote the resulting number of users by m and the number of items by n, and build the original rating matrix T from the scores of the m users U_i (1 ≤ i ≤ m) on the n items P_j (1 ≤ j ≤ n):
$$T = \begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ t_{m1} & t_{m2} & \cdots & t_{mn} \end{pmatrix};$$
Wherein t ij, 1≤i≤m, 1≤j≤n represents user U ito project P jscoring;
Step 2: the actual conditions according to iotave evaluation matrix T select f eigenwert, and calculate f user characteristics value and f item feature value, wherein each user characteristics value is according to the score calculation of user to all items, each item feature value according to all users to this project score calculation; F the eigenwert of each user forms the user characteristics vector of row, and f eigenwert of each project forms the item feature vector of row;
Construct original input matrix I, a m user characteristics vector to carry out being combined as row with n item feature vector successively, amount to m*n row and form original input matrix I:
Wherein, u ik, 1≤i≤m, 1≤k≤f represents user U ia corresponding kth eigenwert, p jk, 1≤j≤n, 1≤k≤f represents project P ja corresponding kth eigenwert;
With original input matrix I be input matrix, iotave evaluation matrix T for output matrix training GRNN network, now smoothing factor spread value=1 of GRNN network, the GRNN network of having been trained;
Step 3: Each row of the original input matrix I represents one input variable, 2f rows in total. For the d-th input variable (1 ≤ d ≤ 2f) in the original input matrix I, that is, the data in the d-th row, increase or decrease its values by 10% from their original values while keeping the other input variables unchanged, obtaining two new input matrices I_increase_d and I_decrease_d; apply the same processing to every input variable, obtaining 4f input matrices in total;
Use these 4f input matrices as input matrices of the GRNN network trained in Step 2 and carry out simulation prediction, obtaining 4f simulation prediction output matrices R_increase_d and R_decrease_d; each simulation prediction output matrix is an m*n matrix whose element r_ij (1 ≤ i ≤ m, 1 ≤ j ≤ n) is the predicted rating of user U_i on item P_j obtained by simulation prediction;
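The ±10% perturbation of Step 3 can be sketched as a simple row-scaling operation; the 10% step size follows the claim, everything else is a minimal illustration.

```python
import numpy as np

def perturbed_inputs(I, d, delta=0.10):
    """Return (I_increase_d, I_decrease_d): copies of the original input
    matrix I with only row d scaled up/down by `delta`."""
    I_inc, I_dec = I.astype(float).copy(), I.astype(float).copy()
    I_inc[d] *= 1.0 + delta
    I_dec[d] *= 1.0 - delta
    return I_inc, I_dec
```

Calling this for every d in 0..2f-1 produces the 4f perturbed matrices that are then pushed through the trained GRNN.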
Step 4: For the simulation output matrices R_increase_d and R_decrease_d corresponding to the d-th input variable obtained in Step 3, calculate the Mean Impact Value MIV_d:
MIV_d = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} \left( r_{ij\_increase\_d} - r_{ij\_decrease\_d} \right)}{m \times n}
where r_ij_increase_d denotes the rating of user U_i on item P_j in the simulation output matrix R_increase_d obtained by increasing the d-th input variable by 10%, and r_ij_decrease_d denotes the rating of user U_i on item P_j in the simulation output matrix R_decrease_d obtained by decreasing the d-th input variable by 10%;
Calculate the Mean Impact Value MIV_d corresponding to each of the 2f input variables, find the maximum value max(MIV_d) among the 2f Mean Impact Values, and calculate the threshold Q = max(MIV_d) × 10%; select the input variables whose MIV_d is greater than the threshold Q as effective input variables, and denote the number of effective input variables as F; in the original input matrix I, retain the rows of data corresponding to the effective input variables and delete the rows of data corresponding to the other input variables, generating a new input matrix I_w;
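The screening of Step 4 follows directly from the MIV formula; in this sketch the 2f pairs of prediction matrices are assumed to be stacked into two 3-D arrays of shape (2f, m, n).

```python
import numpy as np

def select_by_miv(R_inc, R_dec):
    """MIV_d = mean over all m*n entries of (R_increase_d - R_decrease_d);
    keep the variables whose MIV exceeds Q = max(MIV) x 10%.
    R_inc, R_dec: stacked prediction matrices of shape (2f, m, n)."""
    miv = (R_inc - R_dec).mean(axis=(1, 2))
    Q = miv.max() * 0.10                  # threshold from the largest impact
    effective = np.flatnonzero(miv > Q)   # indices of effective input variables
    return miv, effective
```

The reduced input matrix I_w is then simply `I[effective]`, keeping only the F rows whose perturbation measurably moves the predicted ratings.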
Step 5: Use the input matrix I_w generated in Step 4 as the input matrix of the GRNN network and the initial rating matrix T as the output matrix of the GRNN network, set the search step and search range of the spread value, and validate the GRNN network by K-fold cross-validation; select the spread value with the minimum error as the optimal spread value, and denote the input matrix corresponding to the optimal spread value as I_s and the output matrix as T_s; with the optimal spread value, retrain the GRNN network using I_s as the input matrix and T_s as the output matrix;
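The spread search of Step 5 can be sketched as a grid search scored by K-fold cross-validation. The candidate grid, the fold count, and the use of mean squared error as the error measure are assumptions of this sketch; a minimal GRNN predictor is defined inline so the fragment is self-contained.

```python
import numpy as np

def grnn_predict(Xtr, Ytr, Xq, spread):
    """Minimal GRNN: Gaussian-kernel weighted average of the training outputs."""
    d2 = ((Xq[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * spread ** 2))
    return (w @ Ytr) / np.maximum(w.sum(axis=1, keepdims=True), 1e-300)

def best_spread(X, Y, spreads, k=5, seed=0):
    """Grid-search the smoothing factor by K-fold cross-validation,
    returning the spread with the lowest mean squared prediction error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    errs = []
    for s in spreads:
        mse = 0.0
        for fold in folds:
            mask = np.ones(len(X), dtype=bool)
            mask[fold] = False           # hold out one fold, train on the rest
            pred = grnn_predict(X[mask], Y[mask], X[fold], s)
            mse += float(((pred - Y[fold]) ** 2).mean())
        errs.append(mse / k)
    return spreads[int(np.argmin(errs))], errs
```

Because a GRNN has no iterative training, each candidate spread is evaluated by a single pass over the held-out folds, which keeps the search cheap.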
Step 6: Use the GRNN network trained in Step 5 to perform rating prediction on the sparse rating matrix A. Compute the feature vectors corresponding to the M users and N items contained in the sparse rating matrix A; the input matrix of the GRNN network has F rows, each row representing one effective input variable, and M*N columns in total, each column being the combination of the feature vector of one of the M users and the feature vector of one of the N items; after rating prediction, the predicted rating matrix of all M users on all N items is obtained, and the rating values represented by the special symbol in the sparse rating matrix A are replaced with the corresponding predicted rating values.
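The final replacement in Step 6 touches only the unrated entries of A; a sketch, assuming (per Step 1) that 0 is the placeholder for an unrated cell and that the GRNN predictions have already been reshaped into an M x N matrix:

```python
import numpy as np

def fill_sparse(A, R_pred):
    """Replace only the unrated (zero) entries of the sparse rating matrix A
    with the corresponding predicted ratings from R_pred (same shape)."""
    filled = A.astype(float).copy()
    mask = A == 0                 # positions of the unrated placeholder
    filled[mask] = R_pred[mask]
    return filled
```

Known ratings are left untouched, so the densified matrix agrees with the observed data and only the gaps are imputed, which is what makes the result usable as input to a standard collaborative filtering recommender.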
2. The method for solving collaborative filtering recommendation data sparsity based on a neural network according to claim 1, characterized in that the f feature values in Step 2 are 6 feature values, namely the mean, standard deviation, range, sparsity, maximum value and minimum value.
CN201310055267.6A 2013-02-21 2013-02-21 Method for solving collaborative filtering recommendation data sparsity based on neural network Expired - Fee Related CN103106535B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310055267.6A CN103106535B (en) 2013-02-21 2013-02-21 Method for solving collaborative filtering recommendation data sparsity based on neural network


Publications (2)

Publication Number Publication Date
CN103106535A CN103106535A (en) 2013-05-15
CN103106535B true CN103106535B (en) 2015-05-13

Family

ID=48314380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310055267.6A Expired - Fee Related CN103106535B (en) 2013-02-21 2013-02-21 Method for solving collaborative filtering recommendation data sparsity based on neural network

Country Status (1)

Country Link
CN (1) CN103106535B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103353872B (en) * 2013-06-03 2017-03-01 大连理工大学 A kind of teaching resource personalized recommendation method based on neutral net
CN104854580B (en) * 2013-09-10 2018-09-28 华为技术有限公司 A kind of recommendation method and apparatus
TWI613604B (en) * 2013-10-15 2018-02-01 財團法人資訊工業策進會 Recommandation system, method and non-volatile computer readable storage medium for storing thereof
CN104008164A (en) * 2014-05-29 2014-08-27 华东师范大学 Generalized regression neural network based short-term diarrhea multi-step prediction method
CN104632188A (en) * 2014-12-04 2015-05-20 杭州和利时自动化有限公司 Prediction method and device for single oil well yield
US11423323B2 (en) * 2015-09-02 2022-08-23 Qualcomm Incorporated Generating a sparse feature vector for classification
CN107622427B (en) * 2016-07-13 2021-04-06 阿里巴巴集团控股有限公司 Deep learning method, device and system
CN107577736B (en) * 2017-08-25 2021-12-17 武汉数字智能信息科技有限公司 File recommendation method and system based on BP neural network
CN109509051B (en) * 2018-09-12 2020-11-13 北京奇艺世纪科技有限公司 Article recommendation method and device
CN111598627A (en) * 2020-05-26 2020-08-28 揭阳职业技术学院 Personalized advertisement pushing method for elevator media terminal
CN111693975A (en) * 2020-05-29 2020-09-22 电子科技大学 MIMO radar sparse array design method based on deep neural network
CN112749345B (en) * 2021-02-09 2023-11-14 上海海事大学 K neighbor matrix decomposition recommendation method based on neural network

Citations (2)

Publication number Priority date Publication date Assignee Title
WO2005048185A1 (en) * 2003-11-17 2005-05-26 Auckland University Of Technology Transductive neuro fuzzy inference method for personalised modelling
CN101694652A (en) * 2009-09-30 2010-04-14 西安交通大学 Network resource personalized recommended method based on ultrafast neural network


Non-Patent Citations (2)

Title
Collaborative Filtering Using a Regression-Based Approach; Slobodan Vucetic, Zoran Obradovic; Knowledge and Information Systems; 2005-01-01; Vol. 7, No. 1; full text *
Using BP Neural Networks to Alleviate the Sparsity Problem of Collaborative Filtering Recommendation Algorithms; Zhang Feng, Chang Huiyou; Journal of Computer Research and Development; 2006-04-30; Vol. 43, No. 4; full text *

Also Published As

Publication number Publication date
CN103106535A (en) 2013-05-15

Similar Documents

Publication Publication Date Title
CN103106535B (en) Method for solving collaborative filtering recommendation data sparsity based on neural network
Cadenas et al. Short term wind speed forecasting in La Venta, Oaxaca, México, using artificial neural networks
US20170308934A1 (en) Management method of power engineering cost
CN103226741B (en) Public supply mains tube explosion prediction method
CN101480143B (en) Method for predicating single yield of crops in irrigated area
CN104835103A (en) Mobile network health evaluation method based on neural network and fuzzy comprehensive evaluation
CN108446794A (en) One kind being based on multiple convolutional neural networks combination framework deep learning prediction techniques
CN107169628A (en) A kind of distribution network reliability evaluation method based on big data mutual information attribute reduction
CN105512404B (en) Time-varying reliability Global sensitivity analysis method based on chaos polynomial expansion
CN112149873B (en) Low-voltage station line loss reasonable interval prediction method based on deep learning
CN104794361A (en) Comprehensive evaluation method for water flooding oil reservoir development effect
CN109647899B (en) Method for forecasting power consumption of multi-specification rolled pieces in hot rolling and finish rolling process of strip steel
CN107705556A (en) A kind of traffic flow forecasting method combined based on SVMs and BP neural network
Dong et al. Applying the ensemble artificial neural network-based hybrid data-driven model to daily total load forecasting
CN104865827B (en) A kind of pumping production optimization method based on multi-state model
CN107679660A (en) Based on SVMs by when building energy consumption Forecasting Methodology
CN104050547A (en) Non-linear optimization decision-making method of planning schemes for oilfield development
CN105893669A (en) Global simulation performance predication method based on data digging
CN103853939A (en) Combined forecasting method for monthly load of power system based on social economic factor influence
CN107644297A (en) A kind of energy-saving of motor system amount calculates and verification method
CN104881718A (en) Regional power business index constructing method based on multi-scale leading economic indicators
CN105160496A (en) Comprehensive evaluation method of enterprise electricity energy efficiency
CN106600037A (en) Multi-parameter auxiliary load forecasting method based on principal component analysis
Mathew et al. Demand forecasting for economic order quantity in inventory management
CN110880044A (en) Markov chain-based load prediction method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150513

Termination date: 20190221