CN113537619A - Power grid user evaluation sparse matrix scoring prediction method - Google Patents

Power grid user evaluation sparse matrix scoring prediction method

Info

Publication number
CN113537619A
CN113537619A
Authority
CN
China
Prior art keywords
matrix
user
scoring
similarity
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110868792.4A
Other languages
Chinese (zh)
Inventor
杨强
张云菊
郭明
史虎军
张玉罗
司胜文
杜秀举
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Power Grid Co Ltd
Original Assignee
Guizhou Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Power Grid Co Ltd filed Critical Guizhou Power Grid Co Ltd
Priority to CN202110868792.4A priority Critical patent/CN113537619A/en
Publication of CN113537619A publication Critical patent/CN113537619A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 - Commerce
    • G06Q30/02 - Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 - Market modelling; Market analysis; Collecting market data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 - Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 - Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a power grid user evaluation sparse matrix scoring prediction method. The method calculates the semantic similarity between user evaluations by constructing an ontology-concept hierarchy tree and performs partial score prediction and filling in the sparse matrix for user evaluations whose similarity exceeds a similarity threshold. Based on the partially predicted scoring matrix, the matrix is then reduced in dimension and decomposed according to matrix factorization theory, and scores are further predicted for the user evaluations whose similarity is at or below the threshold, so that the missing score values in the sparse matrix are filled by prediction. This alleviates the overfitting that a matrix factorization algorithm suffers when the matrix is extremely sparse and improves the quality of the collaborative filtering recommendation algorithm.

Description

Power grid user evaluation sparse matrix scoring prediction method
Technical Field
The invention belongs to the technical field of software, and particularly relates to a power grid user evaluation sparse matrix scoring prediction method.
Background
An intelligent question-answering system organizes accumulated, unordered corpus information in an orderly and scientific way and builds a knowledge-based classification model; the classification model can then guide newly added consultation and service corpora, saving human resources and increasing the automation of information processing. The system retrieves answers to user questions from a knowledge base or the Internet and returns them directly to the user. Within such systems, collaborative filtering is currently the most studied and most effective personalized recommendation technique: it generates recommendations from a user's historical selections and similarity relations by collecting the evaluations of other users with the same or similar interests, and it has made important progress in both theoretical research and engineering practice.
In a big-data environment, the historical data generated by power grid user behaviour grows rapidly. Compared with the total number of users and of user evaluations, the data generated by any single user-evaluation pair is very small, and new users and new evaluations continuously enter the system and create new associations, so the data set is highly sparse.
Research shows that when user scoring data is sparse, the performance of a recommendation system drops sharply. For the sparsity of power grid user behaviour scores, one approach compresses the dimensionality of the user scoring matrix by iteratively learning the latent-variable distribution of user evaluations, and another reduces the dimensionality of the evaluation space through singular value decomposition; however, dimensionality reduction loses information and cannot guarantee the recommendation effect. Filling the user-item scoring matrix based on concept semantic similarity achieves good results, but such an algorithm can only predict scores for user evaluations with high similarity and cannot predict scores for the low-similarity user evaluations in the matrix.
Semantic similarity based on the degree of semantic overlap captures the common part between concepts but ignores the differing part. After the content similarity between user evaluations is obtained, several evaluations with higher similarity are selected for score prediction, and the predicted scores are used to fill the empty entries of the user-evaluation matrix, reducing its sparsity. Because the attribute descriptions of evaluations in different categories differ greatly, a semantics-based method cannot compute similarity across categories and therefore cannot make cross-category score predictions. In addition, semantics-based similarity computation requires extracting the attribute features of user evaluations and designing domain knowledge, so its scope of application is narrow.
SVD (Singular Value Decomposition) is a matrix factorization algorithm that can effectively extract key features and reveal the internal structure of a matrix; the decomposition is illustrated in FIG. 1. Sarwar et al. introduced SVD into collaborative filtering: the user evaluation scores are decomposed into user and evaluation feature-vector matrices, and the singular values of the scoring matrix are used, through the latent relations between users and evaluations, to extract the essential features. The SVD algorithm improves the recommendation quality of collaborative filtering on a sparse scoring matrix, and the eigenvalues of the filled scoring matrix differ little from those before filling; however, because the data in a recommendation system is sparse, the SVD algorithm often overfits during score prediction.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a power grid user evaluation sparse matrix scoring prediction method that addresses the overfitting of a matrix factorization algorithm caused by the sparsity of the power grid user scoring matrix and the resulting degradation of the collaborative filtering recommendation quality.
The technical scheme of the invention is as follows:
a power grid user evaluation sparse matrix scoring prediction method comprises the following steps: calculating semantic similarity between user evaluations by constructing a hierarchical structure tree of an ontology concept, and performing partial grading, prediction and filling on the user evaluations of which the similarity is greater than a similarity threshold in a sparse matrix; and on the basis of the predicted scoring matrix, performing dimensionality reduction and decomposition based on a matrix decomposition theory, and further performing scoring prediction on the user evaluation less than or equal to the similarity threshold value, so that the prediction filling of the scoring missing value in the sparse matrix is realized.
A power grid user evaluation sparse matrix scoring prediction method specifically comprises the following steps:
Step 1: according to the user's scores, divide the user evaluations into a set V_m of scored concept instances and a set W_m of concept instances whose scores are to be predicted;
Step 2: traverse the ontology hierarchy tree to obtain the depths depth(I_i), I_i ∈ V_m, and depth(I_j), I_j ∈ W_m, of the concept instances and their least common ancestor node lso(I_i, I_j), and calculate the user evaluation similarity;
Step 3: obtain the subset V_m' of V_m whose evaluation similarity is greater than the similarity threshold ε, and predict the score of the user evaluation to be predicted, I_j ∈ W_m;
Step 4: if V_m contains no element whose similarity to I_j is greater than the threshold ε, move to the next element of W_m and return to Step 2;
Step 5: take the scoring matrix predicted by the semantic similarity method as the input matrix, reduce its dimension by matrix factorization, decompose it into a low-rank matrix that best approximates the original matrix, compute the loss function, obtain the element values of the two factor matrices by iterative calculation, and finally obtain the evaluation score values to be predicted in the scoring matrix.
The user evaluation similarity in Step 2 is calculated as follows: the shorter the distance between two concept instances, the higher the similarity between the two concepts, and vice versa. The similarity formula is:
sim(I_i, I_j) = \frac{2\,depth(lso(I_i, I_j))}{len(I_i, I_j) + 2\,depth(lso(I_i, I_j))}   (1)
where depth is the length of the shortest path from a node to the root node, lso is the least common ancestor node of the two instances, and len is the length of the shortest path between the two nodes. From equation (1), the similarity between concepts increases with the depth of the least common ancestor node, and when I_i = I_j, sim(I_i, I_j) = 1.
In Step 3, the subset V_m' of V_m whose evaluation similarity exceeds the similarity threshold ε is obtained, and the score of the user evaluation to be predicted, I_j ∈ W_m, is predicted with the formula:
R_{mj} = \frac{\sum_{I_i \in V_m'} sim(I_i, I_j)\, R_{mi}}{\sum_{I_i \in V_m'} sim(I_i, I_j)}
the specific method for predicting the score is as follows:
Let the set of user evaluations be I = (I_1, ..., I_i, I_j, ..., I_n) and let ε be a threshold constant. If the semantic similarity of I_i and I_j is greater than the threshold, then I_i and I_j are similar; otherwise I_i and I_j are dissimilar.
Let the set of users be U = (U_1, ..., U_m, U_n, ..., U_k). For user U_m, the set of scored user evaluations is V_m and the set of unscored user evaluations is W_m; I_i and I_j are arbitrary elements of V_m and W_m respectively, R_mi is the score given by user U_m to the scored evaluation I_i, and ε is a threshold constant.
The score prediction R_mj of user U_m for the user evaluation I_j is as follows:

R_{mj} = \frac{\sum_{I_i \in V_m,\ sim(I_i, I_j) > ε} sim(I_i, I_j)\, R_{mi}}{\sum_{I_i \in V_m,\ sim(I_i, I_j) > ε} sim(I_i, I_j)}
Proof: suppose the scored set V_m contains t user evaluations, and that two of them whose similarity to the evaluation I_j to be predicted exceeds the threshold ε are I_p and I_q. According to the score prediction formula for the user evaluations I_p and I_q:

[equation image not reproduced]
Considering the similar user evaluations I_p and I_q together with their similarity to the evaluation I_j, the similarity weights w_p and w_q are:

w_p = \frac{sim(I_p, I_j)}{sim(I_p, I_j) + sim(I_q, I_j)}, \quad w_q = \frac{sim(I_q, I_j)}{sim(I_p, I_j) + sim(I_q, I_j)}
From the known scores R_mp, R_mq and the similarity weights, the prediction is:

R_{mj} = \frac{sim(I_p, I_j)\, R_{mp} + sim(I_q, I_j)\, R_{mq}}{sim(I_p, I_j) + sim(I_q, I_j)}
Extending further to the case where the scored set V_m of U_m contains more than one user evaluation similar to the evaluation I_j to be predicted, i.e. when there exists I_i ∈ V_m with sim(I_i, I_j) > ε, the evaluation to be predicted has a score prediction, and by the semantic similarity prediction method the score prediction R_mj of user U_m for I_j is given by the formula above.
In Step 5, the scoring matrix predicted by the semantic similarity method is taken as the input matrix, its dimension is reduced by matrix factorization, it is decomposed into a low-rank matrix that best approximates the original matrix, and the loss function is calculated as follows:
Let the scoring matrix predicted by the semantic similarity method be an m × n matrix R. For the sparse matrix R, matrix factorization is applied to predict the missing values: R can be decomposed into an m × k matrix U and a k × n matrix V, and the symbol \hat{R} denotes the approximate scoring matrix obtained after prediction of R:

R_{m \times n} \approx U_{m \times k} V_{k \times n} = \hat{R}
The matrix U captures the relation between the m users and k topics, and the matrix V the relation between the k topics and the n user evaluations; the number of topics k is a parameter chosen for the specific user evaluations. The element of the approximate scoring matrix in row i and column j is:

\hat{r}_{ij} = \sum_{t=1}^{k} u_{i,t} v_{t,j}
Taking as the loss function the sum of squared differences between the actual scores and the approximate scores, the loss function of the matrix factorization score prediction is:

E = \sum_{(i,j)} (r_{ij} - \hat{r}_{ij})^2 = \sum_{(i,j)} \left( r_{ij} - \sum_{t=1}^{k} u_{i,t} v_{t,j} \right)^2

where the sum runs over the observed entries of R.
Also in Step 5, the element values of the two factor matrices and finally the user evaluation score values to be predicted in the scoring matrix are obtained by iterative computation as follows:
Taking the partial derivatives of the loss function with respect to u_{i,k} and v_{k,j} gives:

\frac{\partial E}{\partial u_{i,k}} = -2 E_{i,j} v_{k,j}

\frac{\partial E}{\partial v_{k,j}} = -2 E_{i,j} u_{i,k}

where E_{i,j} = r_{ij} - \hat{r}_{ij} is the prediction error.
Moving along the direction of steepest descent with the gradient descent optimization algorithm, where α is the learning rate:

u_{i,k} = u_{i,k} + 2α E_{i,j} v_{k,j}, \quad v_{k,j} = v_{k,j} + 2α E_{i,j} u_{i,k}
To prevent overfitting of the scoring matrix, a regularization term β(||u_i||² + ||v_k||²) is added, where β is the regularization parameter, yielding:

u_{i,k} = u_{i,k} + α(2 E_{i,j} v_{k,j} - β u_{i,k})
v_{k,j} = v_{k,j} + α(2 E_{i,j} u_{i,k} - β v_{k,j})   (12)
after the matrix U and V is solved, the prediction scoring formula of the user i on the item j is as follows:
u(i,1)*v(1,j)+u(i,2)*v(2,j)+…+u(i,k)*v(k,j) (13)。
the method also comprises the step of verifying the feasibility and the effectiveness of the power grid user evaluation sparse matrix scoring prediction method, wherein the verification method specifically comprises the following steps: the algorithm is realized by C + + language under QT7.4.7 programming environment; data is from a dataset collected by the university of minnesota state computer science group research group for collaborative filtering algorithms; the sparsity of the data set is 100000/1682 multiplied by 943 ≈ 93.7%, firstly, the whole data set is subjected to randomized shuffling operation, then, experimental data are averagely divided into 5 mutually disjoint sub data sets, and the data proportion of the training set to the testing set is 4: 1; the scale values of the distribution of score 1 to score 5 for the 5 data sets are shown in table 1.
TABLE 1 score distribution ratio
The proportion of score-5 data differs by 1.87% between data set 1 and data set 3, the largest difference in the whole collection, while the score distributions of the remaining data sets differ less; data set 1 is therefore not used as a test set, and one of data sets 2 to 5 is chosen as the test set;
in the experiment, the number of levels used when classifying the data and constructing the ontology hierarchy is denoted hierarchy tree, the semantic similarity threshold is ε, the number of features (dimension) of the matrix factorization is F, the learning rate is α, and the regularization parameter is β; the parameter settings are shown in Table 2:
table 2 experimental parameter settings
To verify the performance of the algorithm, a score prediction algorithm based on semantic similarity, a score prediction algorithm based on singular value decomposition of the sparse matrix, and a prediction algorithm that fills the missing values with random numbers and then performs matrix factorization are each run on the data set and compared; the results are obtained by adjusting the similarity threshold and the number of iterations and are analysed statistically;
the mean absolute error (MAE) is adopted as the metric; the MAE measures prediction accuracy by computing the deviation between the predicted and the actual user scores. Suppose the scores of the N predicted items are represented by the vector {p_1, p_2, …, p_N} and the corresponding actual user scores are {r_1, r_2, …, r_N}; the MAE is then computed as:

MAE = \frac{1}{N} \sum_{i=1}^{N} |p_i - r_i|
the resulting similarity threshold was set to 0.75.
For effectiveness, a prediction-filling method that applies SVD (singular value decomposition) directly to the sparse matrix and a method that pre-fills the sparse matrix with random numbers in the range 1 to 5 before matrix factorization are compared with the proposed algorithm as the number of retained dimensions after factorization varies. The result is as follows: while preserving the properties of the original scoring data set, the proposed algorithm adjusts the similarity threshold during score prediction to pre-fill the entries with higher similarity, providing richer source information for the original sparse matrix before factorization; with the similarity threshold set to 0.75, the MAE of the proposed algorithm is smaller than that of the SVD decomposition algorithm.
The invention has the beneficial effects that:
In this method, the entries of the sparse matrix whose user evaluations have high semantic similarity are partially filled by score prediction based on ontology-concept similarity; on the basis of this partially predicted scoring matrix, a matrix factorization algorithm then predicts the missing values of the unscored items with low similarity. The sparse matrix thus receives two rounds of prediction filling, finally yielding a complete user-score matrix, which alleviates the overfitting produced by matrix factorization when the matrix is extremely sparse and improves the quality of the collaborative filtering recommendation algorithm.
Drawings
FIG. 1 is a schematic diagram of the matrix decomposition of the present invention;
FIG. 2 is a schematic diagram illustrating the effect of a semantic similarity threshold on an MAE value according to the present invention;
FIG. 3 is a graph comparing the change of MAE values with the preserved dimension after decomposition.
Detailed Description
The shorter the distance between two concept instances, the higher the similarity between the two concepts, and vice versa. The semantic-distance similarity of concepts is based on the overlapping paths of the concept terms in the ontology hierarchy tree, and the similarity model is given by formula (1).
sim(I_i, I_j) = \frac{2\,depth(lso(I_i, I_j))}{len(I_i, I_j) + 2\,depth(lso(I_i, I_j))}   (1)
where depth is the length of the shortest path from a node to the root node, lso is the least common ancestor node of the two instances, and len is the length of the shortest path between the two nodes. From equation (1), the similarity between concepts increases with the depth of the least common ancestor node, and when I_i = I_j, sim(I_i, I_j) = 1.
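As an illustration of how formula (1) can be evaluated on a concrete hierarchy, the following C++ sketch (C++ being the language in which the experiments are later implemented) computes depth, lso, len and sim on a small hand-built tree. The tree shape, node indices, and helper structure are hypothetical, and the closed form in sim() is the reconstruction of formula (1) given above, not a verbatim copy of the original image:

```cpp
#include <iostream>
#include <vector>

// Minimal ontology hierarchy: each node stores its parent index (-1 for the root).
struct Ontology {
    std::vector<int> parent;

    // depth(): number of edges on the shortest path from a node to the root.
    int depth(int node) const {
        int d = 0;
        while (parent[node] != -1) { node = parent[node]; ++d; }
        return d;
    }

    // lso(): least common ancestor of two concept instances.
    int lso(int a, int b) const {
        while (depth(a) > depth(b)) a = parent[a];   // lift the deeper node
        while (depth(b) > depth(a)) b = parent[b];
        while (a != b) { a = parent[a]; b = parent[b]; }
        return a;
    }

    // len(): shortest path length between two nodes through their lso.
    int len(int a, int b) const {
        int anc = lso(a, b);
        return depth(a) + depth(b) - 2 * depth(anc);
    }

    // sim(): grows with the depth of the lso and equals 1 when a == b.
    double sim(int a, int b) const {
        if (a == b) return 1.0;
        double d = depth(lso(a, b));
        return 2.0 * d / (len(a, b) + 2.0 * d);
    }
};

int main() {
    // Toy hierarchy: 0 = root, 1 and 2 are children of 0, 3 and 4 are children of 1.
    Ontology ont{{-1, 0, 0, 1, 1}};
    std::cout << "sim(3,4) = " << ont.sim(3, 4) << "\n";  // siblings under node 1
    std::cout << "sim(3,2) = " << ont.sim(3, 2) << "\n";  // only the root in common
    std::cout << "sim(3,3) = " << ont.sim(3, 3) << "\n";  // identical instances -> 1
    return 0;
}
```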
Let the set of user evaluations be I = (I_1, ..., I_i, I_j, ..., I_n) and let ε be a threshold constant. If the semantic similarity of I_i and I_j is greater than the threshold, then I_i and I_j are similar; otherwise I_i and I_j are dissimilar.
The set of users is U = (U_1, ..., U_m, U_n, ..., U_k). For user U_m, the set of scored user evaluations is V_m and the set of unscored user evaluations is W_m; I_i and I_j are arbitrary elements of V_m and W_m respectively, R_mi is the score given by user U_m to the scored evaluation I_i, and ε is a threshold constant.
The score prediction R_mj of user U_m for the user evaluation I_j is as follows:

R_{mj} = \frac{\sum_{I_i \in V_m,\ sim(I_i, I_j) > ε} sim(I_i, I_j)\, R_{mi}}{\sum_{I_i \in V_m,\ sim(I_i, I_j) > ε} sim(I_i, I_j)}
Proof: suppose the scored set V_m contains t user evaluations, and that two of them whose similarity to the evaluation I_j to be predicted exceeds the threshold ε are I_p and I_q. According to the score prediction formula for the user evaluations I_p and I_q:

[equation image not reproduced]
Considering the similar user evaluations I_p and I_q together with their similarity to the evaluation I_j, the similarity weights w_p and w_q are:

w_p = \frac{sim(I_p, I_j)}{sim(I_p, I_j) + sim(I_q, I_j)}, \quad w_q = \frac{sim(I_q, I_j)}{sim(I_p, I_j) + sim(I_q, I_j)}
From the known scores R_mp, R_mq and the similarity weights, the prediction is:

R_{mj} = \frac{sim(I_p, I_j)\, R_{mp} + sim(I_q, I_j)\, R_{mq}}{sim(I_p, I_j) + sim(I_q, I_j)}
Extending further to the case where the scored set V_m of U_m contains several user evaluations similar to the evaluation I_j to be predicted, i.e. when there exists I_i ∈ V_m with sim(I_i, I_j) > ε, the evaluation to be predicted has a score prediction, and by the semantic similarity prediction method the score prediction R_mj of user U_m for I_j is:

R_{mj} = \frac{\sum_{I_i \in V_m,\ sim(I_i, I_j) > ε} sim(I_i, I_j)\, R_{mi}}{\sum_{I_i \in V_m,\ sim(I_i, I_j) > ε} sim(I_i, I_j)}   (6)
Score prediction based on semantic similarity effectively fills the sparse user scoring matrix, but it can only predict scores for similar user evaluations; scores cannot be predicted for user evaluations with low similarity.
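Before turning to matrix factorization, a minimal C++ sketch of this first, similarity-based filling stage is given below. The data layout (a dense matrix in which 0 marks a missing score) and the assumption that a similarity matrix sim has already been computed, for example by the ontology-based measure above, are illustrative and not part of the claimed method:

```cpp
#include <cstddef>
#include <vector>

// First-stage filling: predict R[m][j] from already-scored evaluations whose
// ontology similarity to evaluation j exceeds the threshold epsilon (formula (6)).
// R is the user x evaluation score matrix; 0 marks a missing score.
void fillBySimilarity(std::vector<std::vector<double>>& R,
                      const std::vector<std::vector<double>>& sim,
                      double epsilon) {
    for (std::size_t m = 0; m < R.size(); ++m) {
        for (std::size_t j = 0; j < R[m].size(); ++j) {
            if (R[m][j] != 0.0) continue;                 // already scored
            double num = 0.0, den = 0.0;
            for (std::size_t i = 0; i < R[m].size(); ++i) {
                if (i == j || R[m][i] == 0.0) continue;   // only scored evaluations (V_m)
                if (sim[i][j] > epsilon) {                // similar subset V_m'
                    num += sim[i][j] * R[m][i];
                    den += sim[i][j];
                }
            }
            if (den > 0.0) R[m][j] = num / den;           // weighted-average prediction
            // otherwise the entry is left for the matrix-factorization stage
        }
    }
}
```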
Let the user evaluation scoring matrix be an m × n matrix R. For the sparse matrix R, matrix factorization is applied to predict the missing values: R can be decomposed into an m × k matrix U and a k × n matrix V, and the symbol \hat{R} denotes the approximate scoring matrix obtained after prediction of R:

R_{m \times n} \approx U_{m \times k} V_{k \times n} = \hat{R}
The matrix U captures the relation between the m users and k topics, and the matrix V the relation between the k topics and the n user evaluations; the number of topics k is a parameter chosen for the specific user evaluations and can be adjusted as required. The element of the approximate scoring matrix in row i and column j is:

\hat{r}_{ij} = \sum_{t=1}^{k} u_{i,t} v_{t,j}
Taking as the loss function the sum of squared differences between the actual scores and the approximate scores, the loss function of the matrix factorization score prediction is:

E = \sum_{(i,j)} (r_{ij} - \hat{r}_{ij})^2 = \sum_{(i,j)} \left( r_{ij} - \sum_{t=1}^{k} u_{i,t} v_{t,j} \right)^2   (9)

where the sum runs over the observed entries of R.
Taking the partial derivatives of the loss function with respect to u_{i,k} and v_{k,j} gives:

\frac{\partial E}{\partial u_{i,k}} = -2 E_{i,j} v_{k,j}, \quad \frac{\partial E}{\partial v_{k,j}} = -2 E_{i,j} u_{i,k}

where E_{i,j} = r_{ij} - \hat{r}_{ij} is the prediction error.
Moving along the direction of steepest descent with the gradient descent optimization algorithm, where α is the learning rate:

u_{i,k} = u_{i,k} + 2α E_{i,j} v_{k,j}, \quad v_{k,j} = v_{k,j} + 2α E_{i,j} u_{i,k}
To prevent overfitting of the scoring matrix, a regularization term β(||u_i||² + ||v_k||²) is added, where β is the regularization parameter, yielding:

u_{i,k} = u_{i,k} + α(2 E_{i,j} v_{k,j} - β u_{i,k})
v_{k,j} = v_{k,j} + α(2 E_{i,j} u_{i,k} - β v_{k,j})   (12)
after the matrix U and V is solved, the prediction scoring formula of the user i on the item j is as follows:
u(i,1)·v(1,j) + u(i,2)·v(2,j) + … + u(i,k)·v(k,j)   (13)
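A compact C++ sketch of this factorization stage under update rule (12) and prediction rule (13) follows; the random initialization, the fixed iteration count, and the treatment of 0 as a missing entry are illustrative assumptions rather than details taken from the description:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Second stage: factor the (pre-filled) m x n score matrix R into U (m x k) and
// V (k x n) by gradient descent with L2 regularization, per update rule (12).
void factorize(const std::vector<std::vector<double>>& R,
               std::vector<std::vector<double>>& U,
               std::vector<std::vector<double>>& V,
               int k, int iterations, double alpha, double beta) {
    std::size_t m = R.size(), n = R[0].size();
    U.assign(m, std::vector<double>(k));
    V.assign(k, std::vector<double>(n));
    for (auto& row : U) for (auto& x : row) x = (double)std::rand() / RAND_MAX;
    for (auto& row : V) for (auto& x : row) x = (double)std::rand() / RAND_MAX;

    for (int it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < m; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                if (R[i][j] == 0.0) continue;              // only observed entries
                double pred = 0.0;
                for (int t = 0; t < k; ++t) pred += U[i][t] * V[t][j];
                double e = R[i][j] - pred;                 // E_{i,j} in (12)
                for (int t = 0; t < k; ++t) {
                    double u = U[i][t], v = V[t][j];
                    U[i][t] = u + alpha * (2.0 * e * v - beta * u);
                    V[t][j] = v + alpha * (2.0 * e * u - beta * v);
                }
            }
        }
    }
}

// Prediction rule (13): predicted score of user i for evaluation j.
double predict(const std::vector<std::vector<double>>& U,
               const std::vector<std::vector<double>>& V,
               std::size_t i, std::size_t j, int k) {
    double s = 0.0;
    for (int t = 0; t < k; ++t) s += U[i][t] * V[t][j];
    return s;
}
```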
On this basis, the power grid user evaluation sparse matrix scoring prediction algorithm built on the ontology-concept hierarchy tree can be stated as follows: first, the entries of the sparse matrix whose user evaluations have high semantic similarity are partially filled by score prediction based on ontology-concept similarity; then, on the basis of the partially predicted scoring matrix, a matrix factorization algorithm further predicts the missing values of the unscored items with low similarity. The sparse matrix thus receives two rounds of prediction filling and finally a complete user-score matrix is obtained, which alleviates the overfitting produced by matrix factorization when the matrix is extremely sparse and improves the quality of the collaborative filtering recommendation algorithm. The detailed steps of the algorithm are as follows:
Input: the sparse user-user evaluation scoring matrix.
Output: the predicted user-user evaluation scoring matrix.
Step 1: according to the user's scores, divide the user evaluations into a set V_m of scored concept instances and a set W_m of concept instances whose scores are to be predicted.
Step 2: traverse the ontology hierarchy tree to obtain the depths depth(I_i), I_i ∈ V_m, and depth(I_j), I_j ∈ W_m, of the concept instances and their least common ancestor node lso(I_i, I_j), and calculate the user evaluation similarity according to formula (1).
Step 3: obtain the subset V_m' of V_m whose evaluation similarity is greater than the similarity threshold ε, and predict the score of the user evaluation to be predicted, I_j ∈ W_m, according to formula (6).
Step 4: if V_m contains no element whose similarity to I_j is greater than the threshold ε, move to the next element of W_m and return to Step 2.
Step 5: take the scoring matrix predicted by the semantic similarity method as the input matrix, reduce its dimension by matrix factorization and decompose it into a low-rank matrix that best approximates the original matrix; compute the loss function according to formula (9), obtain the element values of the two factor matrices by iterative calculation according to formula (12), and substitute them into formula (13) to calculate the evaluation score values to be predicted in the scoring matrix.
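Putting the two stages together, a hypothetical top-level routine could look as follows; the function names reuse the illustrative sketches given earlier in this description and are not part of the claimed method:

```cpp
#include <cstddef>
#include <vector>

// Declarations of the illustrative helpers sketched above.
void fillBySimilarity(std::vector<std::vector<double>>& R,
                      const std::vector<std::vector<double>>& sim, double epsilon);
void factorize(const std::vector<std::vector<double>>& R,
               std::vector<std::vector<double>>& U,
               std::vector<std::vector<double>>& V,
               int k, int iterations, double alpha, double beta);
double predict(const std::vector<std::vector<double>>& U,
               const std::vector<std::vector<double>>& V,
               std::size_t i, std::size_t j, int k);

// Two-stage prediction: semantic pre-filling (Steps 1-4), then matrix
// factorization (Step 5) for the entries whose similarity never exceeded epsilon.
std::vector<std::vector<double>> predictAll(std::vector<std::vector<double>> R,
                                            const std::vector<std::vector<double>>& sim,
                                            double epsilon, int k, int iters,
                                            double alpha, double beta) {
    fillBySimilarity(R, sim, epsilon);
    std::vector<std::vector<double>> U, V;
    factorize(R, U, V, k, iters, alpha, beta);
    for (std::size_t i = 0; i < R.size(); ++i)
        for (std::size_t j = 0; j < R[i].size(); ++j)
            if (R[i][j] == 0.0) R[i][j] = predict(U, V, i, j, k);
    return R;   // complete user-score matrix
}
```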
To verify the feasibility and effectiveness of the algorithm, it is implemented in C++ under the QT 7.4.7 programming environment. The experimental data come from a data set collected for collaborative filtering algorithms by the computer science research group of the University of Minnesota (http://grouplens.org/datasets/movielens/100k/); the data set contains 100000 ratings out of 1682 × 943 possible entries, a sparsity of about 93.7%, and is therefore very sparse. In the experiment, the whole data set is first randomly shuffled, and the experimental data are then divided evenly into 5 mutually disjoint subsets, with a 4:1 ratio of training to test data.
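A brief C++ sketch of this shuffling and five-fold partitioning is shown below; the Rating record and the seed are hypothetical, and only the 4:1 split logic follows the description:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// One rating record from a MovieLens-100K-style data set.
struct Rating { int user; int item; int score; };

// Shuffle the whole data set, then split it into 5 disjoint folds; using one
// fold as the test set and the other four as training data gives the 4:1 ratio.
std::vector<std::vector<Rating>> makeFolds(std::vector<Rating> data, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::shuffle(data.begin(), data.end(), gen);
    std::vector<std::vector<Rating>> folds(5);
    for (std::size_t i = 0; i < data.size(); ++i)
        folds[i % 5].push_back(data[i]);
    return folds;
}
```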
The proportions of scores 1 to 5 in the 5 data sets are shown in Table 1.
TABLE 1 score distribution ratio
As the score distribution ratios in Table 1 show, the proportion of score-5 data differs by 1.87% between data set 1 and data set 3, the largest difference in the whole collection, while the score distributions of the remaining data sets differ less; data set 1 is therefore not used as a test set, and one of data sets 2 to 5 is chosen as the test set.
In the experiment, the number of levels used when classifying the data and constructing the ontology hierarchy is denoted hierarchy tree, the semantic similarity threshold is ε, the number of features (dimension) of the matrix factorization is F, the learning rate is α, and the regularization parameter is β; the parameter settings are listed in Table 2.
Table 2 experimental parameter settings
To verify the performance of the algorithm, a score prediction algorithm based on semantic similarity, a score prediction algorithm based on singular value decomposition of the sparse matrix, and a prediction algorithm that fills the missing values with random numbers and then performs matrix factorization are each run on the data set and compared. The results are obtained by adjusting parameters such as the similarity threshold and the number of iterations, and are analysed statistically.
The mean absolute error (MAE) is used as the metric. The MAE measures prediction accuracy by computing the deviation between the predicted and the actual user scores. Suppose the scores of the N predicted items are represented by the vector {p_1, p_2, …, p_N} and the corresponding actual user scores are {r_1, r_2, …, r_N}; the MAE is then computed as follows. The curves of the MAE values of the different score prediction algorithms are shown in FIG. 3.

MAE = \frac{1}{N} \sum_{i=1}^{N} |p_i - r_i|
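For reference, a direct C++ sketch of the MAE computation above, assuming prediction and ground-truth vectors of equal length:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mean absolute error between predicted scores p and actual scores r.
double mae(const std::vector<double>& p, const std::vector<double>& r) {
    double sum = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i)
        sum += std::abs(p[i] - r[i]);
    return p.empty() ? 0.0 : sum / p.size();
}
```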
In the experiment, the semantic similarity threshold is first adjusted to compare the trend of the MAE value of the semantic similarity algorithm with that of the proposed algorithm; the results are shown in FIG. 2. The experimental results show that the proposed algorithm and the item semantic similarity algorithm built on the ontology hierarchy tree follow a similar trend as the semantic similarity threshold varies, and that when the threshold exceeds 0.8 the MAE value no longer decreases as the threshold increases. The reason is that if the semantic similarity threshold is set too high, the algorithm only predicts scores for items with extremely high similarity: those predictions are relatively accurate, but items with low similarity still cannot be predicted, fewer items are predicted by semantic similarity, the data set remains sparse after prediction, and the purpose of the prediction algorithm is not achieved. Too small a semantic similarity threshold leads to inaccurate score predictions that damage the original properties of the data set. To avoid damaging the internal structure of the data set while still partially predicting the items with larger similarity in the extremely sparse matrix and providing more basic information for matrix factorization, analysis of the experimental data shows that a semantic similarity threshold of 0.75 is appropriate for this algorithm.
To verify the effectiveness of the proposed algorithm, a prediction-filling method that applies SVD directly to the sparse matrix and an algorithm that pre-fills the sparse matrix with random numbers (in the range 1 to 5) before matrix factorization are compared with the proposed algorithm as the number of retained dimensions after factorization varies; the line graph of the MAE values is shown in FIG. 3.
As the comparison in FIG. 3 shows, the MAE obtained by the conventional approach of pre-filling the scoring matrix with random numbers or score means before factorization can be larger than that of applying SVD directly to the original matrix, because filling the missing scores may change the essential properties of the original data. While preserving the properties of the original scoring data set, the proposed algorithm pre-fills the entries with higher similarity by adjusting the similarity threshold during score prediction, providing richer source information for the original sparse matrix before factorization. In FIG. 3, with the similarity threshold set to 0.75, the MAE of the proposed algorithm is smaller than that of the SVD decomposition algorithm.
Conclusion:
To address the sparsity of the scoring matrix in the power grid intelligent question-answering recommendation system, this method introduces ontology semantic similarity to perform partial score prediction for the items with higher similarity in the sparse scoring matrix, which serves as preprocessing of the sparse matrix before the remaining missing scores are predicted by matrix factorization; the two rounds of score prediction effectively resolve the sparsity of the scoring matrix. By adjusting the semantic similarity threshold, the ontology-concept semantic similarity algorithm avoids damaging the essential characteristics of the original data and provides more complete data support for the matrix factorization algorithm. The experimental results show that the sparse scoring matrix is completely filled and that the MAE of the matrix is reduced to a certain extent.

Claims (8)

1. A power grid user evaluation sparse matrix scoring prediction method, characterized in that it comprises: calculating the semantic similarity between user evaluations by constructing an ontology-concept hierarchy tree, and performing partial score prediction and filling in the sparse matrix for user evaluations whose similarity exceeds a similarity threshold; and, based on the partially predicted scoring matrix, reducing its dimension and decomposing it according to matrix factorization theory, then further predicting scores for the user evaluations whose similarity is at or below the threshold, so that the missing score values in the sparse matrix are filled by prediction.
2. The power grid user evaluation sparse matrix scoring prediction method according to claim 1, characterized in that it specifically comprises the following steps:
Step 1: according to the user's scores, divide the user evaluations into a set V_m of scored concept instances and a set W_m of concept instances whose scores are to be predicted;
Step 2: traverse the ontology hierarchy tree to obtain the depths depth(I_i), I_i ∈ V_m, and depth(I_j), I_j ∈ W_m, of the concept instances and their least common ancestor node lso(I_i, I_j), and calculate the user evaluation similarity;
Step 3: obtain the subset V_m' of V_m whose evaluation similarity is greater than the similarity threshold ε, and predict the score of the user evaluation to be predicted, I_j ∈ W_m;
Step 4: if V_m contains no element whose similarity to I_j is greater than the threshold ε, move to the next element of W_m and return to Step 2;
Step 5: take the scoring matrix predicted by the semantic similarity method as the input matrix, reduce its dimension by matrix factorization, decompose it into a low-rank matrix that best approximates the original matrix, compute the loss function, obtain the element values of the two factor matrices by iterative calculation, and finally obtain the evaluation score values to be predicted in the scoring matrix.
3. The power grid user evaluation sparse matrix scoring prediction method according to claim 2, characterized in that the user evaluation similarity in Step 2 is calculated as follows: the shorter the distance between two concept instances, the higher the similarity between the two concepts, and vice versa, the similarity formula being:

sim(I_i, I_j) = \frac{2\,depth(lso(I_i, I_j))}{len(I_i, I_j) + 2\,depth(lso(I_i, I_j))}   (1)

where depth is the length of the shortest path from a node to the root node, lso is the least common ancestor node of the two instances, and len is the length of the shortest path between the two nodes; from equation (1), the similarity between concepts increases with the depth of the least common ancestor node, and when I_i = I_j, sim(I_i, I_j) = 1.
4. The power grid user evaluation sparse matrix scoring prediction method according to claim 2, characterized in that in Step 3 the subset V_m' of V_m whose evaluation similarity exceeds the similarity threshold ε is obtained, and the score of the user evaluation to be predicted, I_j ∈ W_m, is predicted with the formula:

R_{mj} = \frac{\sum_{I_i \in V_m'} sim(I_i, I_j)\, R_{mi}}{\sum_{I_i \in V_m'} sim(I_i, I_j)}
5. The power grid user evaluation sparse matrix scoring prediction method according to claim 4, characterized in that the score is predicted as follows:
Let the set of user evaluations be I = (I_1, ..., I_i, I_j, ..., I_n) and let ε be a threshold constant. If the semantic similarity of I_i and I_j is greater than the threshold, then I_i and I_j are similar; otherwise I_i and I_j are dissimilar.
Let the set of users be U = (U_1, ..., U_m, U_n, ..., U_k). For user U_m, the set of scored user evaluations is V_m and the set of unscored user evaluations is W_m; I_i and I_j are arbitrary elements of V_m and W_m respectively, R_mi is the score given by user U_m to the scored evaluation I_i, and ε is a threshold constant.
The score prediction R_mj of user U_m for the user evaluation I_j is as follows:

R_{mj} = \frac{\sum_{I_i \in V_m,\ sim(I_i, I_j) > ε} sim(I_i, I_j)\, R_{mi}}{\sum_{I_i \in V_m,\ sim(I_i, I_j) > ε} sim(I_i, I_j)}
Proof: suppose the scored set V_m contains t user evaluations, and that two of them whose similarity to the evaluation I_j to be predicted exceeds the threshold ε are I_p and I_q. According to the score prediction formula for the user evaluations I_p and I_q:

[equation image not reproduced]
Considering the similar user evaluations I_p and I_q together with their similarity to the evaluation I_j, the similarity weights w_p and w_q are:

w_p = \frac{sim(I_p, I_j)}{sim(I_p, I_j) + sim(I_q, I_j)}, \quad w_q = \frac{sim(I_q, I_j)}{sim(I_p, I_j) + sim(I_q, I_j)}
From the known scores R_mp, R_mq and the similarity weights, the prediction is:

R_{mj} = \frac{sim(I_p, I_j)\, R_{mp} + sim(I_q, I_j)\, R_{mq}}{sim(I_p, I_j) + sim(I_q, I_j)}
Extending further to the case where the scored set V_m of U_m contains more than one user evaluation similar to the evaluation I_j to be predicted, i.e. when there exists I_i ∈ V_m with sim(I_i, I_j) > ε, the evaluation to be predicted has a score prediction, and by the semantic similarity prediction method the score prediction R_mj of user U_m for I_j is given by the above formula.
6. The power grid user evaluation sparse matrix scoring prediction method according to claim 2, characterized in that in Step 5 the scoring matrix predicted by the semantic similarity method is taken as the input matrix, its dimension is reduced by matrix factorization, it is decomposed into a low-rank matrix that best approximates the original matrix, and the loss function is calculated as follows: let the scoring matrix predicted by the semantic similarity method be an m × n matrix R; for the sparse matrix R, matrix factorization is applied to predict the missing values, and R can be decomposed into an m × k matrix U and a k × n matrix V, with the symbol \hat{R} denoting the approximate scoring matrix obtained after prediction of R:

R_{m \times n} \approx U_{m \times k} V_{k \times n} = \hat{R}

The matrix U captures the relation between the m users and k topics, and the matrix V the relation between the k topics and the n user evaluations; the number of topics k is a parameter chosen for the specific user evaluations. The element of the approximate scoring matrix in row i and column j is:

\hat{r}_{ij} = \sum_{t=1}^{k} u_{i,t} v_{t,j}

Taking as the loss function the sum of squared differences between the actual scores and the approximate scores, the loss function of the matrix factorization score prediction is:

E = \sum_{(i,j)} (r_{ij} - \hat{r}_{ij})^2 = \sum_{(i,j)} \left( r_{ij} - \sum_{t=1}^{k} u_{i,t} v_{t,j} \right)^2

where the sum runs over the observed entries of R.
7. The power grid user evaluation sparse matrix scoring prediction method according to claim 5, characterized in that in Step 5 the element values of the two factor matrices and finally the user evaluation score values to be predicted in the scoring matrix are obtained by iterative computation as follows:
Taking the partial derivatives of the loss function with respect to u_{i,k} and v_{k,j} gives:

\frac{\partial E}{\partial u_{i,k}} = -2 E_{i,j} v_{k,j}

\frac{\partial E}{\partial v_{k,j}} = -2 E_{i,j} u_{i,k}

where E_{i,j} = r_{ij} - \hat{r}_{ij} is the prediction error.
Moving along the direction of steepest descent with the gradient descent optimization algorithm, where α is the learning rate:

u_{i,k} = u_{i,k} + 2α E_{i,j} v_{k,j}, \quad v_{k,j} = v_{k,j} + 2α E_{i,j} u_{i,k}
To prevent overfitting of the scoring matrix, a regularization term β(||u_i||² + ||v_k||²) is added, where β is the regularization parameter, yielding:

u_{i,k} = u_{i,k} + α(2 E_{i,j} v_{k,j} - β u_{i,k})
v_{k,j} = v_{k,j} + α(2 E_{i,j} u_{i,k} - β v_{k,j})   (12)
after the matrix U and V is solved, the prediction scoring formula of the user i on the item j is as follows:
u(i,1)*v(1,j)+u(i,2)*v(2,j)+…+u(i,k)*v(k,j) (13)。
8. The power grid user evaluation sparse matrix scoring prediction method according to claim 2, characterized in that it further comprises verifying the feasibility and effectiveness of the power grid user evaluation sparse matrix scoring prediction method, the verification being as follows: the algorithm is implemented in C++ under the QT 7.4.7 programming environment; the data come from a data set collected for collaborative filtering algorithms by the computer science research group of the University of Minnesota; the data set contains 100000 ratings out of 1682 × 943 possible entries, corresponding to a sparsity of about 93.7%; the whole data set is first randomly shuffled, the experimental data are then divided evenly into 5 mutually disjoint subsets, and the ratio of training to test data is 4:1; the proportions of scores 1 to 5 in the 5 data sets are shown in Table 1.
TABLE 1 score distribution ratio
The proportion of score-5 data differs by 1.87% between data set 1 and data set 3, the largest difference in the whole collection, while the score distributions of the remaining data sets differ less; data set 1 is therefore not used as a test set, and one of data sets 2 to 5 is chosen as the test set;
in the experiment, the number of levels used when classifying the data and constructing the ontology hierarchy is denoted hierarchy tree, the semantic similarity threshold is ε, the number of features (dimension) of the matrix factorization is F, the learning rate is α, and the regularization parameter is β; the parameter settings are shown in Table 2:
table 2 experimental parameter settings
To verify the performance of the algorithm, a score prediction algorithm based on semantic similarity, a score prediction algorithm based on singular value decomposition of the sparse matrix, and a prediction algorithm that fills the missing values with random numbers and then performs matrix factorization are each run on the data set and compared; the results are obtained by adjusting the similarity threshold and the number of iterations and are analysed statistically;
the mean absolute error (MAE) is adopted as the metric; the MAE measures prediction accuracy by computing the deviation between the predicted and the actual user scores. Suppose the scores of the N predicted items are represented by the vector {p_1, p_2, …, p_N} and the corresponding actual user scores are {r_1, r_2, …, r_N}; the MAE is then computed as:

MAE = \frac{1}{N} \sum_{i=1}^{N} |p_i - r_i|
the similarity threshold is set to 0.75;
the effectiveness is verified as follows: a prediction-filling method that applies SVD directly to the sparse matrix and a method that pre-fills the sparse matrix with random numbers in the range 1 to 5 before matrix factorization are compared with the proposed algorithm as the number of retained dimensions after factorization varies; while preserving the properties of the original scoring data set, the similarity threshold used during score prediction is adjusted so that the entries with higher similarity are partially pre-filled, providing richer source information for the original sparse matrix before factorization, and with the similarity threshold set to 0.75 the MAE of the algorithm is smaller than that of the SVD decomposition algorithm.
CN202110868792.4A 2021-07-30 2021-07-30 Power grid user evaluation sparse matrix scoring prediction method Pending CN113537619A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110868792.4A CN113537619A (en) 2021-07-30 2021-07-30 Power grid user evaluation sparse matrix scoring prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110868792.4A CN113537619A (en) 2021-07-30 2021-07-30 Power grid user evaluation sparse matrix scoring prediction method

Publications (1)

Publication Number Publication Date
CN113537619A true CN113537619A (en) 2021-10-22

Family

ID=78089871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110868792.4A Pending CN113537619A (en) 2021-07-30 2021-07-30 Power grid user evaluation sparse matrix scoring prediction method

Country Status (1)

Country Link
CN (1) CN113537619A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740064A (en) * 2019-01-18 2019-05-10 北京化工大学 A kind of CF recommended method of fusion matrix decomposition and excavation user items information

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740064A (en) * 2019-01-18 2019-05-10 北京化工大学 A kind of CF recommended method of fusion matrix decomposition and excavation user items information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang Yang et al.: "Score prediction algorithm fusing semantic similarity and matrix factorization", Journal of Computer Applications *
Guo Ming: "Research on a collaborative filtering recommendation algorithm based on co-clustering", China Master's Theses Full-text Database, Information Science and Technology *

Similar Documents

Publication Publication Date Title
Luo Network text sentiment analysis method combining LDA text representation and GRU-CNN
CN109299396B (en) Convolutional neural network collaborative filtering recommendation method and system fusing attention model
Zhang et al. A recommendation model based on deep neural network
Zamani et al. Neural query performance prediction using weak supervision from multiple signals
Teh et al. Indian buffet processes with power-law behavior
CN111797321A (en) Personalized knowledge recommendation method and system for different scenes
CN112232087A (en) Transformer-based specific aspect emotion analysis method of multi-granularity attention model
Wu et al. Hypergraph collaborative network on vertices and hyperedges
CN112100439B (en) Recommendation method based on dependency embedding and neural attention network
Yildiz et al. Improving word embedding quality with innovative automated approaches to hyperparameters
Yildirim A novel grid-based many-objective swarm intelligence approach for sentiment analysis in social media
Raudhatunnisa et al. Performance Comparison of Hot-Deck Imputation, K-Nearest Neighbor Imputation, and Predictive Mean Matching in Missing Value Handling, Case Study: March 2019 SUSENAS Kor Dataset
JP2005078240A (en) Method for extracting knowledge by data mining
He et al. Knowledge base completion using matrix factorization
CN112734510B (en) Commodity recommendation method based on fusion improvement fuzzy clustering and interest attenuation
Zhang et al. Probabilistic matrix factorization recommendation of self-attention mechanism convolutional neural networks with item auxiliary information
CN113537619A (en) Power grid user evaluation sparse matrix scoring prediction method
Wu et al. Graph-based query strategies for active learning
Bahrkazemi et al. A strategy to estimate the optimal low-rank in incremental SVD-based algorithms for recommender systems
Rumbut et al. Topic modeling for systematic review of visual analytics in incomplete longitudinal behavioral trial data
Pei [Retracted] Construction of a Legal System of Corporate Social Responsibility Based on Big Data Analysis Technology
Priyati et al. The comparison study of matrix factorization on collaborative filtering recommender system
Nordström Unstructured pruning of pre-trained language models tuned for sentiment classification.
Wang Forecast model of TV show rating based on convolutional neural network
Singhal et al. Predicting Budget from Transportation Research Grant Description: An Exploratory Analysis of Text Mining and Machine Learning Techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211022
