CN116523597A

CN116523597A - E-commerce customization recommendation method for big data cloud entropy mining

Info

Publication number: CN116523597A
Application number: CN202310482719.2A
Authority: CN
Inventors: 肖刚
Original assignee: Individual
Current assignee: Individual
Priority date: 2023-05-02
Filing date: 2023-05-02
Publication date: 2023-08-01

Abstract

The e-commerce customized recommendation method for big data cloud entropy mining provides two improvement strategies. First, a recommendation algorithm based on cloud entropy mining and predictive scoring is provided; its cloud similarity calculation method improves the accuracy of similarity and effectively addresses data sparsity. Second, a recommendation algorithm based on an adjacency set matrix decomposition model is provided: the adjacency set model is obtained from the user's registration records or profile information, a singular value decomposition method with an added global bias that allows incremental updating is adopted, and the optimal value of each parameter is obtained by stochastic gradient descent, which better addresses the user cold-start problem and improves the accuracy of the recommendation algorithm. Customized recommendation provides different, accurate recommendation results based on users' differing preferences and interests. This one-to-one sales model greatly improves the efficiency of the e-commerce platform, saves users a great deal of time, and improves the user experience.

Description

E-commerce customization recommendation method for big data cloud entropy mining

Technical Field

The application relates to an online shopping big data customized recommendation method, in particular to an electronic commerce customized recommendation method for big data cloud entropy mining, and belongs to the technical field of electronic commerce platform commodity recommendation.

Background

On current e-commerce websites a dazzling array of commodities is on offer, and people can buy what they want without leaving home, which saves consumers a great deal of time and is convenient, fast and efficient. As e-commerce platforms have become widespread and their potential value has become apparent, many manufacturers have put their commodities online for display and sale in various ways, so the number of commodities on these platforms has grown sharply in a short time and the scale of e-commerce keeps increasing. However, current retrieval functions and recommendation strategies give little consideration to the specific needs of consumers, so it is increasingly difficult for users to find commodities of interest quickly, and they must spend much time browsing large amounts of commodity information to find a satisfactory commodity.

With the continuous development of data mining technology, a solution has emerged for customized commodity recommendation services: data on the commodities that different users browse and purchase online is collected and analyzed to mine information that is actually useful to merchants and consumers. This makes it more efficient for users to find commodities, brings merchants economic benefit, offers users and merchants a new mode of cooperation, improves the efficiency of e-commerce, and promotes enterprises' sales.

Customized marketing is a marketing mode that is developing quickly and is currently popular. Compared with traditional marketing it targets consumers more precisely: it provides different recommendation results to different people based on their preferences and interests, forming a one-to-one marketing model. The recommendation service is an indispensable part of customized sales on an e-commerce platform. It lets different internet users find commodities that suit them in a short time according to their own habits, interests or hobbies, which improves the efficiency of the platform, saves users a great deal of time, improves the user experience, and benefits merchants.

At present, however, customized recommendation is difficult to implement across different application settings and still faces many problems. Prior-art recommendation systems and algorithms have significant limitations and cannot keep up with consumers' continuously growing and changing demands. The messy, huge and disordered data across an e-commerce site must be dealt with, and a practical method is needed that presents each consumer with commodities matching that consumer's preferences.

Prior-art e-commerce customized recommendation methods mainly have the following problems: (1) Low efficiency of use: most current e-commerce platforms have not established an effective and reliable user preference model, nor designed a suitable customized recommendation algorithm based on such a model. (2) Low degree of intelligence: prior-art recommendation services depend on the consumer's e-commerce behavior records, so a consumer must already have relevant operation records before the system can give a recommendation, and the system cannot quickly infer in advance, from the user's personal information, which commodities the user may be interested in. (3) Insufficient persistence: the recommendation system operates on the user's data from a single session and makes little use of the consumer's historical behavior records.

As the number and variety of commodities on e-commerce platforms increase, the volume of data generated by the whole e-commerce system also grows, and consumers want to find the commodity that best suits them among vast amounts of commodity information in a short time. If a sound and feasible recommendation system provides consumers with an accurate set of recommended commodities, users can find what they need quickly, which reduces the number of user accesses per unit time and relieves network congestion to some extent. Prior-art e-commerce recommendation faces the following problems:

(1) The accuracy of the recommendation system is not high enough: prior-art services learn similar objects from users' click or transaction records. If a consumer browses many types of commodities and the data volume is large, the recommendation system can hardly analyze the data systematically and comprehensively and give accurate recommendation results.

(2) The real-time performance of the recommendation system is not good enough: to ensure accuracy, the prior art introduces new concepts and designs more complex recommendation algorithms, which places new demands on computing capacity. As the number of platform users grows, providing customized services for all users inevitably reduces real-time performance, and users' demands and interests change rapidly over time, which poses a great challenge to the recommendation system.

(3) The mode and function of the recommendation system are single: a prior-art recommendation function is merely a single tool providing one kind of recommendation and cannot adapt well to increasingly complex and rapidly changing consumption conditions. Given the many commodity categories and the diversity of users, a recommendation system should have corresponding strategies to meet the requirements of different settings.

(4) Poor scalability: a real e-commerce platform has a large data volume, new users and commodities are added all the time, and new links, such as browsing, clicking, collecting, comparing, purchasing and commenting, are continuously generated between them. The data in the whole system therefore changes dynamically all the time, which requires a great deal of data processing and a scalable recommendation technology.

(5) User privacy protection issues are prominent: current recommendation systems offer very limited protection of user privacy and cannot meet consumers' growing privacy requirements. A recommendation system needs certain user information and historical data as support. If it cannot protect consumers' privacy well, consumers lack the necessary sense of security, the system has difficulty acquiring accurate personal information, and the quality and accuracy of recommendations are directly affected.

In addition to the above problems and drawbacks, the problems and key technical difficulties to be solved by the present application include:

(1) In the prior art, many recommendation algorithms are based on a scoring matrix, perform well only in specific scenarios, and can hardly overcome their inherent defects. On a very large e-commerce platform the user-commodity scoring matrix is highly sparse because each user evaluates only a very small number of commodities, which greatly reduces the reliability of similarity calculation and the accuracy of the recommendation system. Data sparsity of the matrix is therefore one of the problems that conventional recommendation algorithms must solve. The prior art lacks a high-quality similarity calculation method for sparse data and cannot overcome the defects of conventional measures, which leads to inaccurate recommendation results and low processing efficiency on massive e-commerce data.

(2) At present, very large e-commerce platforms carry millions of commodity types and have more than tens of millions of users, but no user can give an evaluation value to every commodity, so the user-commodity scoring matrix is extremely sparse and traditional similarity calculation methods produce large errors. Among traditional measures, cosine similarity generally assigns 0 to commodities the user has not evaluated. In practice, however, an unevaluated commodity is not necessarily the one the user likes least: the user may not have noticed it, or may not have given an evaluation in time after purchasing it. The adjusted cosine measure and the Pearson correlation have similar problems. Because the scoring matrix is so sparse, the commodities that different users have both purchased and evaluated promptly and reasonably are very few, similarity between users is generally low, and the accuracy of the target object's nearest neighbors is greatly affected. As a result, customized e-commerce recommendation is slow and inaccurate, is sometimes even counterproductive, and gives users a very poor experience.

(3) Prior-art collaborative filtering algorithms calculate the similarity between users or commodities by collecting and analyzing users' historical e-commerce behavior records and commodity purchase data, and then obtain the nearest neighbor set of a user or commodity through parameter adjustment. A newly registered user has no such records, so these algorithms cannot serve that user well. To provide a better recommendation service and win the favor of new users from the start, each platform therefore has to solve the cold-start problem and the data sparsity problem of the matrix, design a more complete recommendation system, and provide a better recommendation service for users in different situations.

Disclosure of Invention

To address the defects of prior-art collaborative filtering recommendation, the application provides two improvement strategies. First, a recommendation algorithm based on cloud entropy mining and predictive scoring is provided, which overcomes the defects of traditional similarity calculation methods, improves the accuracy of similarity through the cloud similarity calculation method, and effectively addresses data sparsity. Second, a recommendation algorithm based on an adjacency set matrix decomposition model is provided: the adjacency set model is obtained from the user's registration records or profile information, a singular value decomposition method with an added global bias that allows incremental updating is adopted, and the optimal value of each parameter is obtained by stochastic gradient descent, which better addresses the user cold-start problem and improves the accuracy of the recommendation algorithm. Customized recommendation provides different, accurate recommendation results based on users' differing preferences and interests and forms a one-to-one sales model for consumers, which greatly improves the efficiency of the e-commerce platform, saves users a great deal of time, improves the user experience, and benefits merchants.

In order to achieve the technical effects, the technical scheme adopted by the application is as follows:

The e-commerce customized recommendation method for big data cloud entropy mining combines two improvement strategies: a cloud entropy mining method is used to calculate the similarity of objects so as to obtain a more accurate nearest neighbor set, and a matrix decomposition model based on an adjacency set is used to address cold start and data sparsity;

1) Establishing an e-commerce recommendation algorithm based on cloud entropy mining and predictive scoring. First, the similarity calculation method is improved: the information in users' score values over a set of commodities is used to consider the similarity of objects at a coarse, overall level and to summarize the overall characteristics of the target object, so that the cloud similarity calculation method improves the accuracy of similarity and addresses data sparsity. Second, predicted-score recommendation based on the cloud entropy mining model predicts users' score values for target commodities, which increases the number of commodities that different users have jointly purchased and evaluated. The collaborative filtering algorithm is improved on this basis to obtain an improved customized recommendation algorithm: user similarity is calculated by cloud entropy mining, the nearest neighbor set of the user is constructed, the evaluation values given to the target commodity by users in the nearest neighbor set are acquired and weighted to obtain the target user's predicted evaluation value for the commodity, and finally the predicted values are sorted and the commodities with the largest predicted values are added to the recommendation list as part of the recommendation result for the user to choose from;

2) Establishing a recommendation method based on a user adjacency set matrix decomposition model. This comprises constructing the adjacency set model and performing customized recommendation by user adjacency set matrix decomposition. The nearest neighbor set of a user is constructed from the user's registration data or profile information; a singular value decomposition method with an added global bias that allows incremental updating reduces the dimension of the user-commodity scoring matrix, addressing the matrix's data sparsity; and the optimal value of each parameter is found by stochastic gradient descent, addressing the user cold-start problem.

Preferably, the improved similarity calculation method is as follows. The user's degree of preference for a commodity is judged from the user's evaluation value for it. If the e-commerce platform offers users five evaluation levels for a commodity (strongly dislike, dislike, neutral, like and strongly like), the corresponding evaluation values are 1, 2, 3, 4 and 5. Let $c_1$, $c_2$, $c_3$, $c_4$, $c_5$ respectively denote the numbers of evaluations at the five levels, and let $U_x$ denote the resulting vector of evaluation counts;

Based on reverse cloud entropy mining, the evaluation count vector of each user reflects qualitative knowledge of how that user evaluates commodities, and the user's evaluation feature vector is calculated from it, where $Ex$ represents the target user's average preference over all commodities, $En$ represents the concentration of the user's commodity evaluations, and $He$ represents the stability of $En$;

Measurement of cloud similarity: if $F_i$ and $F_j$ are the numerical feature vectors of clouds $i$ and $j$, with $F_i = (Ex_i, En_i, He_i)$ and $F_j = (Ex_j, En_j, He_j)$, then the cosine of the angle between $F_i$ and $F_j$ is the similarity of clouds $i$ and $j$, calculated as follows:

$$\mathrm{sim}(i, j) = \cos(F_i, F_j) = \frac{F_i \cdot F_j}{\lVert F_i \rVert \, \lVert F_j \rVert}$$

From the evaluation feature vectors $F_A$, $F_B$, $F_C$, $F_D$ of the users, the user similarity matrix is calculated with the cloud similarity formula.

The three dimensions of the user evaluation feature vector are the user's average preference for commodities, the concentration of the user's scores, and the dispersion of the evaluation values. Calculating similarity on the evaluation feature vector therefore takes every aspect of the user's evaluations into account and suits a sparse matrix: the overall characteristics of the target object are summarized from the user's score values over a set of commodities, which avoids the strict requirement that similarity be calculated only on objects rated in common.
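The feature extraction and similarity measure described above can be sketched in Python. The text does not reproduce the formulas for Ex, En and He, so this sketch assumes the commonly used backward cloud generator estimates (sample mean, scaled mean absolute deviation, and the square root of the gap between the sample variance and En squared). The function names `backward_cloud` and `cloud_similarity` are illustrative and do not come from the application.

```python
import math

def backward_cloud(counts):
    """Estimate (Ex, En, He) from an evaluation count vector.

    counts[k] is the number of evaluations at level k + 1 (levels 1..5).
    Assumption: standard backward cloud generator estimates are used.
    """
    total = sum(counts)
    if total == 0:
        return (0.0, 0.0, 0.0)
    levels = range(1, len(counts) + 1)
    # Ex: mean evaluation value
    ex = sum(l * c for l, c in zip(levels, counts)) / total
    # En: sqrt(pi / 2) times the mean absolute deviation from Ex
    mad = sum(abs(l - ex) * c for l, c in zip(levels, counts)) / total
    en = math.sqrt(math.pi / 2) * mad
    # He: sqrt(|S^2 - En^2|), with S^2 the sample variance (0 for a single rating)
    if total > 1:
        var = sum((l - ex) ** 2 * c for l, c in zip(levels, counts)) / (total - 1)
    else:
        var = 0.0
    he = math.sqrt(abs(var - en ** 2))
    return (ex, en, he)

def cloud_similarity(f_i, f_j):
    """Cosine of the angle between two feature vectors (Ex, En, He)."""
    dot = sum(a * b for a, b in zip(f_i, f_j))
    norm_i = math.sqrt(sum(a * a for a in f_i))
    norm_j = math.sqrt(sum(b * b for b in f_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # convention chosen here for an empty rating history
    return dot / (norm_i * norm_j)

# Two users described only by how often they used each of the five levels
f_a = backward_cloud([0, 1, 2, 5, 2])
f_b = backward_cloud([3, 4, 2, 1, 0])
similarity = cloud_similarity(f_a, f_b)
```

Because the vectors summarize each user's rating distribution, the similarity is defined even when the two users have rated no commodity in common, which is the property the text relies on for sparse matrices.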

Preferably, the user's score for a commodity is predicted as follows: first the similarity of the target objects is calculated with cloud similarity, and then the user's score value for the target commodity is predicted from the user's evaluation values for similar commodities, which increases the number of commodities that different users have jointly purchased and evaluated and avoids the problems caused by matrix sparsity;

Let $I$ be the evaluation count vector of the target object, where $c_x$ denotes the number of evaluations at level $x$ that the user set has given the commodity, with evaluation levels from 1 to 5. Using reverse cloud entropy mining, the three numerical characteristics of the cloud are obtained from the evaluation count vector of the target commodity and form the evaluation feature vector of the commodity, denoted $F = (Ex, En, He)$. Here $Ex$ is the expected value and represents the average preference; $En$ is the entropy and represents the concentration of the evaluations; $He$ is the hyper-entropy and represents the stability of $En$. The similarity of commodities is calculated from their evaluation feature vectors with the cloud similarity calculation method, and the score value of a commodity that the target user has not yet evaluated is then predicted.

Preferably, the specific steps of predicting the user's score for a commodity are as follows:

First step: input the user-commodity scoring matrix $R_{m \times n}$;

Second step: from $R_{m \times n}$ obtain the evaluation count vector $I_i = (c_1, c_2, c_3, c_4, c_5)$ of each commodity, and use reverse cloud entropy mining to calculate the evaluation feature vector $F_i = (Ex_i, En_i, He_i)$ of each commodity, where $i$ is an integer in the interval $[1, n]$ that serves as the identifier of each of the $n$ objects being characterized;

Third step: obtain the similarity of commodities $i$ and $j$ from the cloud similarity calculation formula:

$$\mathrm{sim}(i, j) = \cos(F_i, F_j) = \frac{F_i \cdot F_j}{\lVert F_i \rVert \, \lVert F_j \rVert}$$

Fourth step: select the $k$ commodities with the largest $\mathrm{sim}(i, j)$ values to form the nearest neighbor set $N_i = \{i_1, i_2, \ldots, i_k\}$ of the target commodity, where $N_i$ does not contain commodity $i$ itself (whose similarity value is 1) and the similarity of the elements of $N_i$ decreases as the subscript increases;

Fifth step: from $N_i$ obtain the predicted evaluation value of target user $u$ for commodity $i$; the calculation formula is as follows:

where $r_{uj}$ is the existing evaluation value of user $u$ for commodity $j$, and $\mathrm{sim}(i, j)$ is the similarity of commodities $i$ and $j$.
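The fourth and fifth steps can be illustrated with a short Python sketch. The prediction formula itself is not reproduced in the text, so the sketch assumes the usual similarity-weighted average of the user's existing ratings over the nearest neighbor set; `predict_rating` and its argument names are hypothetical.

```python
def predict_rating(user_ratings, sims_to_target, k):
    """Predict user u's rating of target commodity i.

    user_ratings:   dict mapping commodity id j -> existing rating r_uj
    sims_to_target: dict mapping commodity id j -> sim(i, j), with i excluded
    k:              size of the nearest neighbor set N_i
    Assumption: the prediction is a similarity-weighted average.
    """
    # Fourth step: the k most similar commodities, in decreasing similarity
    neighbors = sorted(sims_to_target, key=sims_to_target.get, reverse=True)[:k]
    numerator = 0.0
    denominator = 0.0
    # Fifth step: weight the user's existing ratings by similarity
    for j in neighbors:
        if j in user_ratings:
            numerator += sims_to_target[j] * user_ratings[j]
            denominator += abs(sims_to_target[j])
    if denominator == 0:
        return None  # the user has rated none of the neighbors
    return numerator / denominator

ratings_of_u = {"a": 4, "b": 2}
sims = {"a": 0.9, "b": 0.1, "c": 0.95}
prediction = predict_rating(ratings_of_u, sims, k=3)
```

Returning `None` when no neighbor has been rated is a design choice of this sketch; a production system would more likely fall back to a global or per-commodity average.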

Preferably, the recommendation result set is generated as follows. The cloud similarity calculation method is used to calculate the similarity of objects, and the collaborative filtering algorithm is improved on that basis to give an improved customized recommendation algorithm: user similarity is calculated with the similarity method based on cloud entropy mining, the nearest neighbor set of the user is constructed, the evaluation values given to the target commodity by users in the nearest neighbor set are acquired and weighted to obtain the target user's predicted evaluation value for the commodity, and finally the predicted values are sorted and the commodities with the largest predicted values are added to the recommendation list as part of the recommendation result for the user to choose from.

Preferably, the basic flow of the cloud entropy mining and user predicted-score recommendation algorithm is as follows. First, users' scoring data is collected and processed to obtain the user-commodity scoring matrix R. The evaluations in R are then counted to obtain the scoring count vector of each commodity, and the evaluation feature vector of each commodity is calculated from the numerical characteristics produced by reverse cloud entropy mining. With the cloud similarity calculation method, the similarity between the target commodity and the other commodities is obtained and sorted to give the nearest neighbor set. The score value of the target commodity is predicted from the evaluations of the neighboring commodities in the nearest neighbor set, and the user-commodity scoring matrix is filled in. The target user's scores are then taken from the completed scoring matrix to calculate the user's evaluation feature vector, and the similarity between users is calculated from these feature vectors to obtain the nearest neighbor set of the user. Finally, the target user's predicted score values are calculated from the scores of the neighboring users in the nearest neighbor set, and the commodities with the highest predicted values are presented to the user as the recommendation result, which completes the recommendation process.

Preferably, an adjacency set model is built: when a user logs in to the e-commerce platform, the system searches for and constructs a nearest neighbor set according to the user's profile information, thereby improving the prior-art collaborative filtering algorithm. The calculation formula for the predicted score of user $u$ for commodity $i$ is given first:

where $b_{ui}$ represents the global bias of user $u$ for commodity $i$, $N(u, k)$ represents the nearest neighbor set composed of the $k$ neighbors of user $u$, $H(i)$ represents the set of all users who have purchased commodity $i$, $v$ represents any user, $w_{uv}$ represents the weight parameter of user $v$ for target user $u$, and $r_{vi}$ represents the existing evaluation value of neighboring user $v$ for commodity $i$.

To obtain an accurate predicted evaluation value, $b_{ui}$ and $w_{uv}$ must be obtained and the nearest neighbor set $N(u, k)$ of target user $u$ must be determined. The calculation formula of $b_{ui}$ is:

$$b_{ui} = b_u + b_i + \mu \qquad \text{(Formula 5)}$$

where $b_u$ represents the bias of the user object, that is, the component of user $u$'s historical evaluation habits that is unrelated to the target commodity; $b_i$ represents the bias of the commodity object, that is, the component of the evaluations the commodity receives that is unrelated to the user; and $\mu$ represents the average of all evaluation values of the commodity objects in the training set. The user-commodity scoring matrix is $R$, $r_{ui}$ represents the actual evaluation value of user $u$ for commodity $i$, and $\hat{r}_{ui}$ represents the predicted evaluation value that the algorithm provides for user $u$ on commodity $i$. $U = \{u_1, u_2, \ldots, u_n\}$ represents the user set consisting of $n$ users, and $I = \{i_1, i_2, \ldots, i_m\}$ represents the commodity set consisting of $m$ commodities.
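As an illustration of Formula 5, the following Python sketch computes the bias terms from a small list of ratings. The application obtains $b_u$ and $b_i$ by iterative optimization; the closed-form averages used here are only an assumed stand-in that makes the decomposition concrete, and the names `estimate_baseline` and `global_bias` are hypothetical.

```python
def estimate_baseline(ratings):
    """ratings: list of (user, item, value) triples.

    Returns (mu, b_u, b_i). Assumption: biases are estimated as simple
    average deviations rather than learned by gradient descent.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    # b_i: average deviation of the commodity's ratings from the global mean
    item_dev = {}
    for _, i, r in ratings:
        item_dev.setdefault(i, []).append(r - mu)
    b_i = {i: sum(d) / len(d) for i, d in item_dev.items()}
    # b_u: average deviation of the user's ratings after removing mu and b_i
    user_dev = {}
    for u, i, r in ratings:
        user_dev.setdefault(u, []).append(r - mu - b_i[i])
    b_u = {u: sum(d) / len(d) for u, d in user_dev.items()}
    return mu, b_u, b_i

def global_bias(mu, b_u, b_i, u, i):
    """Formula 5: b_ui = b_u + b_i + mu; unseen users or items get bias 0."""
    return b_u.get(u, 0.0) + b_i.get(i, 0.0) + mu

data = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4), ("u2", "i2", 2)]
mu, b_u, b_i = estimate_baseline(data)
b_u1_i1 = global_bias(mu, b_u, b_i, "u1", "i1")
```

For a brand-new user the user bias defaults to zero, so the global bias reduces to the commodity bias plus the mean, which is one simple way to produce a first estimate before any behavior is recorded.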

Preferably, $w_{uv}$ and $N(u, k)$ are obtained as follows:

(1) Constructing the nearest neighbor set $N(u, k)$: assume that the feature vector $F$ of each user is $n$-dimensional and that the elements of $F$ correspond to the user's gender, age, occupation, education and income profile information, expressed as $F$ = (gender, age, occupation, education, income). Each item of user information is then digitized. Gender {male, female} is represented as {0, 1}. For the age interval (0, 100], 1 denotes users aged in (0, 14], 2 denotes users aged in (14, 19], 3 denotes users aged in (19, 22], 4 denotes users aged in (22, 26], and so on, with the age intervals divided according to the number of people in each. Other types of user information are digitized in the same way. Next, the degree to which different value ranges in each dimension affect score differences is determined: the absolute difference in average score between users in the data set who take different values in the same dimension is analyzed, forming a score difference matrix $D_k(f_i, f_j)$. The difference set $D$ between each pair of users is then obtained from their profile information, and the absolute average score differences given by $D_k$ for the differing value ranges in dimensions such as age and education are weighted and summed to obtain the value $\mathrm{sim}(u, v)$. Finally, the $k$ users with the largest similarity are selected from the set of $\mathrm{sim}(u, v)$ values to form the nearest neighbor set $N(u, k)$ of target user $u$. The similarity is calculated as follows:

where $F_i$ and $F'_i$ respectively represent the differing digitized values of the same dimension in the feature vectors $F$ of users $u$ and $v$, which belong to the difference set $D$ (that is, the dimensions in which the two users differ; a dimension with no difference contributes no term $D_i$), $\max(r)$ represents the upper limit of the score, and $n$ is the dimension of the user feature vector;
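The profile digitization and neighbor selection described above can be sketched in Python. The similarity formula is not reproduced in the text, so this sketch assumes a form consistent with the listed symbols, namely one minus the summed per-dimension score differences normalized by $n \cdot \max(r)$. The age bucket above 26, the `diff_tables` layout, and all function names are hypothetical.

```python
import bisect

# Upper bounds of the age buckets given in the text; buckets beyond 26 are
# not specified there, so everything older falls into one placeholder bucket.
AGE_EDGES = [14, 19, 22, 26]

def age_bucket(age):
    """(0,14] -> 1, (14,19] -> 2, (19,22] -> 3, (22,26] -> 4, older -> 5."""
    return bisect.bisect_left(AGE_EDGES, age) + 1

def profile_similarity(f_u, f_v, diff_tables, max_rating=5.0):
    """Assumed form: sim(u, v) = 1 - sum(D_i) / (n * max(r)).

    f_u, f_v:    equal-length tuples of digitized profile values
    diff_tables: one dict per dimension mapping frozenset({a, b}) to the
                 absolute difference in average score between users whose
                 value in that dimension is a versus b
    """
    n = len(f_u)
    total = 0.0
    for d, (a, b) in enumerate(zip(f_u, f_v)):
        if a != b:  # only dimensions in the difference set D contribute
            total += diff_tables[d].get(frozenset((a, b)), 0.0)
    return 1.0 - total / (n * max_rating)

def nearest_neighbors(u, profiles, diff_tables, k):
    """Return the k users most similar to u, forming N(u, k)."""
    others = [v for v in profiles if v != u]
    others.sort(key=lambda v: profile_similarity(profiles[u], profiles[v], diff_tables),
                reverse=True)
    return others[:k]

profiles = {
    "new": (0, age_bucket(21)),   # (gender, age bucket)
    "a": (0, age_bucket(20)),
    "b": (1, age_bucket(40)),
}
diff_tables = [{frozenset((0, 1)): 0.5}, {frozenset((3, 5)): 1.2}]
neighbors = nearest_neighbors("new", profiles, diff_tables, k=1)
```

Because the similarity depends only on profile fields, it can be evaluated for a user who has not yet rated anything, which is what makes it usable at registration time.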

(2) Calculating the $w_{uv}$ values of the neighbors: the similarity between users is calculated from the profile information of the newly registered user, the nearest neighbor set of the target object is constructed, and the evaluation information of the neighboring users is used to predict the evaluation value of the target user;

Before predicting the user's evaluation value, the basic status of the user is determined. If target user $u$ has an available historical behavior record, the value of parameter $w_{uv}$ is obtained by stochastic gradient descent; if target user $u$ is a new user with no available historical behavior record, the $\mathrm{sim}(u, v)$ value found in (1) is used as the value of the weight $w_{uv}$;

In the calculation, the existing user behavior records are used to adjust the weights continuously until the weight value $w_{uv}$ with the minimum error is obtained. First, the objective function is defined:

where each symbol has the meaning defined above, and the regularization factor in the formula prevents the training result from overfitting;

To solve for the variables $b_u$, $b_i$ and $w_{uv}$, the minimum of Formula 7 is found iteratively by stochastic gradient descent, and the optimal parameter values are obtained through the iterative updates shown in Formula 8:

$$b_u \leftarrow b_u + \alpha (e_{ui} - \lambda b_u)$$

$$b_i \leftarrow b_i + \alpha (e_{ui} - \lambda b_i)$$

where $\alpha$ is the learning rate, $e_{ui} = r_{ui} - \hat{r}_{ui}$ is the prediction error of the evaluation value, and $\lambda$ is the regularization parameter. Whether the iteration continues or terminates is decided by comparing the mean absolute error with an error threshold.
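The two bias updates of Formula 8 and the stopping rule can be sketched in Python as follows. The update for $w_{uv}$ is not reproduced in the text and is therefore left out, so the model here is reduced to $\hat{r}_{ui} = \mu + b_u + b_i$. The default hyperparameter values and the names `fit_biases` and `mean_abs_error` are assumptions for illustration.

```python
def mean_abs_error(ratings, mu, b_u, b_i):
    """Mean absolute error of the bias-only prediction mu + b_u + b_i."""
    return sum(abs(r - (mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)))
               for u, i, r in ratings) / len(ratings)

def fit_biases(ratings, alpha=0.05, lam=0.02, max_iters=200, mae_tol=0.05):
    """Learn b_u and b_i by stochastic gradient descent.

    ratings: list of (user, item, value) triples.
    Stops early once the mean absolute error drops below mae_tol.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_u, b_i = {}, {}
    for _ in range(max_iters):
        for u, i, r in ratings:
            bu, bi = b_u.get(u, 0.0), b_i.get(i, 0.0)
            e = r - (mu + bu + bi)               # e_ui = r_ui - predicted value
            b_u[u] = bu + alpha * (e - lam * bu)  # b_u <- b_u + alpha (e_ui - lambda b_u)
            b_i[i] = bi + alpha * (e - lam * bi)  # b_i <- b_i + alpha (e_ui - lambda b_i)
        if mean_abs_error(ratings, mu, b_u, b_i) < mae_tol:
            break
    return mu, b_u, b_i

data = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4), ("u2", "i2", 2)]
mu, b_u, b_i = fit_biases(data)
```

Both updates use the bias values read before the step, so the error term is the same in the two lines, matching the way Formula 8 is written.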

Preferably, customized recommendation by user adjacency set matrix decomposition proceeds as follows:

The user-commodity scoring matrix $R$ is decomposed by singular value decomposition into the product of two simple matrices $P$ and $Q$, expressed as $R = P^T Q$. Incremental iterative updating with a global bias is applied during the decomposition, and latent feature vectors of dimension $f$ are extracted to predict the missing values of the scoring matrix, which also reduces the space complexity of the algorithm;

Decomposing the multidimensional matrix $R$ into the product of the two simple matrices $P^T$ and $Q$ reduces the complexity of the data. The matrix decomposition loss function $L(u, i)$ has the formula:

where $r_{ui}$ represents the evaluation value of user $u$ for commodity $i$ in the scoring matrix $R$, $p_u$ represents the feature vector of target user $u$ in the user matrix $P$, $q_i$ represents the feature vector of target commodity $i$ in the commodity matrix $Q$, and the factor $\lambda(\lVert p_u \rVert^2 + \lVert q_i \rVert^2)$ prevents overfitting;

The minimum of $L(u, i)$ is found by stochastic gradient descent, and the iteration is as follows:

$$p_{uk} \leftarrow p_{uk} + \alpha (e_{ui} \, q_{ik} - \lambda p_{uk})$$

$$q_{ik} \leftarrow q_{ik} + \alpha (e_{ui} \, p_{uk} - \lambda q_{ik}) \qquad \text{(Formula 10)}$$

At the beginning of training, $p_u$ and $q_i$ are initialized, that is, the $f$-dimensional feature vectors $p_u$ and $q_i$ are filled with random values. For each evaluation value $r_{ui}$ of target user $u$, the predicted value $\hat{r}_{ui}$ is calculated, then the error $e_{ui}$ between the actual value and the predicted value is calculated and each dimension of the feature vectors is updated. After the specified number of iterative updates the decomposed simple matrices $P$ and $Q$ are obtained;
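The training loop around Formula 10 can be sketched in Python. The loss formula is not reproduced in the text, so the sketch assumes the usual regularized squared error with prediction $\hat{r}_{ui} = p_u \cdot q_i$ (biases omitted). The initialization range, default hyperparameters, and the names `train_mf` and `predict` are hypothetical.

```python
import random

def train_mf(ratings, n_users, n_items, f=2, alpha=0.02, lam=0.02,
             count=500, seed=0):
    """Factorize the rating data into user vectors P and commodity vectors Q.

    ratings: list of (user_index, item_index, value) triples.
    count:   number of passes over the ratings (the iteration count).
    """
    rng = random.Random(seed)
    # Fill the f-dimensional feature vectors with small random values
    P = [[rng.random() * 0.1 for _ in range(f)] for _ in range(n_users)]
    Q = [[rng.random() * 0.1 for _ in range(f)] for _ in range(n_items)]
    for _ in range(count):
        for u, i, r in ratings:
            # e_ui: actual value minus the current predicted value
            e = r - sum(P[u][k] * Q[i][k] for k in range(f))
            for k in range(f):
                p_uk, q_ik = P[u][k], Q[i][k]   # use the pre-update values
                P[u][k] = p_uk + alpha * (e * q_ik - lam * p_uk)
                Q[i][k] = q_ik + alpha * (e * p_uk - lam * q_ik)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating as the inner product of the two latent vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

data = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0)]
P, Q = train_mf(data, n_users=2, n_items=2)
mae = sum(abs(r - predict(P, Q, u, i)) for u, i, r in data) / len(data)
```

Storing `p_uk` and `q_ik` before either is overwritten keeps the two updates symmetric, so neither vector is updated with an already-modified partner value.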

Combining the global bias of the target object gives the calculation formula for the user's predicted score value:

The minimized error formula is derived as follows:

After the overfitting prevention factor is added the formula becomes:

The parameters $w_{uv}$, $p_u$, $q_i$, $b_u$ and $b_i$ are solved iteratively by stochastic gradient descent to obtain the $b_u$ and $p_u$ values of each user, the $b_i$ and $q_i$ values of each commodity, and the weight value $w_{uv}$ of each nearest-neighbor user $v$ for target user $u$;

The parameters fixed before solving are the number of iterations Count of the algorithm, the regularization parameter $\lambda$, the learning rate $\alpha$, the dimension $f$ of the latent feature vectors, and the number $k$ of neighbors in the nearest neighbor set.

Compared with the prior art, the innovations and advantages of the application are:

(1) To address the shortcomings of prior-art collaborative filtering recommendation, two improvement strategies are proposed. First, a recommendation algorithm based on cloud entropy mining and predictive scoring overcomes the defects of traditional similarity calculation: the cloud similarity calculation method improves the accuracy of the similarity, yields a more accurate nearest neighbor set, and effectively addresses data sparsity; its reliability and superiority are verified by offline experiments. Second, a recommendation algorithm based on an adjacency set matrix decomposition model obtains the adjacency set model from users' registration records or profile information, applies a singular value decomposition method that adds a global bias and allows incremental updating, and obtains the optimal value of each parameter by stochastic gradient descent; this better addresses the user cold start and data sparsity problems and improves the accuracy of the recommendation algorithm, and its rationality and efficiency are verified with experimental data. Customized recommendation provides different, accurate results according to each user's preferences and interests and forms a one-to-one sales model for consumers; this greatly improves the efficiency of the e-commerce platform, saves users a large amount of time, gives a better user experience, and benefits merchants.

(2) An e-commerce recommendation algorithm based on cloud entropy mining and predictive scoring is proposed. First, the similarity calculation is improved: it uses the scoring information of the user over the commodity set, considers object similarity at a coarse granularity on the overall level, and summarizes the overall characteristics of the target object, so that the cloud similarity calculation method improves the accuracy of the similarity and addresses data sparsity. Second, predictive scoring recommendation based on the cloud entropy mining model predicts the user's score for the target commodity, which increases the number of commodities jointly purchased and evaluated by different users. The collaborative filtering algorithm is then improved into a customized recommendation algorithm: the nearest neighbor set of the user is constructed, the evaluation values of the target commodity by the users in the nearest neighbor set are obtained and weighted to give the target user's predicted evaluation value, the predicted values are sorted, and the commodities with the largest predicted values are added to the recommendation list for the user to choose from. As a result, user features are refined and computed in a more targeted way, and commodity recommendation is more efficient, more targeted and more accurate.

(3) A recommendation method based on the user adjacency set matrix decomposition model is proposed, comprising construction of the adjacency set model and customized recommendation by user adjacency set matrix decomposition. The nearest neighbor set of a user is built from user registration data or profile information, and improving the recommendation algorithm on the basis of the adjacency set model greatly improves the accuracy of the results provided to newly registered users. A singular value decomposition method that adds a global bias and allows incremental updating reduces the dimension of the user-commodity scoring matrix and addresses its data sparsity, and the optimal value of each parameter is obtained by stochastic gradient descent. This effectively addresses the user cold start problem and improves the quality of the recommendation results. The method ultimately enables efficient management of massive e-commerce data, addresses the disorder and sheer volume of data across an e-commerce site, presents and recommends commodities that match the preferences of different consumers, and has good speedup and scalability.

Drawings

FIG. 1 is a schematic of a scoring matrix for ten items for four users.

Fig. 2 is a schematic diagram of a similarity matrix obtained by calculating cloud similarity.

FIG. 3 is a flowchart of a predictive scoring recommendation flow based on a cloud entropy mining model.

FIG. 4 is a flowchart of the e-commerce recommendation method based on cloud entropy mining and user predictive scoring.

FIG. 5 is a graph of the results of experiment one for the e-commerce recommendation method based on cloud entropy mining and predictive scoring.

FIG. 6 is a graph of the results of experiment two for the e-commerce recommendation method based on cloud entropy mining and predictive scoring.

FIG. 7 is a schematic diagram of a recommendation flow based on a user adjacency set matrix factorization model.

FIG. 8 is a graph of the results of experiment one for the recommendation method based on a user adjacency set matrix factorization model.

FIG. 9 is a graph of the results of experiment two for the recommendation method based on a user adjacency set matrix factorization model.

Detailed Description

The technical scheme of the electronic commerce customization recommendation method for big data cloud entropy mining provided by the application is further described below with reference to the accompanying drawings, so that the application can be better understood and implemented by a person skilled in the art.

Helping users quickly find satisfactory commodities has become a problem that many internet sites must solve; against this background, the application provides a customized recommendation method for the e-commerce environment.

Prior-art e-commerce recommendation extracts commodities a user is likely to prefer from the user's behavior records using Web data mining techniques, and recommends them. The collaborative filtering algorithm is the current focus of e-commerce recommendation systems: it calculates the similarity between objects from users' scoring records with a similarity calculation method, ranks the objects, takes the several most similar ones as the nearest neighbor set, and finally derives the recommendation results from that set. Finding the nearest neighbor set of the target object from the similarity calculation is therefore a crucial step of the recommendation algorithm. Based on this consideration and on the shortcomings of prior-art collaborative filtering, the application provides two improvement strategies: calculating the similarity of each object by cloud entropy mining to obtain a more accurate nearest neighbor set; and addressing the cold start and data sparsity problems with a matrix decomposition model based on an adjacency set.

(1) A recommendation algorithm based on cloud entropy mining and predictive scoring is provided. It overcomes the defects of the traditional similarity calculation method, improves the accuracy of the similarity through the cloud similarity calculation method, and effectively addresses data sparsity; its reliability and superiority are verified by offline experiments.

(2) A recommendation algorithm based on an adjacency set matrix decomposition model is provided. The adjacency set model is obtained from the user's registration records or profile information, a singular value decomposition method that adds a global bias and allows incremental updating is adopted, and the optimal value of each parameter is obtained by stochastic gradient descent. This better addresses the user cold start problem and improves the accuracy of the recommendation algorithm; its rationality and effectiveness are verified with experimental data.

1. E-commerce recommendation method based on cloud entropy mining and predictive scoring

In the prior art, many recommendation algorithms are based on the scoring matrix; they work well only in specific scenarios and can hardly overcome their inherent defects. The user-commodity scoring matrix is highly sparse because each user evaluates only a very small number of commodities, which greatly reduces the reliability of the similarity calculation and the accuracy of the recommendation system, so the data sparsity of the matrix is one of the problems that current recommendation algorithms must solve. To address data sparsity, the application adopts a similarity calculation method based on cloud entropy mining, which overcomes the defects of the traditional measures, and proposes an e-commerce recommendation algorithm based on cloud entropy mining that improves the recommendation process. Experiments show that the recommendation algorithm combining cloud entropy mining with user evaluation prediction gives more accurate recommendation results than a general collaborative filtering algorithm. The proposal and improvement of this algorithm is one of the innovations of the application.

(I) Improved similarity calculation method

The user's degree of preference for a commodity is judged from the user's evaluation value of that commodity. Suppose the e-commerce platform offers five evaluation levels for a commodity: strongly dislike, dislike, neutral, like, and strongly like, with corresponding evaluation values 1, 2, 3, 4 and 5. FIG. 1 shows the scoring matrix of four users A, B, C and D for ten commodities p₁, p₂, …, p₁₀.

Let c₁, c₂, c₃, c₄, c₅ denote the number of evaluations at each of the five levels, and let U_x = (c₁, c₂, c₃, c₄, c₅) be the evaluation count vector of user x, where x is one of the four users A, B, C, D. This gives: U_A = (6, 4, 0, 0, 0), U_B = (0, 0, 0, 5, 5), U_C = (0, 0, 2, 4, 4), U_D = (4, 6, 0, 0, 0).

Using reverse cloud entropy mining, the evaluation count vector of each user is used to capture qualitative knowledge of how that user evaluates commodities, and the user-commodity evaluation feature vector is calculated, in which Ex represents the target user's average preference over all commodities, En represents the concentration of the user's commodity evaluations, and He represents the stability of En.

The digital features calculated by cloud entropy mining give the evaluation feature vectors F_A, F_B, F_C, F_D of the four users A, B, C, D: F_A = (1.400000, 0.382138, 0.601591), F_B = (4.500000, 0.383247, 0.626657), F_C = (4.200000, 0.638558, 0.802121), F_D = (1.600000, 0.382138, 0.601591). Users A and D give commodities low evaluation values, and the score dispersion of the two users is the same; users B and C give commodities higher evaluation values, and the score dispersion of user C is higher than that of user B.
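The feature vectors listed above can be checked numerically. The sketch below is not taken from the application, which does not spell out the formulas in this passage; it assumes one variant of the backward cloud generator expressions (the mean, sqrt(π/2) times the mean absolute deviation, and sqrt(S² − t²/3) with the sample variance S² and t the previous quantity), inferred here because it reproduces the listed numbers. The function name `backward_cloud_features` is hypothetical, and the tuple is returned in the order in which the numbers are listed.

```python
import math

def backward_cloud_features(counts):
    """counts[k] is the number of ratings at level k + 1 (levels 1..5).

    Assumed formulas (inferred from the listed numbers, not stated in the text).
    """
    # Expand the count vector back into the individual ratings.
    ratings = [level for level, c in enumerate(counts, start=1) for _ in range(c)]
    n = len(ratings)
    ex = sum(ratings) / n                                  # average preference
    mean_abs_dev = sum(abs(r - ex) for r in ratings) / n
    abs_term = math.sqrt(math.pi / 2) * mean_abs_dev       # third listed component
    sample_var = sum((r - ex) ** 2 for r in ratings) / (n - 1)
    # abs() only guards against a negative radicand; it is an added safeguard.
    var_term = math.sqrt(abs(sample_var - abs_term ** 2 / 3))  # second listed component
    return ex, var_term, abs_term

# Evaluation count vectors of users A, B, C, D as given in the text.
U = {"A": (6, 4, 0, 0, 0), "B": (0, 0, 0, 5, 5),
     "C": (0, 0, 2, 4, 4), "D": (4, 6, 0, 0, 0)}
F = {user: backward_cloud_features(c) for user, c in U.items()}
for user, f in F.items():
    print(user, tuple(round(x, 6) for x in f))
```

Under these assumptions the first component is simply the weighted mean of the levels, e.g. (6·1 + 4·2)/10 = 1.4 for user A.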

The evaluation count vector reflects how the target user scores a designated commodity set, without paying undue attention to the score of any specific commodity; how different users score the same commodity subset determines the similarity between them. In view of the strict-matching defect of traditional similarity measures, the similarity calculation method combined with cloud entropy mining is adopted and further improved in its implementation details.

From the evaluation feature vectors F_A, F_B, F_C, F_D of the four users, the similarity matrix of users A, B, C and D is calculated with the cloud similarity calculation formula, as shown in FIG. 2.
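The cloud similarity formula is not reproduced in this passage. A minimal sketch, assuming that cloud similarity is the cosine of the angle between two evaluation feature vectors (one common definition, used here as an assumption; the values in FIG. 2 are not reproduced and may differ):

```python
import math

def cloud_similarity(f1, f2):
    """Cosine of the angle between two evaluation feature vectors (assumed definition)."""
    dot = sum(a * b for a, b in zip(f1, f2))
    norm1 = math.sqrt(sum(a * a for a in f1))
    norm2 = math.sqrt(sum(b * b for b in f2))
    return dot / (norm1 * norm2)

# Feature vectors of users A, B, C, D as listed in the text.
F = {
    "A": (1.400000, 0.382138, 0.601591),
    "B": (4.500000, 0.383247, 0.626657),
    "C": (4.200000, 0.638558, 0.802121),
    "D": (1.600000, 0.382138, 0.601591),
}
users = sorted(F)
sim = {(u, v): cloud_similarity(F[u], F[v]) for u in users for v in users}
for u in users:
    print(u, [round(sim[(u, v)], 4) for v in users])
```

With this definition, users with similar overall scoring behaviour (A and D, or B and C) come out closer to each other than to the other pair, even if they never rated the same commodity.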

The three dimensions of the user evaluation feature vector are the user's average preference for commodities, the concentration of the user's scores, and the dispersion of the evaluation values; the similarity computed from the evaluation feature vectors therefore covers all the information in the user's evaluations.

The method thus remedies the defects of the traditional collaborative filtering recommendation algorithm and is well suited to sparse matrices: it makes full use of the user's scoring information over the commodity set, considers object similarity at a coarse granularity on the overall level, summarizes the overall characteristics of the target object, and avoids the strict matching that traditional similarity calculation imposes on target objects.

(II) predictive scoring recommendation based on cloud entropy mining model

At present, a very large e-commerce platform carries millions of commodity types and has tens of millions of users, but no user can evaluate every commodity, so the user-commodity scoring matrix is extremely sparse and traditional similarity calculation produces large errors. For example, cosine similarity usually assigns 0 to commodities the user has not evaluated, but in practice an unevaluated commodity is not necessarily the one the user likes least: the user may simply not have noticed it, or may not have submitted an evaluation after purchasing. The adjusted cosine measure and the Pearson correlation have similar problems. With a very sparse user-commodity scoring matrix, the commodities that different users have both purchased and evaluated are very few, the similarity between users is low, and the accuracy of the nearest neighbors of the target object suffers greatly.
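The effect of zero-filling can be illustrated with a small example. The ratings below are made up for illustration, and the helper names are hypothetical; the sketch only contrasts cosine similarity over zero-filled vectors with cosine similarity restricted to commodities both users rated.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cosine_zero_filled(u, v):
    # Missing ratings (None) are treated as 0, i.e. as the lowest possible preference.
    return cosine([r or 0 for r in u], [r or 0 for r in v])

def cosine_co_rated(u, v):
    # Only commodities rated by both users are compared.
    pairs = [(x, y) for x, y in zip(u, v) if x is not None and y is not None]
    if not pairs:
        return 0.0
    return cosine([x for x, _ in pairs], [y for _, y in pairs])

# Two users who agree exactly on the two commodities they both rated.
u = [5, 4, None, 2, None]
v = [5, 4, 3, None, None]
zero_filled = cosine_zero_filled(u, v)
co_rated = cosine_co_rated(u, v)
print(round(zero_filled, 4), round(co_rated, 4))
```

The zero-filled value is lower only because each user rated one commodity the other did not, which says nothing about their actual preferences.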

1. Predicting user scoring of goods

First, the similarity of the target object is calculated with cloud similarity; then the user's score for the target commodity is predicted from the user's evaluation values for similar commodities. This increases the number of commodities jointly purchased and evaluated by different users and avoids the problems caused by matrix sparsity.

Let I denote the evaluation count vector of the target object, where c_x is the number of times the user set has evaluated the commodity at level x, with commodity evaluations divided into five levels from 1 to 5. Using reverse cloud entropy mining, the three digital feature values of cloud entropy mining are obtained from the evaluation count vector of the target commodity and form the evaluation feature vector of the commodity, denoted F = (Ex, En, He), where Ex is the expected value of cloud entropy mining and represents the target user's average preference over all commodities; En is the entropy of cloud entropy mining and represents the concentration of the user's commodity evaluations; and He is the hyper-entropy of cloud entropy mining and represents the stability of En. From the evaluation feature vectors of the commodities, the similarity between commodities is calculated with the cloud similarity calculation method, and the score of each commodity not yet evaluated by the target user is predicted. The specific steps are as follows:

Step 1: input the user-commodity scoring matrix R_m×n;

2. Generating a recommendation result set

The similarity of objects is calculated with the cloud similarity calculation method and the collaborative filtering algorithm is improved accordingly, giving an improved customized recommendation algorithm: the similarity between users is calculated with the similarity method based on cloud entropy mining, the nearest neighbor set of the user is constructed, the evaluation values of the target commodity by the users in the nearest neighbor set are obtained and weighted to give the target user's predicted evaluation value for the commodity, and finally the predicted values are sorted and the commodities with the largest predicted values are added to the recommendation list as part of the recommendation result for the user to choose from. The basic implementation steps of the algorithm are shown in FIG. 3.

The basic flow of the recommendation algorithm based on cloud entropy mining and user predictive scoring is as follows. First, the users' scoring data are collected and processed to obtain the user-commodity scoring matrix R. The evaluations in R are then counted to obtain the evaluation count vector of each commodity, and the evaluation feature vector of each commodity is calculated from the digital features of reverse cloud entropy mining. The cloud similarity calculation method gives the similarity between the target commodity and the other commodities, which are ranked to obtain the nearest neighbor set. The score of the target commodity is predicted from the evaluations of the neighboring commodities in the nearest neighbor set, and the user-commodity scoring matrix is filled in. The scoring of the target user is then computed from the completed scoring matrix to obtain the user's evaluation feature vector, and the similarity between users is calculated from the feature vectors to obtain the nearest neighbor set of the user. Finally, the predicted score of the target user is calculated from the scores of the neighbor users in the nearest neighbor set, and the commodities with the higher predicted values are presented to the user as the recommendation result, completing the recommendation process. The whole process is shown as a flowchart in FIG. 4.
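The last stage of this flow (weighting neighbor scores and ranking the predictions) can be sketched as follows. The weighting formula is not given in the text, so a similarity-weighted average is assumed; the function names, the toy ratings and the similarity values are all hypothetical, and the neighbors are chosen among users who actually rated the commodity, which is a simplification.

```python
def predict_score(target, item, ratings, similarity, k):
    """Similarity-weighted average over the k most similar users who rated `item`."""
    candidates = [(similarity[target][v], r[item])
                  for v, r in ratings.items()
                  if v != target and item in r]
    neighbours = sorted(candidates, reverse=True)[:k]
    weight_sum = sum(abs(s) for s, _ in neighbours)
    if weight_sum == 0:
        return None  # no usable neighbour for this commodity
    return sum(s * r for s, r in neighbours) / weight_sum

def recommend(target, ratings, similarity, k=2, top_n=2):
    """Rank commodities the target user has not rated by predicted score."""
    all_items = {i for r in ratings.values() for i in r}
    unseen = all_items - set(ratings[target])
    scored = [(predict_score(target, i, ratings, similarity, k), i) for i in unseen]
    scored = [(p, i) for p, i in scored if p is not None]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest prediction first
    return [i for _, i in scored[:top_n]]

# Toy data (hypothetical): user "u" has not rated p3 and p4.
ratings = {
    "u": {"p1": 5, "p2": 4},
    "v": {"p1": 5, "p2": 4, "p3": 5, "p4": 2},
    "w": {"p1": 1, "p3": 2, "p4": 5},
}
similarity = {"u": {"v": 0.9, "w": 0.2}}
print(recommend("u", ratings, similarity, k=2, top_n=1))
```

Because "v" is far more similar to "u" than "w" is, the prediction for p3 is dominated by the score that "v" gave it.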

3. Algorithm complexity analysis

Complexity analysis of the algorithm covers time complexity and space complexity; a practical algorithm should strike a good balance between the two. The algorithm has three main logical parts: commodity similarity calculation, user similarity calculation, and similarity sorting.

(1) Commodity similarity: the similarity values are obtained by traversing the Item dimension of the scoring matrix. Because the similarity of every other commodity to the target commodity is required, the whole Item set must be traversed, and for each Item the remaining Item subset in the matrix is traversed once. One traversal of the scoring matrix has time complexity O(Num(Item)), and Num(Item) such traversals have time complexity O(Num(Item)²). The time complexity of this part is therefore O(Num(Item)²).

(2) User similarity: the similarity values are obtained by traversing the User dimension of the scoring matrix. Because the similarity of every other user to the target user is required, the whole User set must be traversed, and for each User the remaining User subset in the matrix is traversed once. One traversal of the scoring matrix has time complexity O(Num(User)), and Num(User) such traversals have time complexity O(Num(User)²). The time complexity of this part is therefore O(Num(User)²).

(3) Similarity sorting: the sorting part of the algorithm uses direct insertion sort, with time complexity O(Max(Num(Item)², Num(User)²)).

In summary, the time complexity of the algorithm is O(Max(Num(Item)², Num(User)²)).

The space complexity analysis considers the storage of the scoring matrix: for a scoring matrix of size Num(User) × Num(Item), the space complexity of the recommendation algorithm is O(Num(User) × Num(Item)).

(III) Experimental results and analysis

Experiment one: verify the influence of different parameters on the experimental results. The experiment has two parts according to which nearest neighbor set size is varied: the number of nearest neighbors of the commodity and of the user is increased in turn from 10 to 50, and the change in mean absolute error (MAE) is measured. Each result is the average of 5 runs, and the results are tabulated. The experimental results are shown in FIG. 5, where UsNeNu and CoNeNu denote the number of nearest neighbors of the user and of the commodity, respectively.

Experiment two: compare the recommendation algorithm based on cloud entropy mining and user predictive scoring with other common algorithms. The same data set is used and divided into a training set and a test set in the same proportion, and the number of neighbors of the target commodity is set as a variable from 10 to 50. MAE values are calculated for the proposed algorithm, the recommendation algorithm based on commodity category similarity (ICCF), and the collaborative filtering algorithm based on predictive scoring (RPCF). Each result is the average of 5 runs, which makes the results reliable, and the results are presented as a table and a graph. The experimental results are shown in FIG. 6, where UsNeNu denotes the number of nearest neighbors of the user.
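MAE, the metric reported in both experiments, is the mean of the absolute differences between predicted and actual scores. A minimal sketch with hypothetical function names and made-up numbers, including the averaging over repeated runs described above:

```python
def mean_absolute_error(predicted, actual):
    """Mean of |predicted - actual| over all test ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def average_over_runs(mae_values):
    """Average the MAE of several runs (the experiments average 5 runs)."""
    return sum(mae_values) / len(mae_values)

# Made-up predictions and true scores for three test ratings.
mae = mean_absolute_error([4.5, 3.0, 2.0], [5, 3, 4])
print(round(mae, 4))
```

A lower MAE means the predicted scores are closer to the actual scores, i.e. the recommendation is more accurate.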

1. Analysis of experimental results

From the results of experiment one, it can be seen that: when UsNeNu is a fixed value, the value of MAE is reduced along with the increase of CoNeNu, namely the accuracy of the recommended result is higher and higher; when the CoNeNu is a constant value, the value of the MAE decreases with increasing UsNeNu, i.e. the accuracy of the recommended results is also increasing.

From the results of experiment two, it can be seen that when CoNeNu is fixed, the MAE values of the three methods generally decrease as UsNeNu increases, i.e. the accuracy of the recommendation algorithms rises, except that the MAE of the recommendation algorithm based on commodity category similarity (ICCF) increases after UsNeNu = 30, i.e. its accuracy falls. The algorithm proposed in the application gives more accurate recommendation results than the other two algorithms. The ICCF curve alone shows that the accuracy of that method is relatively sensitive to the number of nearest neighbors of the user, and is best when UsNeNu = 30.

2. Conclusion of the experiment

The size of the nearest neighbor set of the commodity and the quality of its neighbors strongly influence the system's prediction of evaluation values for unscored commodities, which in turn affects the quality of the user's nearest neighbor set and can reduce the recommendation accuracy of the system. Obtaining reliable nearest neighbor sets for commodities and users is therefore a key step in ensuring the quality of the recommendation algorithm.

The recommendation algorithm provided by the application effectively relieves the data sparsity problem of the user-commodity scoring matrix. By improving on the traditional similarity calculation method, it improves the accuracy of the nearest neighbor set and the quality of the nearest neighbor objects, improves the overall performance of the recommendation algorithm, and is superior to and more widely applicable than other recommendation algorithms.

2. Recommendation method based on user adjacency set matrix decomposition model

The user cold start problem and the data sparsity of the matrix have always affected the accuracy of recommendation algorithms. The application starts from these two aspects: the adjacency set model improves the accuracy of the results the recommendation algorithm provides for newly registered users, and singular value matrix decomposition reduces the dimension of the user-commodity scoring matrix, which addresses the data sparsity of the matrix and improves the quality of the recommendation results.

Prior-art collaborative filtering algorithms calculate the similarity between users or commodities by collecting and analyzing users' historical e-commerce behavior records and commodities' purchase data, and then obtain the nearest neighbor set of a user or commodity through parameter adjustment. A large amount of experimental data shows that collaborative filtering recommends well when the training data set is comprehensive and the data volume is sufficient. For users or commodities that have just joined an e-commerce platform, however, e-commerce records are very scarce, so prior-art similarity calculation can hardly compute the similarity between target objects accurately and in some extreme cases cannot compute it at all. The nearest neighbor set of the target object is then hard to construct accurately, which seriously reduces the accuracy of the recommendations made to new users and also lowers the purchase rate of new commodities. In addition, large numbers of users and commodities join e-commerce websites every day, so the user-commodity scoring matrix grows on an exponential scale as new objects are added, and the data sparsity problem persists and may become more serious. In today's highly competitive e-commerce field, each platform needs to provide better recommendation service and win the favor of new users first, so effort must be devoted to the cold start problem and the data sparsity of the matrix in order to build a more complete recommendation system that serves users in different situations.

Given this situation, the solution proposed in the application is to construct the nearest neighbor set of the user from user registration data or profile information and to reduce the dimension of the matrix by singular value decomposition. This effectively addresses the inaccurate recommendations for new users and the data sparsity of the user-commodity scoring matrix, and improves the overall performance of the recommendation algorithm; it is also one of the innovations of the application.

(I) Construction of an adjacency set model

When a user logs in to the e-commerce platform, the system searches for and constructs a nearest neighbor set according to the user's profile information, thereby improving the prior-art collaborative filtering algorithm. The calculation formula for the estimated score of user u for commodity i is given first:

b_ui = b_u + b_i + μ  (Formula 5)

where b_u represents the bias of the user object, i.e. the factor in user u's historical evaluation habits that is unrelated to the target commodity; b_i represents the bias of the commodity object, i.e. the factor in the evaluations the commodity has received that is unrelated to the user; μ represents the average of all evaluation values of commodity objects in the training set; the user-commodity scoring matrix is R; r_ui represents the actual evaluation value of commodity i by user u; r̂_ui represents the predicted evaluation value that the algorithm provides for user u on commodity i; U = {u_1, u_2, …, u_n} represents the user set consisting of n users; and I = {i_1, i_2, …, i_m} represents the commodity set consisting of m commodities.

w_uv and N(u, k) are obtained as follows:

(1) Constructing the nearest neighbor set N(u, k): assume that the feature vector F of each user is n-dimensional and that its elements correspond to the user's profile information, namely gender, age, occupation, education and income, expressed as F = (gender, age, occupation, education, income). Each item of user information is then encoded numerically. Gender {male, female} is encoded as {0, 1}. The age range (0, 100] is encoded by interval: 1 denotes users aged in (0, 14], 2 denotes users aged in (14, 19], 3 denotes users aged in (19, 22], 4 denotes users aged in (22, 26], and so on, with the age intervals divided according to the number of people. The other types of user information are encoded numerically in the same way. Next, for each dimension, the degree to which different values influence the score is determined, i.e. the absolute difference in average score between users in the data set who take different values in the same dimension is analyzed, forming the score difference matrix D_k(f_i, f_j). For example, D_1 is the difference matrix of the gender dimension, where f_1 and f_2 denote the two values male and female; D_2 is the difference matrix of the age dimension, where f_1 denotes users aged in (0, 14] and f_2 denotes users aged in (14, 19], so D_2(f_3, f_4) is the score difference, in the age dimension, between users aged in (19, 22] and users aged in (22, 26]. The difference set D between every two users is then obtained from the users' profile information, and the sim(u, v) value is obtained by weighting the sum of the absolute average score differences that D_k gives for the dimensions in which the two users differ (for example, different value ranges in the age and education dimensions). Finally, the k users with the largest similarity are selected from the set of sim(u, v) values to form the nearest neighbor set N(u, k) of the target user u. The similarity is calculated as follows:

where F_i and F′_i denote the differing numeric values of the same dimension in the feature vectors F of users u and v, which belong to the difference set D (i.e. the dimensions in which the two users differ); if there is no difference in a dimension, D_i is 0; max(r) denotes the upper limit of the score, and n is the dimension of the user feature vector.
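The construction of N(u, k) from profile information can be sketched as below. The similarity formula is not reproduced in the text, so the normalisation used here, sim(u, v) = 1 − Σ D_i(F_i, F′_i) / (n · max(r)), is an assumption that merely uses the quantities named above; the function names and the toy profiles are hypothetical.

```python
from collections import defaultdict

def difference_matrix(profiles, mean_scores, dim):
    """D_dim[(a, b)] = |average score of users with value a - average score of users with value b|."""
    groups = defaultdict(list)
    for user, score in mean_scores.items():
        groups[profiles[user][dim]].append(score)
    avg = {value: sum(s) / len(s) for value, s in groups.items()}
    return {(a, b): abs(avg[a] - avg[b]) for a in avg for b in avg}

def profile_similarity(fu, fv, diff_matrices, max_rating=5):
    """Assumed form: 1 - (sum of score differences over differing dimensions) / (n * max(r))."""
    n = len(fu)
    total = sum(diff_matrices[i][(a, b)]
                for i, (a, b) in enumerate(zip(fu, fv)) if a != b)
    return 1 - total / (n * max_rating)

def nearest_neighbours(target, profiles, diff_matrices, k):
    sims = {v: profile_similarity(profiles[target], f, diff_matrices)
            for v, f in profiles.items() if v != target}
    return sorted(sims, key=lambda v: (-sims[v], v))[:k]

# Toy profiles (gender code, age-interval code); "new" has no rating history.
profiles = {"u1": (0, 3), "u2": (0, 3), "u3": (1, 4), "u4": (1, 3), "new": (0, 4)}
mean_scores = {"u1": 4.0, "u2": 3.0, "u3": 2.0, "u4": 4.5}
D = [difference_matrix(profiles, mean_scores, dim) for dim in range(2)]
print(nearest_neighbours("new", profiles, D, k=2))
```

In this toy data the age dimension separates average scores much more than gender does, so the new user's closest neighbor is the one sharing its age interval.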

(2) Calculating the neighbor weight w_uv: the similarity between users is calculated from the profile information of the newly registered user, the nearest neighbor set of the target object is constructed, and the evaluation information of the neighbor users is used to predict the evaluation value of the target user. If the recommendation algorithm relied on user profile information alone, it would waste the important data resource that each user's historical behavior records represent. Therefore, before the user's evaluation value is predicted, the basic status of the user is determined: if the target user u has usable historical behavior records, the value of the parameter w_uv is obtained by stochastic gradient descent; if the target user u is a new user with no usable historical behavior records, the sim(u, v) value obtained in (1) is used as the value of the weight w_uv;

During the calculation, the existing user behavior records are used to continuously adjust the weights and obtain the values of $w_{uv}$ with the minimum error. An objective function is defined:

where each symbol has the meaning defined above, and the regularization factor in the formula prevents overfitting of the training result.

$$b_u \leftarrow b_u + \alpha (e_{ui} - \lambda b_u)$$

$$b_i \leftarrow b_i + \alpha (e_{ui} - \lambda b_i)$$
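The two bias updates above can be exercised on their own with a simplified predictor. The sketch below assumes the prediction is only μ + b_u + b_i, leaving out the neighbor-weighted term of the full model, so it illustrates the update rule rather than the complete method. The function name `train_biases` and the default values of `alpha`, `lam`, and `epochs` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rating = Tuple[str, str, float]  # (user, commodity, rating)


def train_biases(
    ratings: List[Rating],
    alpha: float = 0.05,
    lam: float = 0.02,
    epochs: int = 200,
) -> Tuple[float, Dict[str, float], Dict[str, float]]:
    """Fit mu + b_u + b_i with the SGD rules b <- b + alpha * (e_ui - lam * b)."""
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global average rating
    b_u: Dict[str, float] = defaultdict(float)
    b_i: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for u, i, r in ratings:
            e_ui = r - (mu + b_u[u] + b_i[i])  # prediction error for this rating
            b_u[u] += alpha * (e_ui - lam * b_u[u])
            b_i[i] += alpha * (e_ui - lam * b_i[i])
    return mu, dict(b_u), dict(b_i)
```

A user who tends to rate above the global average ends up with a positive b_u, and a commodity that tends to receive lower ratings ends up with a negative b_i; the λ term shrinks both toward zero.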

(II) user adjacency set matrix factorization customized recommendation

The user-commodity scoring matrix R is decomposed by singular value decomposition into the product of two simple matrices P and Q, expressed as $R = P^{T} Q$. During decomposition, a global offset is added so that the model can be updated incrementally and iteratively, and latent feature vectors of dimension f are extracted to predict the missing values of the scoring matrix while reducing the space complexity of the algorithm.

Decomposing the multidimensional matrix R into the product of the two simple matrices $P^{T}$ and Q reduces the complexity of the data. The loss function L(u, i) of the matrix decomposition is calculated as follows:

where $r_{ui}$ represents the evaluation value of commodity i by user u in the scoring matrix R, $p_u$ represents the feature vector of target user u in the user set P, $q_i$ represents the feature vector of target commodity i in the commodity set Q, and the factor $\lambda(\lVert p_u \rVert^2 + \lVert q_i \rVert^2)$ prevents overfitting.
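The loss formula itself is not reproduced in this text. One form that is consistent with the symbols defined above and with the update rules that follow is the regularized squared error. It is given here as an assumption, not as the original formula, and the notation $\hat{r}_{ui}$ for the predicted value is introduced only for this illustration:

```latex
\hat{r}_{ui} = p_u^{T} q_i, \qquad e_{ui} = r_{ui} - \hat{r}_{ui}

L(u, i) = \left( r_{ui} - p_u^{T} q_i \right)^2
        + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)

\frac{\partial L}{\partial p_{uk}} = -2\, e_{ui}\, q_{ik} + 2 \lambda\, p_{uk},
\qquad
\frac{\partial L}{\partial q_{ik}} = -2\, e_{ui}\, p_{uk} + 2 \lambda\, q_{ik}
```

Stepping against these gradients with learning rate α, and absorbing the constant factor 2 into α, gives updates of the form $p_{uk} \leftarrow p_{uk} + \alpha (e_{ui} q_{ik} - \lambda p_{uk})$ and $q_{ik} \leftarrow q_{ik} + \alpha (e_{ui} p_{uk} - \lambda q_{ik})$.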

$$p_{uk} \leftarrow p_{uk} + \alpha (e_{ui} q_{ik} - \lambda p_{uk})$$

$$q_{ik} \leftarrow q_{ik} + \alpha (e_{ui} p_{uk} - \lambda q_{ik}) \qquad (10)$$

At the beginning of training, $p_u$ and $q_i$ are initialized, that is, the f-dimensional feature vectors $p_u$ and $q_i$ are filled with random values. The estimated evaluation value of target user u for commodity i is then calculated, the error $e_{ui}$ between the estimated value and the actual value $r_{ui}$ is computed, and each dimension of the feature vectors is updated. After a specified number of iterative updates, the decomposed simple matrices P and Q are obtained.
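The training loop described above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions: the prediction is taken to be the inner product of $p_u$ and $q_i$ (the bias and neighbor terms of the full model are omitted), and the function names `factorize` and `predict`, the default hyperparameters, and the fixed random seed are hypothetical.

```python
import random
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (user index, commodity index, observed rating)


def factorize(
    ratings: List[Rating],
    n_users: int,
    n_items: int,
    f: int = 2,
    alpha: float = 0.01,
    lam: float = 0.02,
    count: int = 500,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Learn f-dimensional vectors p_u and q_i by stochastic gradient descent."""
    rng = random.Random(seed)
    # Initialization: fill every feature vector with random values.
    P = [[rng.random() for _ in range(f)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(f)] for _ in range(n_items)]
    for _ in range(count):  # fixed number of passes over the ratings
        for u, i, r in ratings:
            e_ui = r - sum(P[u][k] * Q[i][k] for k in range(f))
            for k in range(f):
                p_uk, q_ik = P[u][k], Q[i][k]  # both rules use pre-update values
                P[u][k] = p_uk + alpha * (e_ui * q_ik - lam * p_uk)
                Q[i][k] = q_ik + alpha * (e_ui * p_uk - lam * q_ik)
    return P, Q


def predict(P: List[List[float]], Q: List[List[float]], u: int, i: int) -> float:
    """Estimated rating of commodity i by user u (inner product of the vectors)."""
    return sum(p * q for p, q in zip(P[u], Q[i]))
```

Calling `factorize(ratings, n_users, n_items)` and then `predict(P, Q, u, i)` for an unrated pair gives an estimate for a missing entry of the scoring matrix.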

The minimization error formula is derived as follows:

After the overfitting prevention factor is added, the formula becomes:

The parameters $w_{uv}$, $p_u$, $q_i$, $b_u$ and $b_i$ are solved iteratively by the stochastic gradient descent method; the specific implementation steps are shown in FIG. 7.

The training process on the training data set is shown in FIG. 7. It yields the $b_u$ and $p_u$ values of each user, the $b_i$ and $q_i$ values of each commodity, and the weight $w_{uv}$ of each nearest-neighbor user v with respect to the target user u.

When solving for the parameters, the number of iterations Count of the algorithm, the regularization parameter λ, the learning rate α, the dimension f of the latent feature vector, and the number k of neighbors in the nearest neighbor set are fixed.

From the loop in FIG. 7, considering only the iteration complexity of the core part of the algorithm, the time complexity is $O(\mathrm{Count} \times \mathrm{Num}(\mathrm{User}) \times \max(\mathrm{Num}(I(u)), \mathrm{Num}(N(u, k)), f))$. If the actual number of scoring entries in the scoring matrix is $\mathrm{Num}(R_A)$, the space complexity of the algorithm is $O(\mathrm{Num}(R_A))$.

Algorithm experiment and result analysis

The rationality and superiority of the application are verified by offline experiments.

1. Experimental procedure and experimental results

Experiment one: the influence of different parameter values on the accuracy of a matrix decomposition model recommendation algorithm based on a user adjacency set is verified, and the experimental process mainly starts from two aspects:

(1) Controlling other variables, continuously adjusting the numerical value of the SVD hidden feature quantity f, and tracking and calculating the value of MAE;

(2) Controlling other variables, continuously adjusting the number k of the nearest neighbors in the nearest neighbor set, and tracking and calculating the value of MAE;

The experimental results shown in FIG. 8 were obtained. Each MAE value in the graph is the average of 5 calculations, which reduces the effect of random error.
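For reference, MAE (mean absolute error) is the average absolute gap between predicted and actual ratings on the test set. The sketch below shows the metric and the averaging over repeated runs; the function names `mean_absolute_error` and `averaged_mae` are hypothetical and the inputs are illustrative only.

```python
from typing import List, Sequence, Tuple


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """MAE = (1/N) * sum |r_ui - predicted r_ui| over the N test ratings."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def averaged_mae(runs: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average the MAE over several runs (the experiments above average 5)."""
    return sum(mean_absolute_error(a, p) for a, p in runs) / len(runs)
```

A lower MAE means the predicted ratings are closer to the actual ratings, which is the sense in which the experiments compare parameter settings and algorithms.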

Experiment II: the recommendation algorithm based on the user adjacency set matrix decomposition model is compared with other recommendation algorithms. The parameter f is held constant at 100. Using the same data set and the same ratio of training set to test set, the recommendation algorithm provided by the application is compared with a recommendation algorithm based on SVD alone and a recommendation algorithm based on Pearson correlation. The number of neighbors k is varied over the interval [10, 50], and the corresponding MAE values are tracked and calculated. The experimental results are shown in FIG. 9.

2. Analysis of experimental results

From the results of experiment one, it can be seen that the latent feature parameter f and the neighbor parameter k have a great influence on the prediction accuracy of the recommendation algorithm. When f is fixed, increasing k reduces the MAE and improves the prediction accuracy; when k is fixed, increasing f reduces the MAE and improves the prediction accuracy. However, as f increases, the space overhead and time consumption of the recommendation algorithm also increase.

From the results of experiment two, it can be seen that when the parameter f is fixed at 100 and all factors other than the parameter k are kept consistent, the MAE values of all three approaches (the Pearson-based algorithm, the SVD-based algorithm, and the algorithm of the application) decrease as k increases. For the same parameter values, however, the algorithm of the application has a smaller MAE than the other recommendation algorithms; that is, its recommendation results are more accurate.

3. Conclusion of the experiment

The hidden characteristic parameter f of the algorithm can directly influence the matrix result after singular value decomposition, so that the accuracy of the prediction result of the recommendation algorithm is influenced; the neighbor parameter k can influence the result of the user adjacency set model, so that the prediction scoring accuracy of a recommendation algorithm is influenced; in the actual customized recommendation system design, the magnitude of each parameter value can be continuously optimized to achieve the best recommendation result.

The algorithm can better cope with the data sparsity problem of the scoring matrix, and compared with other algorithms, the algorithm can effectively relieve the recommendation problem caused by cold start, and has obvious advantages.

Overall analysis gave: the recommendation algorithm based on the user adjacency set matrix decomposition model is reasonable and practical, and has obvious superiority compared with other algorithms.

Claims

1. The electronic commerce customization recommendation method for big data cloud entropy mining is characterized by fusing two improvement strategies: adopting a cloud entropy mining method to calculate the similarity of each object so as to obtain a more accurate nearest neighbor set; adopting a matrix decomposition model based on an adjacency set to solve the problems of cold start and data sparsity;

2. The e-commerce customization recommendation method for big data cloud entropy mining of claim 1, wherein the improved similarity calculation method comprises: judging the user's degree of preference for a commodity according to the user's evaluation value of the commodity; if the e-commerce platform offers users five evaluation levels for a commodity, namely strongly dislike, dislike, neutral, like, and strongly like, the corresponding evaluation values are 1, 2, 3, 4 and 5; let $c_1$, $c_2$, $c_3$, $c_4$, $c_5$ respectively represent the number of evaluations at each of the five levels, and let $U_x$ denote the vector of evaluation counts;

3. The method for customizing recommendation of electronic commerce by big data cloud entropy mining according to claim 1, wherein the scoring of the commodity by the user is predicted: firstly, solving the similarity of a target object by adopting cloud similarity, then predicting the scoring value of a user for the target commodity according to the evaluating value of the user for the similar commodity, increasing the number of commodities purchased and evaluated by different users together, and avoiding the problem caused by matrix sparsity;

Let I denote the evaluation count vector of the target object, where $c_x$ represents the number of evaluations given by the user set at level x, the commodity evaluation levels being divided into five levels from 1 to 5. Based on reverse cloud entropy mining, the three numerical characteristics of cloud entropy mining are obtained from the evaluation count vector of the target commodity and form the evaluation feature vector of the commodity, denoted F = (Ex, En, He), where Ex is the expected value of cloud entropy mining and represents the average preference of the target user for all commodities; En is the entropy of cloud entropy mining and represents the concentration of the user's commodity evaluations; and He is the hyper-entropy of cloud entropy mining and represents the stability of En. The similarity of commodities is then calculated by the cloud similarity calculation method from the evaluation feature vectors of the commodities, and the scoring value of a commodity not yet evaluated by the target user is predicted.

4. The method for customizing recommendation of electronic commerce by big data cloud entropy mining according to claim 3, wherein the specific steps of predicting the score of the commodity by the user are as follows:

the first step: inputting the user-commodity scoring matrix $R_{m \times n}$;

fourth step: the k commodities with the largest sim(i, j) values form the nearest neighbor set $N_i = \{i_1, i_2, \ldots, i_k\}$ of the target commodity, where $N_i$ does not contain commodity i itself, whose sim(i, j) value is 1, and the similarity of the elements of $N_i$ decreases as the subscript increases;

5. The method for customizing e-commerce recommendation for big data cloud entropy mining according to claim 1, wherein a recommendation result set is generated: the cloud similarity calculation method is adopted to calculate the similarity of the objects, the improvement is carried out on the basis of the collaborative filtering algorithm, and an improved customized recommendation algorithm is provided: the similarity calculation method based on cloud entropy mining calculates the similarity of users, a nearest neighbor set of the users is constructed, the evaluation value of the users on the target commodity in the nearest neighbor set is obtained, the evaluation predicted value of the target user on the commodity is obtained through weighting, finally, the predicted values are ordered, and a plurality of commodities with the largest predicted value are used as a part of the recommendation result and added into a recommendation list for the users to select.

6. The electronic commerce customization recommendation method based on big data cloud entropy mining according to claim 5, wherein the basic flow of the recommendation algorithm based on cloud entropy mining and user prediction scoring is as follows: firstly, collecting scoring data of a user, and processing the data to obtain a commodity scoring matrix R of the user; then calculating the evaluation condition in R to obtain the grading times vector of each commodity; then, calculating the evaluation feature vector of each commodity by a calculation method of reverse cloud entropy mining digital features; combining a cloud similarity calculation method to obtain the similarity of the target commodity and other commodities, and sequencing to obtain a nearest neighbor set; predicting the grading value of the target commodity according to the evaluation condition of the neighboring commodity in the nearest neighbor set, and filling a user-commodity grading matrix; then calculating the grading condition of the target user according to the complete grading matrix, and calculating to obtain the evaluation feature vector of the user; calculating the similarity between users by the feature vector to obtain the nearest neighbor set of the users; and finally, calculating the estimated scoring value of the target user according to the scoring condition of the neighbor users in the nearest neighbor set, and presenting a plurality of commodities with higher predicted values to the user as recommendation results, thereby completing the whole recommendation process.

7. The method for customizing recommendation of electronic commerce by big data cloud entropy mining according to claim 1, wherein an adjacency set model is constructed: when a user logs in an E-commerce platform, the system searches and constructs a nearest neighbor set according to the file information of the user, so that the collaborative filtering algorithm in the prior art is improved, and a calculation formula of the estimated score of the user u to the commodity i is given first:

where $b_{ui}$ represents the global bias of user u, N(u, k) represents the nearest neighbor set composed of the k neighbors of user u, H(i) represents the set of all users who have purchased commodity i, v represents any user, $w_{uv}$ represents the weight parameter of user v with respect to the target user u, and $r_{vi}$ represents the existing evaluation value of commodity i by the neighbor user v;

$$b_{ui} = b_u + b_i + \mu \qquad (5)$$

where $b_u$ represents the bias of the user object, that is, the factor in the historical evaluation habits of user u that is unrelated to the target commodity; $b_i$ represents the bias of the commodity object, that is, the factor in the evaluations received by commodity i that is unrelated to the user; μ represents the average of all evaluation values of the commodity objects in the training set; the user-commodity scoring matrix is $R_{ui}$; $r_{ui}$ represents the actual evaluation value of commodity i by user u; $\hat{r}_{ui}$ represents the evaluation value predicted by the algorithm for user u on commodity i; $U = \{u_1, u_2, \ldots, u_n\}$ represents the user set consisting of n users; and $I = \{i_1, i_2, \ldots, i_m\}$ represents the commodity set consisting of m commodities.

8. The e-commerce customization recommendation method for big data cloud entropy mining of claim 7, wherein $w_{uv}$ and N(u, k) are obtained as follows:

(1) Constructing the nearest neighbor set N(u, k): assume that the feature vector F of each user is n-dimensional and that the elements of F correspond to the user's gender, age, occupation, education, and income profile information, expressed as F = (gender, age, occupation, education, income). Each item of user information is then encoded numerically. Gender {male, female} is represented as {0, 1}. The age interval (0, 100] is divided into sub-intervals according to the number of people: 1 represents users aged in (0, 14], 2 represents users aged in (14, 19], 3 represents users aged in (19, 22], 4 represents users aged in (22, 26], and so on. The other types of user information are encoded numerically in the same way. Next, the degree to which the different value ranges of each dimension influence the score difference is obtained; that is, the absolute difference in average score between users with different values of the same dimension in the data set is analyzed to form a score difference matrix $D_k(f_i, f_j)$. The difference set D between every two users is then obtained from the users' profile information, and the absolute average score differences given by $D_k$ for the differing value ranges of dimensions such as age and education are weighted and summed to obtain sim(u, v). Finally, the k users with the highest similarity are selected from the set of sim(u, v) values to form the nearest neighbor set N(u, k) of the target user u. The similarity is calculated as follows:

taking partial derivatives with respect to the variables $b_u$, $b_i$ and $w_{uv}$ in the formula, then iteratively solving for the minimum of Formula 7 by the stochastic gradient descent method; the optimal parameter values are obtained by the iterative updates shown in Formula 8:

$$b_u \leftarrow b_u + \alpha (e_{ui} - \lambda b_u)$$

$$b_i \leftarrow b_i + \alpha (e_{ui} - \lambda b_i)$$

9. The method for e-commerce customization recommendation for big data cloud entropy mining of claim 1, wherein the user adjacency set matrix factorization customization recommendation:

where $r_{ui}$ represents the evaluation value of commodity i by user u in the scoring matrix R, $p_u$ represents the feature vector of target user u in the user set P, $q_i$ represents the feature vector of target commodity i in the commodity set Q, and the factor $\lambda(\lVert p_u \rVert^2 + \lVert q_i \rVert^2)$ prevents overfitting;

$$p_{uk} \leftarrow p_{uk} + \alpha (e_{ui} q_{ik} - \lambda p_{uk})$$

$$q_{ik} \leftarrow q_{ik} + \alpha (e_{ui} p_{uk} - \lambda q_{ik}) \qquad (10)$$

the minimization error formula has the following deduction process:

the formula becomes after adding the overfitting prevention factor: