CN111475744A

CN111475744A - Personalized position recommendation method based on ensemble learning

Info

Publication number: CN111475744A
Application number: CN202010257793.0A
Authority: CN
Inventors: 朱俊; 韩立新; 勾智楠; 杨忆; 袁晓峰; 李树; 李景仙
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2020-04-03
Filing date: 2020-04-03
Publication date: 2020-07-31
Anticipated expiration: 2040-04-03
Also published as: CN111475744B

Abstract

The invention discloses a personalized position recommendation method based on ensemble learning, comprising the following steps: first, converting a check-in data set into a scoring matrix; second, selecting a plurality of recommendation sub-algorithms, dividing the addresses visited by the active user into a training sub-data set and an evaluation sub-data set, and having each sub-algorithm use the training sub-data set to calculate pre-scores for the addresses in the evaluation sub-data set and for the unvisited addresses; third, calculating the recommendation precision F1 of each sub-model on the evaluation sub-data set to generate a set of precision weight values; fourth, selecting the information gain IG as a stability index, evaluating the stability of each sub-model, and calculating a set of stability weight values; fifth, calculating a final total weighting coefficient for the active user, with the ensemble model fusing the sub-algorithms' pre-scores for the unvisited addresses according to the total weighting coefficient to generate a final prediction score; and sixth, comparing the overall performance of the method with that of each sub-algorithm before integration to evaluate the effectiveness of the method.

Description

Personalized position recommendation method based on ensemble learning

Technical Field

The invention relates to an ensemble learning-based personalized position recommendation method in a social network, and belongs to the technical field of artificial intelligence and machine learning.

Background

In location-based social networks (LBSNs), users can establish complex social relationships, such as friend, coworker, and family relationships. Users can also browse points of interest (POIs), such as restaurants, shops, and movie theaters, using the geographic information added to the social network, check in with a mobile device when visiting a POI, and publish their geographic location. Users' suggestions and LBSNs also help businesses learn more about the real services behind the network, so that those services meet users' requirements.

As the number of users registered in LBSNs grows, LBSNs store and accumulate abundant information, and this abundance prevents users from quickly and effectively finding the information they need within a limited time. Recommendation systems, which aim to solve this information overload problem, have therefore attracted increasing attention from researchers. For example, Amazon uses a recommendation system to recommend products to users, improving click-through rates and sales for merchants, and the movie recommendation website Netflix held a large recommendation system competition that attracted many research teams working to improve recommendation accuracy.

According to design strategy, recommendation algorithms mainly comprise collaborative filtering algorithms, content-based recommendation algorithms, and hybrid recommendation algorithms. Collaborative filtering algorithms include memory-based collaborative filtering algorithms (such as user-based collaborative filtering (UBCF) and item-based collaborative filtering (IBCF)) and model-based collaborative filtering algorithms (such as Singular Value Decomposition (SVD), clustering models, and probabilistic latent semantic analysis (PLSA)). In content-based location recommendation, a number of characteristics, such as labels, categories, and user comments, can be extracted from a location.

The above conventional recommendation techniques neither take into account the influence of a location's geographic characteristics nor take advantage of the social relationships between users. However, each location in a location recommendation system has geographic features identified by latitude and longitude, and the geographic features of POIs can have a significant impact on a user's visiting preferences. In addition, a user's social relationships also affect the user's check-in behavior: a user who has not decided where to go often refers to the historical visit records of friends in the social network. Therefore, when designing a location recommendation algorithm, it is necessary to take contextual factors into account, mine the geographic features of locations, and make use of the social relationships between users.

Social-based Collaborative Filtering (SCF) is a recommendation method that considers both the personal preferences and the social relationships of users. It is based on the assumption that friends have the same interest preferences and are easily influenced by each other, and that active users are more willing to make decisions based on the experience of friends. In SCF, only the preferences of the active user's friends need to be considered when calculating the predicted rating of an address. When calculating the similarity between an active user and a friend, one can use the historical scores of places both have visited, the geographic distance between their residences, or the overlap of their friendship networks together with the similarity of their check-in histories. In addition, some research is dedicated to mining the geographic features of locations. Some techniques use matrix factorization, while more algorithms model the geographic influence through a common probability distribution, such as a power law distribution, a multi-center Gaussian distribution, or Kernel Density Estimation (KDE). When KDE is used to predict the probability that a user visits a new location, the influence of geographic position on each user's check-in activity is personalized, yielding a more expressive geography-aware recommendation system.

However, most current position recommendation technologies are single-algorithm models, each built on certain theoretical assumptions, so each algorithm has inherent defects and performs well only in specific application scenarios. For example, content-based location recommendation is suitable for handling the cold start problem, but it requires a large amount of structured information about users and locations, which increases the system's storage and computation costs. The UBCF and IBCF algorithms consider only the neighborhood effect in the rating data, so although they mine user preferences, they ignore the characteristics of item content and limit the diversity of the recommendation results. The SVD algorithm has high computational complexity and runs slowly, and its recommendation accuracy still needs improvement. To overcome the limitations of a single algorithm, some researchers have focused on how to combine several score prediction methods into a single overall model, and ensemble learning is an effective means of doing so. Ensemble learning is a machine learning paradigm that can effectively improve the generalization of a learning system by using multiple weak learners to solve the same problem. Dietterich, an authority in the international machine learning community, has pointed out that ensemble learning is the first of four major research directions in machine learning (ensemble learning, symbolic learning, statistical learning, and reinforcement learning).

Ensemble learning can surpass a single learning algorithm in several respects: it has better average performance across different fields and data sets; it can find a combined solution that no single learning algorithm can obtain; it is less sensitive to sampling variation, noise, and outliers; and it can obtain solutions by combining multiple distributed data sources or multiple characteristics of a data source, a kind of fusion that is becoming increasingly important in distributed data mining. Because of this effectiveness, ensemble learning is widely applied in fields such as biometric recognition, computer-aided medical diagnosis, text recognition, and Web information filtering.

At present, some recommendation systems apply ensemble learning to personalized recommendation problems, improving the effectiveness and adaptability of information recommendation. However, relevant research shows that existing recommendation systems based on ensemble learning still have many shortcomings, which are summarized as follows:

(1) In the fusion process, the number of sub-algorithms considered is fixed and their types are limited, so the extensibility of the ensemble model is weak. At present, most popular position recommendation systems based on ensemble learning consider the fusion of only two algorithms, and the sub-algorithms are generally a particular collaborative filtering algorithm or a position visit probability estimate, which limits both the applicable scenarios and the achievable gains in system performance. Existing integration frameworks cannot support the fusion of any number and any kind of recommendation sub-algorithms.

(2) The integration algorithm needs to set some weighting coefficients to fuse the prediction results of each sub-algorithm into a final prediction score, and the weighting coefficients represent the importance of each sub-algorithm. The fusion rule of existing algorithms is usually addition or multiplication or other simple linear combination, whose weighting coefficients are consistent for all users. However, in the real world, since the characteristics of each user and each item are different, the optimal sub-algorithms are not consistent for different users, that is, the algorithms most capable of mining and reflecting the user interests in the sub-algorithms are different from person to person. It follows that it is necessary to tailor a set of weighting coefficients for each user, ensuring that the integration algorithm can "bias" different sub-algorithms for different users by way of personalized weighting.

(3) In order to enhance the user experience, a good recommendation system should have the feature of robustness. The robustness of the recommendation system contains two indispensable factors of accuracy and stability. However, most of the current research is focused on only one of these aspects. In fact, the accuracy of the prediction determines whether the user likes the recommended location, and the stability of the system reflects whether the recommendation system can produce consistent recommendations in various application scenarios. Ignoring any of these aspects can affect the user's stickiness and reduce the profits of the service provider.

(4) At present, the few existing stability studies almost all limit the application scenario to malicious attacks, for example, an attacker trying to have a preset item recommended to users. However, in addition to malicious attacks, inconsistent recommendation results may also be caused by uncertainties arising from data source limitations (such as sparsity and cold start), different data preprocessing modes, and model training, all of which affect the stability of the system. Research on system stability in non-malicious scenarios is almost nonexistent.

The above-mentioned shortcomings of existing ensemble learning-based recommendation technology cause significant problems in the design, development, deployment, and operation of e-commerce platforms. In particular, they reduce the service quality of the recommendation system on network platforms that carry massive amounts of item information, thereby affecting the sales performance of the e-commerce system.

Disclosure of Invention

To address the defects of the prior art, the invention provides an ensemble learning-based personalized position recommendation method that aims to construct a position recommendation system with strong extensibility, high recommendation precision, and stable recommendation results, and it systematically sets out the technical workflow of an ensemble recommendation algorithm. Taking system theory as its theoretical basis, the invention treats robustness evaluation as a necessary component of the recommendation system. In addition to the accuracy of the recommendation results, it considers how data are used and the diversity of user behavior in non-malicious scenarios, and it proposes using information gain as an index of system stability. This quantifies the uncertainty caused by data source limitations (such as sparsity and cold start), different data preprocessing modes, and model training, and improves the stability of the recommendation system's output. Furthermore, each sub-model in the ensemble model is given a personalized weight, so that an ensemble recommendation algorithm that best matches the user's interests is customized for each user, further enhancing the service quality of the recommendation system.

The technical scheme adopted by the invention to solve the technical problem is as follows. The addresses visited by the active user are divided, in a certain proportion, into a training sub-data set and an evaluation sub-data set. A plurality of recommendation sub-algorithms of any type are selected, and each sub-algorithm uses the historical scores in the active user's training sub-data set to calculate pre-scores of the other addresses for the active user. The historical scores and pre-scores of the addresses in the evaluation sub-data set are then compared to evaluate the accuracy and stability of each sub-algorithm, and personalized weighting coefficients are generated for the active user from the evaluation results. The sub-algorithms' pre-scores for the unvisited addresses are combined using the weighting coefficients to generate the ensemble model's final prediction score for each of the active user's unvisited addresses. The prediction scores of all unvisited addresses are sorted, and the top-ranked addresses are recommended to the active user (as shown in FIG. 1).

The specific process of the method comprises the following steps:

Step 1: collect and organize the original user check-in data set C and convert it into a user-position scoring matrix R.

Step 2: select an active user u_a in the location-based social network (LBSN) as the recommendation service object. Select recommendation sub-algorithms A₁, A₂, …, A_n of any type and in any number. Divide the addresses visited by the active user into a training sub-data set and an evaluation sub-data set according to a certain proportion. Using the visit information in the active user's training sub-data set, calculate pre-scores for the addresses the active user has not visited and for the addresses in the evaluation sub-data set.

Step 3: select F1 as the recommendation accuracy evaluation index, and compare the real scores and pre-scores of the addresses in the evaluation sub-data set to evaluate the recommendation accuracy of each sub-model. From the F1 value of each recommendation sub-model, compute a set of precision weight values W_a for the active user u_a.

Step 4: select the information gain IG as the system stability evaluation index, and compare the real scores and pre-scores of the addresses in the evaluation sub-data set to evaluate the system stability of each sub-model in a non-malicious scenario. From the information gain IG value of each recommendation sub-model, compute a set of stability weight values G_a for the active user u_a.

Step 5: considering the robustness of the ensemble recommendation system as a whole and balancing recommendation precision against system stability, calculate the final total weighting coefficient C_a for the active user u_a on the basis of the two sets of weighting coefficients. Fuse the pre-scores of the recommendation sub-models for the unvisited addresses according to the total weighting coefficient C_a to generate the ensemble model's final prediction scores for the unvisited addresses. Sort all unvisited addresses by final prediction score and provide the active user with a recommendation list consisting of the top-ranked addresses.

Step 6: evaluate the robustness of each recommendation system using the precision index and the stability index, mainly comparing the overall performance of the ensemble learning-based personalized position recommendation algorithm with that of each sub-algorithm before integration, to assess the applicability and effectiveness of the proposed technology.
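The fusion and ranking in Step 5 can be illustrated with a minimal Python sketch. It assumes the fusion is a plain weighted sum of the sub-models' pre-scores, which is one reading of "fuse according to the total weighting coefficient", and it takes the total weights C_a as given, since this part of the text does not show how W_a and G_a are combined. The function names `fuse_scores` and `recommend_top_k` are hypothetical.

```python
from typing import Dict, List


def fuse_scores(pre_scores: Dict[str, List[float]],
                total_weights: List[float]) -> Dict[str, float]:
    """Fuse per-sub-model pre-scores into one score per unvisited address.

    pre_scores maps an address ID to its n pre-scores (one per sub-model);
    total_weights holds the n total weighting coefficients C_a.
    Assumption: the fusion rule is a weighted sum.
    """
    fused = {}
    for address, scores in pre_scores.items():
        if len(scores) != len(total_weights):
            raise ValueError("one pre-score per sub-model is required")
        fused[address] = sum(w * s for w, s in zip(total_weights, scores))
    return fused


def recommend_top_k(pre_scores: Dict[str, List[float]],
                    total_weights: List[float],
                    k: int) -> List[str]:
    """Return the k unvisited addresses with the highest fused scores."""
    fused = fuse_scores(pre_scores, total_weights)
    # Sort by descending score; ties are broken by address ID for determinism.
    ranked = sorted(fused, key=lambda a: (-fused[a], a))
    return ranked[:k]


if __name__ == "__main__":
    demo = {"l1": [0.9, 0.1], "l2": [0.2, 0.8], "l3": [0.5, 0.5]}
    print(recommend_top_k(demo, [0.75, 0.25], k=2))
```

Because the weights are passed in per user, the same function serves every active user; only `total_weights` changes from one user to the next.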

Advantageous effects:

1. The invention has strong extensibility and supports the fusion of recommendation sub-algorithms of any type and in any number. In practical applications, suitable recommendation sub-algorithms can be selected according to the application scenario and the data characteristics, giving higher recommendation quality than any one of the existing algorithms alone, improving user stickiness in location-based social networks, and helping merchants push advertisements accurately to users so as to attract more potential consumers.

2. The invention customizes a set of weight coefficients for each active user by analyzing each user's behavior characteristics. Through personalized weighting, the ensemble algorithm can "bias" toward the sub-algorithm that best mines the interests of each particular user. This per-user integration mode greatly improves user satisfaction with the social network platform, is also helpful for solving other machine learning problems, and is of considerable significance for practical applications.

3. In the fusion process, the F1 value, which combines precision and recall, is selected as the evaluation index of recommendation accuracy, so that the ensemble model outperforms each sub-model in recommendation accuracy, users are more likely to like the recommended results, and the prediction accuracy of the recommendation algorithm is improved.

4. The method innovatively uses the information gain as the evaluation index of the stability of the recommendation system, fully considers the uncertainty caused by data source limitation (such as sparsity and cold start), different data preprocessing modes and model training, can measure a plurality of factors causing the instability of the system, and ensures the system stability of the recommendation system in a non-malicious attack scene.

5. The method considers prediction accuracy and system stability together, improving the robustness and hence the service quality of the recommendation system. The method is general and portable: it can be applied to position recommendation systems and is also suitable for personalized recommendation of other traditional items, giving it broad prospects for industrial application.

6. The invention aims to construct a position recommendation system with strong extensibility, high recommendation precision, and stable recommendation results. While considering the accuracy of the recommendation results, it also takes into account how data are used and the diversity of user behavior, proposes using information gain as an index of system stability, quantifies the uncertainties caused by data source limitations (such as sparsity and cold start), different data preprocessing modes, and model training, and improves the stability of the recommendation system's output.

Drawings

FIG. 1 is a flowchart of the personalized position recommendation method based on ensemble learning according to the present invention.

FIG. 2 is a flowchart of the specific steps of the personalized position recommendation method based on ensemble learning according to the present invention.

FIG. 3 is a flowchart of the steps of the present invention for converting raw user check-in records into a user-location scoring matrix.

FIG. 4 is a frequency histogram of the recommendation accuracy index F1 on the evaluation sub-data set after each recommendation sub-algorithm has been run 100 times (each time with a randomly selected group of target users) in an embodiment of the present invention.

FIG. 5 is a frequency histogram of the information gain IG on the evaluation sub-data set after each recommendation sub-algorithm has been run 100 times (each time with a randomly selected group of target users) in an embodiment of the present invention.

FIG. 6 is a box plot of the precision of the ensemble model after 100 runs in an embodiment of the present invention.

FIG. 7 is a box plot of the recall of the ensemble model after 100 runs in an embodiment of the present invention.

FIG. 8 is a box plot of the recommendation accuracy index F1 of the ensemble model after 100 runs in an embodiment of the present invention.

FIG. 9 is a histogram comparing the recommendation accuracy index F1 of the ensemble model with that of each sub-model in an embodiment of the present invention.

FIG. 10 is a histogram comparing the information gain IG of the ensemble model with that of each sub-model in an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and specific examples.

The specific flow of the design and implementation of the invention is shown in fig. 2, and the main variables and parameters in the process are shown in table 1.

TABLE 1 Functions of the principal variables and parameters

First, collect and organize the original user check-in data set C and convert it into a user-position scoring matrix R. The specific flow is shown in FIG. 3, and the operation steps are as follows:

(1.a) Select the user check-in data set C of the target recommendation system. The data set consists of the historical check-in records of U users at L addresses, and information such as user ID, address ID, visit time, address longitude, and address latitude is extracted from each check-in record.

(1.b) Convert each check-in record into a triplet (u_i, l_j, n_ij), where u_i is the i-th user (1 ≤ i ≤ U), l_j is the j-th address (1 ≤ j ≤ L), and n_ij is the number of times user u_i has visited address l_j.

(1.c) Calculate NC_au_j, the total number of visits by all users to location l_j.

(1.d) Calculate NLC_i, the total number of locations visited by user u_i.

(1.e) Calculate NC_al_i, the total number of visits by user u_i to all locations.

(1.f) Calculate NUC_j, the number of users who have visited location l_j.

(1.g) Convert the number of check-ins n_ij of user u_i at address l_j into the score r_ij of user u_i for address l_j. The specific method is as follows:

where r_ij represents the score of user u_i for address l_j, n_ij represents the number of check-ins of user u_i at address l_j, NC_au_j represents the total number of visits by all users to location l_j, L represents the total number of addresses, NLC_i represents the total number of locations visited by user u_i, NC_al_i represents the total number of visits by user u_i to all locations, U represents the total number of users, and NUC_j represents the number of users who have visited location l_j.

(1.h) Normalize the user scores. The specific calculation method is as follows:

where r_ij represents the score of user u_i for address l_j, min represents the lowest of all scores in the user-location scoring matrix R, and max represents the highest of all scores. After normalization, the score r_ij of user u_i for address l_j is mapped to the interval [0, 1]. A value of 1 indicates that the user visits the location frequently and likes it very much; a value of 0 indicates that the user has never visited the location. The higher the score, the more the user prefers the address.

Collecting all scores forms the user-location scoring matrix R = {r_ij}, i ∈ [1, U], j ∈ [1, L], where i denotes the user number, j denotes the address number, U denotes the total number of users, L denotes the total number of addresses, and r_ij represents the score of user u_i for address l_j.
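The counting in steps (1.b) to (1.f) and the normalization in step (1.h) can be sketched in Python as below. The scoring formula of step (1.g) is not reproduced in this text, so the sketch does not attempt it; the normalization is assumed to be ordinary min-max scaling, (r - min) / (max - min), which matches the description of min, max, and the [0, 1] interval. Unvisited pairs are assumed to carry an implicit raw score of 0, so that 0 still means "never visited". The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (user ID, address ID)


def aggregate_checkins(records: Iterable[Pair]):
    """Build the count statistics of steps (1.b)-(1.f) from raw check-ins.

    records holds one (user ID, address ID) pair per check-in.
    """
    n = Counter(records)   # n_ij: visits of user u_i to address l_j
    nc_au = Counter()      # NC_au_j: visits by all users to l_j
    nlc = Counter()        # NLC_i: number of distinct locations visited by u_i
    nc_al = Counter()      # NC_al_i: visits by u_i to all locations
    nuc = Counter()        # NUC_j: number of users who visited l_j
    for (user, address), count in n.items():
        nc_au[address] += count
        nc_al[user] += count
        nlc[user] += 1
        nuc[address] += 1
    return n, nc_au, nlc, nc_al, nuc


def min_max_normalize(raw_scores: Dict[Pair, float]) -> Dict[Pair, float]:
    """Map raw scores into [0, 1] by min-max scaling (assumed form of 1.h).

    Assumption: unvisited pairs have an implicit raw score of 0, so the
    minimum is taken as min(0, smallest observed score).
    """
    if not raw_scores:
        return {}
    lo = min(0.0, min(raw_scores.values()))
    hi = max(raw_scores.values())
    if hi == lo:
        return {pair: 0.0 for pair in raw_scores}
    return {pair: (r - lo) / (hi - lo) for pair, r in raw_scores.items()}


if __name__ == "__main__":
    checkins = [("u1", "l1"), ("u1", "l1"), ("u1", "l2"), ("u2", "l1")]
    n, nc_au, nlc, nc_al, nuc = aggregate_checkins(checkins)
    # Stand-in only: raw visit counts are used here in place of the
    # (1.g) score, whose formula is not available in this text.
    print(min_max_normalize({pair: float(c) for pair, c in n.items()}))
```

A dictionary keyed by (user, address) is used rather than a dense U × L array because check-in data are typically sparse; a sparse matrix type would serve equally well.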

Second, select an active user u_a in the LBSN as the recommendation service object. Select recommendation sub-algorithms A₁, A₂, …, A_n of any type and in any number, where n is the number of recommendation sub-algorithms. Divide the addresses visited by the active user into a training sub-data set and an evaluation sub-data set according to a certain proportion. Using the visit information in the active user's training sub-data set, calculate pre-scores for the addresses the active user has not visited and for the addresses in the evaluation sub-data set. The operation steps are as follows:

(2.a) Obtain the information of the active user u_a currently served by the recommendation system.

(2.b) According to the application scenario and the data characteristics, select a set of recommendation algorithms A₁, A₂, …, A_n (n is the number of recommendation sub-algorithms) as the sub-algorithms of the ensemble model of the invention. For example, user-based collaborative filtering (UBCF), item-based collaborative filtering (IBCF), social-based collaborative filtering (SCF), Kernel Density Estimation (KDE), Singular Value Decomposition (SVD), other existing ensemble algorithms, and the like can be selected.

(2.c) Train a model for each algorithm according to the operating mechanism of each recommendation sub-algorithm to obtain the recommendation sub-models M₁, M₂, …, M_n (n is the number of recommendation sub-algorithms).

(2.d) Set a uniform address division ratio p for all active users, and divide the addresses visited by the active user u_a, according to this ratio, into a training sub-data set Sub1_a and an evaluation sub-data set Sub2_a.

(2.e) Using the recommendation sub-models M₁, M₂, …, M_n (n is the number of recommendation sub-algorithms) and the training sub-data set Sub1_a of the active user u_a, calculate a pre-score for each address l_k (l_k ∈ NewL_a ∪ Sub2_a) in the set of unvisited addresses NewL_a and in the evaluation sub-data set Sub2_a, as follows:
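Steps (2.d) and (2.e) can be sketched in Python as follows. The text specifies only that the visited addresses are divided by a ratio p; treating p as the training fraction and shuffling at random before the cut are assumptions of this sketch. Each sub-model is represented by a hypothetical callable that takes the training scores and a candidate address and returns a pre-score; real sub-models such as UBCF or SVD would sit behind that interface.

```python
import random
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical interface: (training scores, candidate address) -> pre-score.
SubModel = Callable[[Dict[str, float], str], float]


def split_visited(visited: Sequence[str], p: float,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split visited addresses into Sub1_a (training) and Sub2_a (evaluation).

    Assumptions: p is the fraction assigned to training, and the addresses
    are shuffled with a fixed seed before being cut.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    shuffled = list(visited)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(p * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def collect_pre_scores(sub_models: Sequence[SubModel],
                       train_scores: Dict[str, float],
                       candidates: Iterable[str]) -> Dict[str, List[float]]:
    """Ask every sub-model for a pre-score on each candidate address.

    candidates should cover NewL_a (unvisited) together with Sub2_a.
    """
    return {address: [model(train_scores, address) for model in sub_models]
            for address in candidates}


if __name__ == "__main__":
    sub1, sub2 = split_visited(["l1", "l2", "l3", "l4", "l5"], p=0.6)
    # Two toy stand-in sub-models: a constant, and the mean training score.
    toy_models = [lambda scores, address: 0.5,
                  lambda scores, address: sum(scores.values()) / len(scores)]
    train = {address: 1.0 for address in sub1}
    print(collect_pre_scores(toy_models, train, sub2 + ["l9"]))
```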

Third, select F1 as the recommendation accuracy evaluation index, and compare the real scores and pre-scores of the addresses in the evaluation sub-data set Sub2_a to evaluate the recommendation accuracy of each sub-model. From the F1 value of each recommendation sub-model, compute a set of precision weight values W_a for the active user u_a. The implementation steps are as follows:

(3.a) Each recommendation sub-algorithm from the second step is denoted A_x (1 ≤ x ≤ n), where n is the number of recommendation sub-algorithms. Collect the pre-scores that each sub-algorithm computed for the active user u_a, sort all addresses in the evaluation sub-data set Sub2_a by pre-score, and take the top M addresses to form the training list set TopM_ax.

(3.b) Collect the real scores of the active user u_a for the evaluation sub-data set Sub2_a, and put the addresses in Sub2_a whose real score is greater than goodling into the preference sub-data set Prefer_a.

(3.c) Calculate the precision (Precision) of each recommendation sub-algorithm A_x. The specific calculation method is as follows:

where u_a represents the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) represents a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), the training list TopM_ax represents the set of the M addresses in the evaluation sub-data set Sub2_a with the highest pre-scores, the preference sub-data set Prefer_a represents the set of addresses in Sub2_a whose real score is greater than goodling, and M represents the number of addresses in the training list set TopM_ax.

(3.d) Calculate the recall (Recall) of each recommendation sub-algorithm A_x. The specific calculation method is as follows:

(3.e) Calculate the comprehensive accuracy index F1 of each recommendation sub-algorithm A_x. The specific calculation method is as follows:

where u_a represents the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) represents a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), M represents the number of addresses in the training list set TopM_ax, Precision(u_a, A_x, M) represents the precision of each recommendation sub-algorithm A_x (1 ≤ x ≤ n), and Recall(u_a, A_x, M) represents the recall of each recommendation sub-algorithm A_x (1 ≤ x ≤ n).

(3.f) Calculate the precision weight value W_ax (1 ≤ x ≤ n) of each recommendation sub-algorithm A_x when recommending for the active user u_a, where n is the number of recommendation sub-algorithms. The specific calculation method is as follows:

where u_a represents the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) represents a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), M represents the number of addresses in the training list set TopM_ax, and F1(u_a, A_x, M) represents the comprehensive accuracy index F1 of each recommendation sub-algorithm A_x (1 ≤ x ≤ n).
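The third step can be sketched in Python under stated assumptions, since the formulas for steps (3.c) to (3.f) are not reproduced in this text. The sketch uses the standard definitions that fit the variable descriptions: Precision = |TopM_ax ∩ Prefer_a| / M, Recall = |TopM_ax ∩ Prefer_a| / |Prefer_a|, and F1 = 2 · Precision · Recall / (Precision + Recall). The precision weight is assumed to be each sub-algorithm's F1 divided by the sum of all F1 values, which is one plausible normalization rather than the confirmed formula. The parameter `good_threshold` stands in for the threshold the text calls goodling, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def top_m(pre_scores: Dict[str, float], m: int) -> List[str]:
    """TopM_ax: the m addresses of Sub2_a with the highest pre-scores."""
    return sorted(pre_scores, key=lambda a: (-pre_scores[a], a))[:m]


def preferred(real_scores: Dict[str, float], good_threshold: float) -> Set[str]:
    """Prefer_a: addresses of Sub2_a whose real score exceeds the threshold."""
    return {a for a, r in real_scores.items() if r > good_threshold}


def precision_recall_f1(top_list: List[str],
                        prefer: Set[str]) -> Tuple[float, float, float]:
    """Standard precision, recall, and F1 (assumed forms of 3.c-3.e)."""
    hits = len(set(top_list) & prefer)
    precision = hits / len(top_list) if top_list else 0.0
    recall = hits / len(prefer) if prefer else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def precision_weights(f1_values: List[float]) -> List[float]:
    """W_ax, assumed here to be F1 normalized to sum to 1 across sub-algorithms.

    If every F1 is 0, the sketch falls back to uniform weights.
    """
    total = sum(f1_values)
    if total == 0.0:
        return [1.0 / len(f1_values)] * len(f1_values)
    return [f / total for f in f1_values]


if __name__ == "__main__":
    real = {"a": 0.9, "b": 0.8, "c": 0.2, "d": 0.1}
    prefer = preferred(real, good_threshold=0.5)
    pre_by_model = [{"a": 0.7, "b": 0.1, "c": 0.6, "d": 0.2},
                    {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.2}]
    f1s = [precision_recall_f1(top_m(pre, 2), prefer)[2] for pre in pre_by_model]
    print(precision_weights(f1s))
```

Normalizing the weights to sum to 1 keeps a fused score on the same [0, 1] scale as the individual pre-scores, which is convenient but is a design choice of this sketch.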

And fourthly, selecting an information gain IG as a system stability evaluation index, comparing the real score and the pre-score information of the address in the evaluation sub data set, and evaluating the system stability of each sub model in a non-malicious attack scene. According to the information gain IG value of each recommended sub-model, the active user u is selected_aComputing a set of stability weight values G_aThe method comprises the following implementation steps:

(4.a) Calculate the information entropy of the evaluation sub-data set Sub2_a. The specific calculation method is as follows:

where u_a denotes the active user currently receiving the recommendation service, Sub2_a denotes the evaluation sub-data set obtained by dividing the addresses visited by active user u_a according to a given ratio, and the preference sub-data set Prefer_a denotes the set of addresses in Sub2_a whose real score is greater than goodrating.

(4.b) Each recommendation sub-algorithm from the second step is denoted A_x (1 ≤ x ≤ n), where n is the number of recommendation sub-algorithms. Calculate the conditional entropy obtained when the addresses in the evaluation sub-data set Sub2_a are classified (into recommended and not recommended) according to the recommendation result of sub-algorithm A_x. The specific calculation method is as follows:

where u_a denotes the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) denotes a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), Sub2_a denotes the evaluation sub-data set obtained by dividing the addresses visited by active user u_a according to a given ratio, the training list TopM_ax denotes the set of the M addresses in Sub2_a with the highest pre-scores, M denotes the number of addresses in TopM_ax, TP_ax is the number of addresses in the recommendation list that the user really likes, and FN_ax is the number of addresses the user really likes that are not in the recommendation list (not recommended).

(4.c) Calculate the information gain obtained after the addresses in the evaluation sub-data set Sub2_a are classified according to the recommendation result of sub-algorithm A_x. The specific calculation method is as follows:

IG(u_a, A_x, M) = D(u_a) - T(u_a, A_x, M) (9)

where u_a denotes the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) denotes a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), M denotes the number of addresses in the training list TopM_ax, D(u_a) denotes the information entropy of the evaluation sub-data set Sub2_a, and T(u_a, A_x, M) denotes the conditional entropy obtained when the addresses in Sub2_a are classified (into recommended and not recommended) according to the recommendation result of sub-algorithm A_x.
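Steps (4.a) to (4.c) can be illustrated with the textbook definition of information gain for a binary split. The sketch below assumes binary entropy in bits over "liked / not liked" and a conditional entropy weighted by the sizes of the recommended and not-recommended groups; the entropy and conditional-entropy formulas of the patent are not available in this text, so this is an assumption, and all function names and sample addresses are hypothetical.

```python
import math


def binary_entropy(liked, total):
    """Entropy in bits of a set holding `liked` liked addresses out of `total`."""
    if total == 0 or liked == 0 or liked == total:
        return 0.0
    p = liked / total
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def information_gain(sub2, top_m, prefer):
    """IG = D - T for one sub-algorithm, using the standard binary-split form."""
    sub2 = set(sub2)
    if not sub2:
        return 0.0
    recommended = set(top_m) & sub2          # addresses in TopM_ax
    not_recommended = sub2 - recommended
    liked = set(prefer) & sub2               # addresses in Prefer_a
    tp = len(recommended & liked)            # liked and recommended
    fn = len(not_recommended & liked)        # liked but not recommended
    d = binary_entropy(len(liked), len(sub2))
    t = (len(recommended) / len(sub2)) * binary_entropy(tp, len(recommended)) + (
        len(not_recommended) / len(sub2)
    ) * binary_entropy(fn, len(not_recommended))
    return d - t


# Hypothetical evaluation set of 8 addresses, 4 of which the user likes.
sub2 = ["l1", "l2", "l3", "l4", "l5", "l6", "l7", "l8"]
prefer = {"l1", "l2", "l3", "l4"}
ig_perfect = information_gain(sub2, ["l1", "l2", "l3", "l4"], prefer)
ig_useless = information_gain(sub2, ["l1", "l2", "l5", "l6"], prefer)
```

A list that recommends exactly the liked addresses removes all uncertainty, while a list that contains liked and disliked addresses in the same proportion as the whole set removes none.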

(4.d) Calculate the stability weight value G_ax (1 ≤ x ≤ n) of each recommendation sub-algorithm A_x when recommending for active user u_a, where n is the number of recommendation sub-algorithms. The specific calculation method is as follows:

where u_a denotes the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) denotes a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), M denotes the number of addresses in the training list TopM_ax, and IG(u_a, A_x, M) denotes the information gain of recommendation sub-algorithm A_x.

Fifth, taking the robustness of the integrated recommendation system into account and balancing recommendation precision against system stability, calculate the final total weighting coefficients C_a for active user u_a on the basis of the two groups of weighting coefficients. The pre-scores that each recommendation sub-model gives to the unvisited addresses are fused according to the total weighting coefficients C_a to generate the final prediction scores of the integrated model for the unvisited addresses. All unvisited addresses are sorted by final prediction score, and a recommendation list consisting of the top-ranked addresses is provided to the active user. The specific implementation steps are as follows:

(5.a) Calculate the final weighting coefficient C_ax of each recommendation sub-algorithm A_x (1 ≤ x ≤ n) when recommending for active user u_a, where n is the number of recommendation sub-algorithms. The specific calculation method is as follows:

where C_ax denotes the final weight value of the pre-score of recommendation sub-algorithm A_x, W_ax denotes the precision weight value of the pre-score of A_x when recommending for active user u_a, and G_ax denotes the stability weight value of the pre-score of A_x.

(5.b) For each location l_k (l_k ∈ NewL_a) not visited by active user u_a, the integrated algorithm calculates a final prediction score. The specific calculation method is as follows:

where C_ax is the final weighting coefficient of recommendation sub-algorithm A_x when recommending for active user u_a, and the weighted term is the pre-score that A_x assigns to location l_k not visited by active user u_a.

(5.c) Sort all addresses not visited by the active user by the final prediction score of the integrated algorithm, form a recommendation list TopNList_a from the N top-ranked locations, and return it to the active user.
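Steps (5.b) and (5.c) can be sketched as a weighted sum followed by a sort. The fusion formula is not reproduced in this text, so the weighted sum over sub-algorithms is an assumption based on the description that pre-scores are fused according to the total weighting coefficients; the coefficients C_ax are taken as given inputs, and all names and numbers below are hypothetical.

```python
def fuse_and_rank(pre_scores, coefficients, n):
    """Fuse sub-algorithm pre-scores and return the top-n unvisited addresses.

    pre_scores   -- dict: sub-algorithm name -> {address: pre-score}
    coefficients -- dict: sub-algorithm name -> final coefficient C_ax
    ASSUMPTION: the final score is the coefficient-weighted sum of pre-scores.
    """
    final = {}
    for name, scores in pre_scores.items():
        c = coefficients[name]
        for address, score in scores.items():
            final[address] = final.get(address, 0.0) + c * score
    # Sort by descending final score; break ties by address for determinism.
    ranked = sorted(final, key=lambda a: (-final[a], a))
    return ranked[:n], final


# Hypothetical pre-scores from two sub-algorithms and hypothetical coefficients.
pre_scores = {
    "UBCF": {"l1": 0.8, "l2": 0.2, "l3": 0.5},
    "KDE": {"l1": 0.1, "l2": 0.9, "l3": 0.5},
}
top_n, final_scores = fuse_and_rank(pre_scores, {"UBCF": 0.75, "KDE": 0.25}, n=2)
```

Because the coefficients are computed per user, the same pre-scores can yield different rankings for different active users.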

Sixth, evaluate the robustness of each recommendation system using the precision index and the stability index, mainly by comparing the overall performance of the ensemble-learning-based personalized position recommendation algorithm with that of each sub-algorithm before integration, so as to assess the applicability and effectiveness of the proposed technique. The implementation steps are as follows:

(6.a) Randomly select U × 10% of the users from the target data set as the active user set AU, and run each recommendation algorithm for every active user in the set to generate a recommendation list.

(6.b) Evaluate the robustness of each recommendation system using the precision index and the stability index. For one run over the active user set AU, the Precision, Recall, recommendation precision index F1, and information gain IG of each algorithm are the averages of those indices over all users in AU.

(6.c) Repeat steps (6.a) and (6.b) Ntimes times, i.e. all algorithms are run independently Ntimes times.

(6.d) Set the Precision, Recall, recommendation precision index F1, and information gain IG of the integrated algorithm proposed by the invention and of each sub-recommendation algorithm to the average of the results of the Ntimes runs.

(6.e) Compare and analyze the results for each index: if the recommendation precision index F1 of the integrated algorithm is greater than the F1 values of all sub-recommendation algorithms, the recommendation precision of the integrated algorithm is higher than that of all sub-algorithms; if the information gain IG of the integrated algorithm is greater than the maximum information gain IG among the sub-recommendation algorithms, the integrated algorithm is more stable than all sub-algorithms; if both conclusions hold, the technique proposed by the invention is more robust.
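The averaging in (6.d) and the comparison in (6.e) can be sketched as follows. The helper names, the key "ensemble", and the metric values are made up for illustration only and are not results from the source.

```python
from statistics import mean


def average_runs(runs):
    """Average one metric over independent runs.

    runs -- list of dicts, one per run: algorithm name -> metric value
            (each value already averaged over the active user set AU).
    """
    return {alg: mean(run[alg] for run in runs) for alg in runs[0]}


def beats_all_subs(averages, ensemble="ensemble"):
    """True if the integrated algorithm's value exceeds every sub-algorithm's."""
    return all(averages[ensemble] > v for alg, v in averages.items() if alg != ensemble)


# Made-up F1 values for two runs, purely illustrative.
f1_runs = [
    {"ensemble": 0.30, "UBCF": 0.20, "SVD": 0.25},
    {"ensemble": 0.40, "UBCF": 0.30, "SVD": 0.25},
]
f1_avg = average_runs(f1_runs)
more_precise = beats_all_subs(f1_avg)
```

Applying the same two helpers to the information gain values gives the stability comparison; the method is judged more robust only when both checks succeed.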

The following describes in detail how the personalized position recommendation method based on ensemble learning according to the present invention works, taking a specific location-based social network as an example.

Brightkite is a location-based social networking service provider whose users share their locations by checking in. The social network comprises 58228 users and 693362 locations, with 214078 social relationships among the users. The Brightkite data set, which contains 4491143 check-ins collected between 2008 and October 2010, has become one of the test data sets most commonly used by recommendation system researchers. The present invention uses the data for the Los Angeles area in the Brightkite data set as its example.

First, collect and organize the original user check-in data set C and convert it into a user-location scoring matrix R. The specific operation steps are as follows:

(1.a) Select the check-in data set C of users in the Los Angeles area of the example data set Brightkite. The data set consists of 61710 historical check-in records made by 1233 users at 2951 addresses, with 4216 social relationships among the users. Each user checked in 50.05 times on average and has 6.84 friends on average, and each location was visited 20.91 times on average. Each check-in record contains a user ID, an address ID, an access time, an address longitude, an address latitude, and other information.

(1.b) Convert each check-in record into a triplet (u_i, l_j, n_ij), where u_i is the i-th user (1 ≤ i ≤ 1233), l_j is the j-th address (1 ≤ j ≤ 2951), and n_ij is the number of times user u_i visited address l_j.

(1.d) Calculate the total number of locations visited by user u_i, NLC_i.

(1.f) Calculate the number of all users who have visited location l_j, NUC_j.

where r_ij denotes the score of user u_i for address l_j, n_ij denotes the number of check-ins of user u_i at address l_j, NC_au_j denotes the total number of visits by all users to location l_j, NLC_i denotes the total number of locations visited by user u_i, NC_al_i denotes the total number of visits by user u_i to all locations, and NUC_j denotes the number of all users who have visited location l_j.

(1.h) Find the lowest score in the user-location scoring matrix R, min = 0, and the highest score, max = 12.61. Normalize the user scores obtained in the previous step:

where r_ij denotes the score of user u_i for address l_j.

After normalization, the score r_ij of user u_i for address l_j is mapped to the interval [0, 1]. A value of 1 indicates that the user visits the location frequently and likes it very much; a value of 0 indicates that the user has never visited the location; a higher score indicates a stronger preference for the address.

Collect all scores to form the user-location scoring matrix R = {r_ij}, i ∈ [1, 1233], j ∈ [1, 2951], where i is the user index and j is the address index.
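Two parts of this first step can be sketched without the scoring formula, which is not reproduced in this text: counting check-ins into (u_i, l_j, n_ij) triplets, and mapping raw scores to [0, 1]. The normalization is assumed to be the usual min-max form (r - min) / (max - min), which matches the stated min, max, and target interval; the function names and sample values are hypothetical.

```python
from collections import Counter


def checkins_to_counts(records):
    """Count check-ins per (user, address) pair: (u_i, l_j) -> n_ij.

    records -- iterable of (user_id, address_id) pairs, one per check-in.
    """
    return Counter(records)


def min_max_normalize(scores):
    """Map raw scores to [0, 1]; ASSUMES the usual (r - min) / (max - min) form."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


# Hypothetical check-in log and hypothetical raw scores (max chosen as 12.61).
counts = checkins_to_counts([("u1", "l1"), ("u1", "l1"), ("u2", "l1"), ("u1", "l2")])
normalized = min_max_normalize(
    {("u1", "l1"): 12.61, ("u1", "l2"): 6.305, ("u2", "l1"): 0.0}
)
```

The raw score that turns n_ij and the aggregate counts into r_ij would sit between these two helpers.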

Second, select an active user u_a in the LBSN as the recommendation service object. Select recommendation sub-algorithms A₁, A₂, …, A_n of any type and any number, where n is the number of recommendation sub-algorithms. Divide the addresses visited by the active user into a training sub-data set and an evaluation sub-data set according to a given ratio. Using the visit information in the active user's training sub-data set, calculate pre-scores for the addresses not visited by the active user and for the addresses in the evaluation sub-data set. The operation steps are as follows:

(2.a) Obtain the personal information, social relationships, and historical access records of an active user u_a in the Los Angeles area of the example data set Brightkite.

(2.b) Select four recommendation algorithms: A₁, user-based collaborative filtering (UBCF); A₂, singular value decomposition (SVD); A₃, social-based collaborative filtering (SCF); and A₄, the KDE algorithm. UBCF is a typical memory-based collaborative filtering algorithm. It can mine users' personal preferences but cannot provide effective recommendations for new items or inactive users, i.e. it suffers from the cold-start problem. SVD is a typical matrix factorization technique among model-based collaborative filtering algorithms. It can cope with the cold-start problem of UBCF, but its computational complexity is high, it runs slowly, and its recommendation precision still needs improvement. Because social relationships among users are a main feature of an LBSN, the invention selects SCF as a supplement to UBCF: SCF considers the influence of social relationships on user behavior patterns on top of UBCF and can therefore produce more accurate recommendations, but like UBCF it still suffers from cold start and from recommendation results of a single type. The KDE algorithm considers the geographical attributes of locations in the LBSN and models the check-in probability of each user; as a sub-algorithm it is not particularly suited to mining sparse LBSN data.

Given their distinct strengths and weaknesses, the four sub-algorithms selected by the invention complement one another.

(2.c) Train a model for each algorithm to obtain the recommendation sub-models M_UBCF, M_SVD, M_SCF, M_KDE.

(2.d) Set a uniform address division ratio p = 0.4 for all active users, and divide the addresses visited by active user u_a into the training sub-data set Sub1_a and the evaluation sub-data set Sub2_a according to this ratio.
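The split in (2.d) can be sketched as below. The text does not say which subset receives the share p or how addresses are assigned to each subset, so this sketch makes two assumptions: addresses are shuffled at random, and the fraction p goes to the training subset Sub1_a. The function name, the seed parameter, and the sample addresses are hypothetical.

```python
import random


def split_visited(addresses, p=0.4, seed=0):
    """Split a user's visited addresses into (Sub1_a, Sub2_a).

    ASSUMPTIONS: random assignment, and Sub1_a (training) receives fraction p.
    """
    shuffled = list(addresses)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * p)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical user with 10 visited addresses.
visited = [f"l{i}" for i in range(10)]
sub1, sub2 = split_visited(visited, p=0.4, seed=42)
```

Fixing the seed makes a run reproducible; the two subsets are disjoint and together cover every visited address.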

(2.e) Using the recommendation sub-models M_UBCF, M_SVD, M_SCF, M_KDE and the training sub-data set Sub1_a of active user u_a, the UBCF, SVD, SCF, and KDE algorithms each calculate a pre-score for every address l_k (l_k ∈ NewL_a ∪ Sub2_a) in the unvisited address set NewL_a and the evaluation sub-data set Sub2_a.

(3.a) Collect the pre-score information that each recommendation sub-algorithm A_x (1 ≤ x ≤ 4) calculated for active user u_a.

Sort all addresses l_k (l_k ∈ Sub2_a) in the evaluation sub-data set Sub2_a by pre-score, take the top M = 10 addresses, and generate a training list Top10_ax for each algorithm A_x.

(3.b) Collect the real scores of active user u_a for all addresses in the evaluation sub-data set Sub2_a, and put the addresses in Sub2_a whose real score is greater than goodrating = 0.05 into the preference sub-data set Prefer_a.
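Steps (3.a) and (3.b) amount to a sort and a threshold filter, sketched below. The function name, the tie-breaking rule, and the sample scores are hypothetical; the defaults M = 10 and goodrating = 0.05 follow the text.

```python
def top_m_and_prefer(pre_scores, real_scores, m=10, good_rating=0.05):
    """Build TopM_ax and Prefer_a for one sub-algorithm and one active user.

    pre_scores  -- {address: pre-score} for the addresses in Sub2_a
    real_scores -- {address: real (normalized) score} for the same addresses
    """
    # Highest pre-score first; ties broken by address for determinism.
    ranked = sorted(pre_scores, key=lambda a: (-pre_scores[a], a))
    top_m = ranked[:m]
    prefer = {a for a, score in real_scores.items() if score > good_rating}
    return top_m, prefer


# Hypothetical scores for three evaluation addresses, with M = 2.
top_m, prefer = top_m_and_prefer(
    {"l1": 0.9, "l2": 0.4, "l3": 0.7},
    {"l1": 0.30, "l2": 0.02, "l3": 0.06},
    m=2,
)
```

The overlap between the two results is what the precision and recall indices measure.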

where u_a denotes the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) denotes a recommendation sub-algorithm, the training list TopM_ax denotes the set of the 10 addresses in the evaluation sub-data set Sub2_a with the highest pre-scores, the preference sub-data set Prefer_a denotes the set of addresses in Sub2_a whose real score is greater than 0.05, and M denotes the number of addresses in TopM_ax (M = 10).

where u_a denotes the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) denotes a recommendation sub-algorithm, M denotes the number of addresses in the training list TopM_ax (M = 10), precision(u_a, A_x, M) denotes the precision of each recommendation sub-algorithm, and recall(u_a, A_x, M) denotes the recall of each recommendation sub-algorithm.

After the four sub-algorithms have been run 100 times (with a group of target users randomly selected each time), the frequency histogram of the recommendation precision index F1 on the evaluation data set is shown in Fig. 4.

(3.f) Calculate the precision weight value W_ax (1 ≤ x ≤ 4) of each recommendation sub-algorithm A_x when recommending for active user u_a. The specific calculation method is as follows:

where u_a denotes the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) denotes a recommendation sub-algorithm, M denotes the number of addresses in the training list TopM_ax (M = 10), and F1(u_a, A_x, M) denotes the recommendation precision of recommendation sub-algorithm A_x.

Fourth, select the information gain IG as the system stability evaluation index, compare the real scores and pre-scores of the addresses in the evaluation sub-data set, and evaluate the system stability of UBCF, SVD, SCF, and KDE in a non-malicious attack scenario. According to the information gain IG value of each recommendation sub-model, compute a set of stability weight values G_a for active user u_a. The implementation steps are as follows:

where u_a denotes the active user currently receiving the recommendation service, Sub2_a denotes the evaluation sub-data set obtained by dividing the addresses visited by active user u_a according to a given ratio, and the preference sub-data set Prefer_a denotes the set of addresses in Sub2_a whose real score is greater than 0.05.

(4.b) Calculate the conditional entropy obtained when the addresses in the evaluation sub-data set Sub2_a are classified (into recommended and not recommended) according to the recommendation result of sub-algorithm A_x (1 ≤ x ≤ 4). The specific calculation method is as follows:

where u_a denotes the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) denotes a recommendation sub-algorithm, Sub2_a denotes the evaluation sub-data set obtained by dividing the addresses visited by active user u_a according to a given ratio, the training list TopM_ax denotes the set of the 10 addresses in Sub2_a with the highest pre-scores, M denotes the number of addresses in TopM_ax (M = 10), TP_ax is the number of addresses in the recommendation list that the user really likes, and FN_ax is the number of addresses the user really likes that are not in the recommendation list (not recommended).

IG(u_a, A_x, M) = D(u_a) - T(u_a, A_x, M) (21)

where u_a denotes the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) denotes a recommendation sub-algorithm, M denotes the number of addresses in the training list TopM_ax (M = 10), D(u_a) denotes the information entropy of the evaluation sub-data set Sub2_a, and T(u_a, A_x, M) denotes the conditional entropy obtained when the addresses in Sub2_a are classified (into recommended and not recommended) according to the recommendation result of sub-algorithm A_x.

After the four sub-algorithms have been run 100 times (with a group of target users randomly selected each time), the frequency histogram of the information gain IG index on the evaluation sub-data set Sub2_a is shown in Fig. 5.

(4.d) Calculate the stability weight value G_ax (1 ≤ x ≤ 4) of each recommendation sub-algorithm A_x when recommending for active user u_a. The specific calculation method is as follows:

where u_a denotes the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) denotes a recommendation sub-algorithm, M denotes the number of addresses in the training list TopM_ax (M = 10), and IG(u_a, A_x, M) denotes the information gain of recommendation sub-algorithm A_x.

(5.a) Calculate the final weighting coefficient C_ax of each recommendation sub-algorithm A_x (1 ≤ x ≤ 4) when recommending for active user u_a. The specific calculation method is as follows:

(5.c) Sort all addresses not visited by the active user by the final prediction score of the integrated algorithm, form a recommendation list TopNList_a from the N top-ranked locations, and return it to the active user (N may be a multiple of 5; typically 5 ≤ N ≤ 50).

Sixth, evaluate the robustness of each recommendation system using the precision index and the stability index, mainly by comparing the overall performance of the ensemble-learning-based personalized position recommendation algorithm with that of the four sub-algorithms before integration, so as to assess the applicability and effectiveness of the proposed technique. The implementation steps are as follows:

(6.a) Randomly select 123 users from the target data set as the active user set AU, and run the integrated recommendation algorithm and the four sub-algorithms for every active user in the set to generate a recommendation list.

(6.c) Repeat steps (6.a) and (6.b) 100 times, i.e. all algorithms are run independently 100 times.

Box plots of the 100 Precision, Recall, and recommendation precision index F1 values produced by the integrated model over the 100 runs are shown in Fig. 6, Fig. 7, and Fig. 8, respectively.

(6.d) Set the Precision, Recall, recommendation precision index F1, and information gain IG of the integrated algorithm proposed by the invention and of the four sub-recommendation algorithms to the average of the results of the 100 runs. When N takes different values, the Precision, Recall, recommendation precision index F1, and information gain IG results of each recommendation algorithm are shown in Tables 2, 3, 4, and 5, respectively:

Table 2. Precision values of the different recommendation algorithms

Table 3. Recall values of the different recommendation algorithms

Table 4. Recommendation precision index F1 values of the different recommendation algorithms

Table 5. Information gain IG values of the different recommendation algorithms

A histogram comparing the recommendation precision index F1 of the integrated model with that of each sub-model is shown in Fig. 9, and a histogram comparing their information gain IG index is shown in Fig. 10.

(6.e) Compare and analyze the results for each index: the Precision, Recall, and recommendation precision index F1 of the integrated algorithm are all greater than the corresponding values of all sub-recommendation algorithms, so the recommendation precision of the integrated algorithm is higher than that of all sub-algorithms; the information gain IG of the integrated algorithm is greater than the maximum information gain IG among the sub-recommendation algorithms, which shows that the integrated algorithm is more stable than all sub-algorithms; together, these two conclusions demonstrate the robustness of the proposed technique.

Unlike conventional ensemble algorithms, the invention aims to build a position recommendation system that is highly extensible, precise in its recommendations, and stable in its results. It considers both the accuracy of the recommendation results and the diversity of data usage and user behavior, and it introduces information gain as an evaluation index of system stability, quantifying the uncertainty caused by data-source limitations (such as sparsity and cold start), by different data preprocessing methods, and by model training, thereby improving the stability of the recommendation system's output. In addition, a set of weighting coefficients is customized for each user, so that through personalized weighting the integrated algorithm can favor different sub-algorithms for different users. The proposed technique helps improve the robustness and service quality of recommendation systems and is expected to be widely applied in the location-based social network market.

The above-described process flow is only a preferred embodiment of the present invention and does not represent all of its details. Any modification, equivalent replacement, or improvement made by those skilled in the art within the technical scope disclosed herein and within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A personalized position recommendation method based on ensemble learning, characterized by comprising the following steps:

Step 1: collect and organize the original user check-in data set C and convert it into a user-location scoring matrix R;

Step 2: select an active user u_a in a location-based social network (LBSN) as the recommendation service object, select recommendation sub-algorithms A₁, A₂, …, A_n of any type and any number, divide the addresses visited by the active user into a training sub-data set and an evaluation sub-data set, and, using the visit information in the active user's training sub-data set, calculate pre-scores for the addresses not visited by the active user and for the addresses in the evaluation sub-data set;

Step 3: select F1 as the recommendation precision evaluation index, compare the real scores and pre-scores of the addresses in the evaluation sub-data set, evaluate the recommendation precision of each sub-model, and compute a set of precision weight values W_a for active user u_a according to the recommendation precision index F1 value of each recommendation sub-model;

Step 4: select the information gain IG as the system stability evaluation index, compare the real scores and pre-scores of the addresses in the evaluation sub-data set, evaluate the system stability of each sub-model in a non-malicious attack scenario, and compute a set of stability weight values G_a for active user u_a according to the information gain IG value of each recommendation sub-model;

Step 5: on the basis of the two groups of weighting coefficients, calculate the final total weighting coefficients C_a for active user u_a, fuse the pre-scores that the recommendation sub-models give to the unvisited addresses according to the total weighting coefficients C_a to generate the final prediction scores of the integrated model for the unvisited addresses, sort all unvisited addresses by final prediction score, and provide the active user with a recommendation list consisting of the top-ranked addresses;

Step 6: compare the overall performance of the ensemble-learning-based personalized position recommendation algorithm with that of each sub-algorithm before integration, and evaluate the applicability and effectiveness of the proposed technique.

2. The personalized position recommendation method based on ensemble learning according to claim 1, wherein step 1 of the method comprises:

Step 11: select the user check-in data set C of the target recommendation system, the data set consisting of historical check-in records made by U users at L addresses, and extract the user ID, address ID, access time, address longitude, and address latitude from each check-in record;

Step 12: convert each check-in record into a triplet (u_i, l_j, n_ij), where u_i is the i-th user (1 ≤ i ≤ U), l_j is the j-th address (1 ≤ j ≤ L), and n_ij is the number of times user u_i visited address l_j;

Step 13: calculate the total number of visits by all users to location l_j, NC_au_j;

Step 14: calculate the total number of locations visited by user u_i, NLC_i;

Step 15: calculate the total number of visits by user u_i to all locations, NC_al_i;

Step 16: calculate the number of all users who have visited location l_j, NUC_j;

Step 17: convert the number of check-ins n_ij of user u_i at address l_j into the score r_ij of user u_i for address l_j, the specific method being as follows:

where r_ij denotes the score of user u_i for address l_j, n_ij denotes the number of check-ins of user u_i at address l_j, NC_au_j denotes the total number of visits by all users to location l_j, L denotes the total number of addresses, NLC_i denotes the total number of locations visited by user u_i, NC_al_i denotes the total number of visits by user u_i to all locations, U denotes the total number of users, and NUC_j denotes the number of all users who have visited location l_j;

Step 18: normalize the user scores, the specific calculation method being as follows:

where r_ij denotes the score of user u_i for address l_j, min denotes the lowest score in the user-location scoring matrix R, and max denotes the highest score;

collect all scores to form the user-location scoring matrix R = {r_ij}, i ∈ [1, U], j ∈ [1, L].

3. The ensemble learning-based personalized position recommendation method according to claim 1, wherein step 2 of the method comprises:

Step 21: obtain the information of an active user u_a currently served by the recommendation system;

Step 22: according to the application scenario and data characteristics, select a set of recommendation algorithms A₁, A₂, …, A_n as the sub-algorithms of the integrated model;

Step 23: according to the operating mechanism of each recommendation sub-algorithm, train a model for each algorithm to obtain the recommendation sub-models M₁, M₂, …, M_n;

Step 24: set a uniform address division ratio p for all active users, and divide the addresses visited by active user u_a into sub-data set Sub1_a and sub-data set Sub2_a according to this ratio;

Step 25: using the recommendation sub-models M₁, M₂, …, M_n and the information in sub-data set Sub1_a of active user u_a, calculate a pre-score for every address l_k (l_k ∈ NewL_a ∪ Sub2_a) in the unvisited address set NewL_a and sub-data set Sub2_a.

4. The method for personalized position recommendation based on ensemble learning according to claim 1, wherein the step 3 comprises:

step 31: collecting each recommendation sub-algorithm A in the second step_x(x is more than or equal to 1 and less than or equal to n) is an active user u_aCalculated pre-scoring information, Sub2_aAll addresses in the set are sorted according to pre-scores, and the address of M before the rank is taken to generate a set TopM_ax；

Step 32: collecting active users u_aFor Sub2_aTrue score of all addresses in Sub2_aAddresses with a median true score greater than goodling are put into the set Prefer_a；

Step 33: calculating each recommendation sub-algorithm A_xThe accuracy Precision of Precision is calculated by the following specific method:

wherein u is_aRepresenting active users currently enjoying the recommended service, A_x(x is more than or equal to 1 and less than or equal to n) represents a certain recommendation sub-algorithm (n is the number of the recommendation sub-algorithms), and a training list TopM_axRepresenting the evaluation of the Sub data set Sub2_aThe set of the top M addresses with the highest pre-score, the preference sub data set Prefer_aRepresenting the evaluation of the Sub data set Sub2_aThe middle real score is larger than the address set of goodling, M represents the training list TopM_axThe number of addresses in the set;

step 34: calculating each recommendation sub-algorithm A_xThe Recall rate Recall comprises the following specific calculation method:

wherein u is_aRepresenting active users currently enjoying the recommended service, A_x(x is more than or equal to 1 and less than or equal to n) represents a certain recommendation sub-algorithm (n is the number of the recommendation sub-algorithms), and a training list TopM_axRepresenting the evaluation of the Sub data set Sub2_aSet of top M addresses with highest medium pre-scorePreference sub data set Prefer_aRepresenting the evaluation of the Sub data set Sub2_aThe middle real score is larger than the address set of goodling, M represents the training list TopM_axThe number of addresses in the set;

Step 35: calculate the comprehensive accuracy index F1 of each recommendation sub-algorithm A_x; the specific calculation method is as follows:

F1(u_a, A_x, M) = 2 × Precision(u_a, A_x, M) × Recall(u_a, A_x, M) / (Precision(u_a, A_x, M) + Recall(u_a, A_x, M))

wherein u_a represents the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) represents a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), M represents the number of addresses in the training list TopM_ax, Precision(u_a, A_x, M) represents the precision of each recommendation sub-algorithm A_x (1 ≤ x ≤ n), and Recall(u_a, A_x, M) represents the recall of each recommendation sub-algorithm A_x (1 ≤ x ≤ n);

Step 36: calculate the set of precision weight values of the recommendation sub-algorithms A_x used when recommending for the active user u_a; the specific calculation method is as follows:
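Steps 31 to 36 can be illustrated with a minimal Python sketch. The function names and the dictionary layout (address mapped to score) are hypothetical. The weight rule in `precision_weights`, which divides each F1 by the sum of all F1 values, is an assumption, since the formula for the precision weight value set is not reproduced in this text.

```python
def top_m(pre_scores, m):
    """Return TopM_ax: the M addresses in Sub2_a with the highest pre-scores."""
    ranked = sorted(pre_scores, key=pre_scores.get, reverse=True)
    return set(ranked[:m])


def precision_recall_f1(pre_scores, true_scores, m, good_rating):
    """Precision, Recall and F1 of one sub-algorithm for one active user.

    pre_scores  : address -> pre-score from sub-algorithm A_x (over Sub2_a)
    true_scores : address -> true score given by the user (over Sub2_a)
    """
    top = top_m(pre_scores, m)
    # Prefer_a: addresses whose true score is strictly greater than the threshold
    prefer = {addr for addr, s in true_scores.items() if s > good_rating}
    hits = len(top & prefer)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(prefer) if prefer else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def precision_weights(f1_values):
    """ASSUMED rule: each weight is the sub-algorithm's share of the total F1."""
    total = sum(f1_values)
    n = len(f1_values)
    if total == 0:
        return [1.0 / n] * n  # fall back to equal weights
    return [f / total for f in f1_values]
```

With this rule the weights sum to 1, so a sub-algorithm that predicts the user's held-out preferences better contributes proportionally more to the fused score.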

5. The ensemble learning-based personalized position recommendation method according to claim 1, wherein step 4 of the method comprises:

Step 41: calculate the information entropy of Sub2_a; the specific calculation method is as follows:

wherein u_a represents the active user currently receiving the recommendation service, Sub2_a represents the evaluation sub-data set obtained by dividing the addresses visited by the active user u_a according to a certain proportion, and the preference sub-data set Prefer_a represents the set of addresses in the evaluation sub-data set Sub2_a whose true score is greater than goodRating;

Step 42: calculate the conditional entropy obtained when the addresses in Sub2_a are classified as recommended or not recommended according to the recommendation result of sub-algorithm A_x (1 ≤ x ≤ n); the specific calculation method is as follows:

wherein u_a represents the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) represents a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), Sub2_a represents the evaluation sub-data set obtained by dividing the addresses visited by the active user u_a according to a certain proportion, the training list TopM_ax represents the set of the M addresses with the highest pre-scores in the evaluation sub-data set Sub2_a, M represents the number of addresses in the training list TopM_ax, TP_ax is the number of addresses in the recommendation list that the user really likes, and FN_ax is the number of addresses that the user really likes but that are not in the recommendation list (not recommended);

Step 43: calculate the information gain obtained after the addresses in Sub2_a are classified according to the recommendation result of sub-algorithm A_x; the specific calculation method is as follows:

IG(u_a, A_x, M) = D(u_a) - T(u_a, A_x, M) (9)

wherein u_a represents the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) represents a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), M represents the number of addresses in the training list TopM_ax, D(u_a) represents the information entropy of the evaluation sub-data set Sub2_a, and T(u_a, A_x, M) represents the conditional entropy when the addresses in the evaluation sub-data set Sub2_a are classified (into recommended and not recommended) according to the recommendation result of sub-algorithm A_x;

Step 44: calculate the set of stability weight values of the recommendation sub-algorithms A_x used when recommending for the active user u_a; the specific calculation method is as follows:

wherein G_ax represents the stability weight value of the pre-scores of recommendation sub-algorithm A_x, u_a represents the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) represents a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), M represents the number of addresses in the training list TopM_ax, and IG(u_a, A_x, M) represents the information gain of each recommendation sub-algorithm A_x (1 ≤ x ≤ n).

6. The ensemble learning-based personalized location recommendation method according to claim 1, wherein step 5 of the method comprises:

Step 51: calculate the final total weighting coefficient of each recommendation sub-algorithm A_x (1 ≤ x ≤ n) used when recommending for the active user u_a; the specific calculation method is as follows:

wherein C_ax is the final weighting coefficient of recommendation sub-algorithm A_x when recommending for the active user u_a, W_ax represents the precision weight value of the pre-scores of recommendation sub-algorithm A_x when recommending for the active user u_a, and G_ax represents the stability weight value of the pre-scores of recommendation sub-algorithm A_x;

Step 52: for each location l_k (l_k ∈ NewL_a) that the active user u_a has not visited, the integrated algorithm calculates a final prediction score; the specific calculation method is as follows:

wherein the pre-score in the formula is the score that recommendation sub-algorithm A_x calculated for the location l_k that the active user u_a has not visited;

Step 53: sort all addresses in the set NewL_a by the final prediction score of the integrated algorithm; the top N locations form the recommendation list TopNList_a, which is returned to the active user.
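Steps 52 and 53 can be sketched as a weighted sum followed by a sort. The sketch takes the total weighting coefficients C_ax as given, because the formula of step 51 that combines W_ax and G_ax is not reproduced in this text. It also assumes that every sub-algorithm supplies a pre-score for every location in NewL_a. The function names are hypothetical.

```python
def fuse_scores(coeffs, pre_scores_by_algo, new_locations):
    """Final prediction score of each unvisited location l_k in NewL_a.

    coeffs             : list of C_ax, one per sub-algorithm A_x
    pre_scores_by_algo : list of dicts (location -> pre-score), same order
    new_locations      : iterable of locations in NewL_a
    """
    final = {}
    for loc in new_locations:
        # Weighted sum of the sub-algorithms' pre-scores (assumed fusion form)
        final[loc] = sum(
            c * scores[loc] for c, scores in zip(coeffs, pre_scores_by_algo)
        )
    return final


def top_n_list(final_scores, n):
    """TopNList_a: the N locations with the highest final prediction score."""
    return sorted(final_scores, key=final_scores.get, reverse=True)[:n]
```

For example, with coefficients 0.75 and 0.25, a location pre-scored 4.0 by the first sub-algorithm and 2.0 by the second receives a final score of 3.5.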

7. The method for personalized position recommendation based on ensemble learning according to claim 1, wherein the step 6 comprises:

Step 61: randomly select U × 10% of the users from the target data set as the active user set AU, and run each recommendation algorithm for each active user in the set to generate a recommendation list;

Step 62: evaluate the robustness of each recommendation algorithm using the precision indexes and the stability index, wherein, for one run of an algorithm over the active user set AU, the values of the precision indexes Precision, Recall and F1 and of the stability index IG are the averages of those indexes over all users in AU;

Step 63: repeat steps 61 and 62 Ntimes times, that is, run all algorithms independently Ntimes times;

Step 64: take the Precision, Recall, F1 and IG values of the integrated algorithm and of each recommendation sub-algorithm to be the averages over the Ntimes runs;

Step 65: compare and analyze the results of all indexes: if the F1 value of the integrated algorithm is greater than the F1 values of all the recommendation sub-algorithms, the recommendation precision of the integrated algorithm is higher than that of every sub-algorithm; if the IG index of the integrated algorithm is greater than the maximum of the IG indexes of the recommendation sub-algorithms, the integrated algorithm is more stable than every sub-algorithm; if both conclusions hold, the integrated algorithm is more robust.
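The evaluation protocol of steps 61 to 65 can be sketched as below. The callable interface (each algorithm maps a user to a tuple of Precision, Recall, F1 and IG), the function names, the fixed random seed and the rounding of U × 10% to an integer are all assumptions made for illustration.

```python
import random
from statistics import mean


def evaluate(users, algorithms, n_times, seed=0):
    """Average (Precision, Recall, F1, IG) per algorithm over Ntimes runs.

    users      : list of user identifiers in the target data set
    algorithms : dict name -> function(user) -> (precision, recall, f1, ig)
    """
    rng = random.Random(seed)
    sample_size = max(1, len(users) // 10)  # U x 10%, at least one user
    runs = {name: [] for name in algorithms}
    for _ in range(n_times):
        active_users = rng.sample(users, sample_size)  # step 61: draw AU
        for name, metrics_for in algorithms.items():
            per_user = [metrics_for(u) for u in active_users]
            # step 62: average each index over all users in AU
            runs[name].append(tuple(mean(col) for col in zip(*per_user)))
    # step 64: average each index over the Ntimes runs
    return {name: tuple(mean(col) for col in zip(*r)) for name, r in runs.items()}


def is_more_robust(results, ensemble, sub_names):
    """Step 65: ensemble beats every sub-algorithm on both F1 and IG."""
    f1_e, ig_e = results[ensemble][2], results[ensemble][3]
    higher_precision = all(f1_e > results[s][2] for s in sub_names)
    more_stable = ig_e > max(results[s][3] for s in sub_names)
    return higher_precision and more_stable
```

Because each run draws a fresh active user set, averaging over the Ntimes runs reduces the influence of any single random sample on the comparison.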

8. The personalized position recommendation method based on ensemble learning of claim 1, wherein the method: divides the addresses visited by an active user into a training sub-data set and an evaluation sub-data set according to a certain proportion; selects a plurality of recommendation sub-algorithms of any type; uses the historical scoring information in the training sub-data set of the active user so that each sub-algorithm calculates pre-scores of the other addresses for the active user; compares the historical scores with the pre-scores of the addresses in the evaluation sub-data set to evaluate the accuracy and stability of each sub-algorithm; generates personalized weighting coefficients for the active user according to the evaluation results; uses the weighting coefficients to combine the pre-scores that the sub-algorithms give to the unvisited addresses, so that the integrated model generates a final prediction score for each address the active user has not visited; and sorts all unvisited addresses by prediction score and recommends the top-ranked addresses to the active user.