CN111475744B

CN111475744B - Personalized position recommendation method based on ensemble learning

Info

Publication number: CN111475744B
Application number: CN202010257793.0A
Authority: CN
Inventors: 朱俊; 韩立新; 勾智楠; 杨忆; 袁晓峰; 李树; 李景仙
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2020-04-03
Filing date: 2020-04-03
Publication date: 2022-06-14
Anticipated expiration: 2040-04-03
Also published as: CN111475744A

Abstract

The invention discloses a personalized position recommendation method based on ensemble learning, which comprises the following steps. First, a check-in data set is converted into a scoring matrix. Second, a plurality of recommendation sub-algorithms are selected, and the addresses accessed by the active user are divided into a training sub-data set and an evaluation sub-data set. Using the training sub-data set, each sub-algorithm calculates pre-scores for the addresses in the evaluation sub-data set and for the addresses not yet visited. Third, the recommendation precision F1 of each sub-model is calculated on the evaluation sub-data set to generate a set of precision weight values. Fourth, the information gain IG is selected as a stability index, the stability of each sub-model is evaluated, and a set of stability weight values is calculated. Fifth, a final total weighting coefficient is calculated for the active user, and the integration model uses it to fuse the sub-algorithms' pre-scores for the unvisited addresses into a final prediction score. Sixth, the overall performance of the method is compared with that of each sub-algorithm before integration to evaluate the effectiveness of the method.

Description

Personalized position recommendation method based on ensemble learning

Technical Field

The invention relates to an ensemble learning-based personalized position recommendation method for social networks, and belongs to the technical field of artificial intelligence and machine learning.

Background

Location-based Social Networks (LBSNs) are the product of the gradual merging and development of Online Social Networks and Location-based Services, and provide a platform that closely connects the online virtual network with the offline real world. In recent years, with the widespread popularity of mobile devices and the rapid development of positioning technologies, a large number of location-based social networks have rapidly emerged. In LBSNs, users may establish complex social relationships with one another, such as friendships, coworker relations, and family relations. Users can also use the geographic information added to the social network to view points of interest (POIs), such as restaurants, shops, and movie theaters; check in with a mobile device when visiting a POI; publish the geographic location of the POI; and share suggestions and comments about it. LBSNs help merchants learn more about the real users behind the network, and thereby tailor personalized services that meet the needs of different users.

As the number of users registered in LBSNs increases, LBSNs store and accumulate abundant information, and because of this massive volume users cannot quickly and effectively find the information they need within a limited time. Recommendation systems, which address this "information overload" problem, are therefore receiving increasing attention from researchers. For example, Amazon uses a recommendation system to recommend products to users, raising click-through rates and turnover for merchants, and the movie service Netflix attracted many research teams to work on improving recommendation accuracy by holding a recommendation system competition. As a special information filtering system, a recommendation system does not require the user to actively provide explicit keywords; instead, it models the user's interests by analyzing the user's historical behavior, mines the user's latent preferences, and then proactively recommends products, services, and the like that meet the user's needs. Using the large amount of user, friend, and location information in LBSNs, researchers have built applications such as friend recommendation, expert discovery, location recommendation, activity recommendation, and route recommendation. Among them, location recommendation is currently a research hotspot in this field.

The recommendation algorithm is the main technical component of a recommendation system, and its efficiency largely determines the operating efficiency of the system and the accuracy of the recommendation results. Depending on the design strategy, recommendation algorithms mainly include collaborative filtering algorithms, content-based recommendation algorithms, and hybrid recommendation algorithms. Collaborative filtering algorithms in turn include memory-based algorithms (e.g., user-based collaborative filtering (UBCF) and item-based collaborative filtering (IBCF)) and model-based algorithms (e.g., Singular Value Decomposition (SVD), clustering models, and Probabilistic Latent Semantic Analysis (PLSA)). In content-based location recommendation, features such as tags, categories, and user comments are extracted from each location; the user's preferences are extracted from the user's profile and then matched against the location profiles to obtain accurate recommendations. The UBCF algorithm converts the check-in behavior of users into a user-location scoring matrix, uses the data set to find users similar to the current active user, predicts the active user's scores for places he has not checked in at from the preferences of those similar users, and recommends the locations with the highest predicted scores. The IBCF algorithm is based on the assumption that a user always prefers addresses that are highly similar to the items he previously liked. It therefore first calculates the similarity between locations and recommends to the active user the addresses most similar to the user's POIs (those with the highest predicted scores). The SVD algorithm is a classical form of matrix decomposition whose main task is to produce low-rank approximations.
The low-dimensional orthogonal matrices obtained by SVD reduce the noise in the original matrix and can more effectively reveal latent associations between users and items. SVD assumes that items share some common features and that a user likes an item because he rates its features highly; by decomposing the user's scores onto these features with linear algebra, the user's preference for an unvisited address can be predicted from his degree of preference for each feature.
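The UBCF procedure described above can be illustrated with a minimal sketch. It assumes cosine similarity between user rows, a fixed neighbourhood size `k`, and a similarity-weighted average as the prediction rule; these are common textbook choices, not details taken from this patent, and `ubcf_predict` is a hypothetical name.

```python
import numpy as np

def ubcf_predict(R, user, k=2):
    """User-based CF: predict `user`'s score for every location.

    R is a (users x locations) score matrix in which 0 means "not visited".
    The k most cosine-similar other users form the neighbourhood, and each
    prediction is the similarity-weighted average of their scores.
    """
    R = np.asarray(R, dtype=float)
    k = min(k, R.shape[0] - 1)                   # cannot have more neighbours than other users
    norms = np.linalg.norm(R, axis=1)
    norms[norms == 0] = 1.0                      # avoid division by zero for empty rows
    sims = (R @ R[user]) / (norms * norms[user])
    sims[user] = -np.inf                         # a user is not their own neighbour
    neighbours = np.argsort(sims)[::-1][:k]      # indices of the k most similar users
    weights = sims[neighbours]
    if weights.sum() <= 0:
        return np.zeros(R.shape[1])              # no informative neighbours
    return weights @ R[neighbours] / weights.sum()

R = np.array([[1.0, 0.5, 0.0, 0.0],
              [1.0, 0.5, 0.8, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
pred = ubcf_predict(R, user=0, k=1)
# user 1 is the only neighbour, so pred equals R[1]: [1.0, 0.5, 0.8, 0.0]
```

To produce a recommendation, the highest-scoring entries of `pred` among the locations the user has not visited would be returned.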

The above conventional recommendation techniques neither take into account the influence of the geographic characteristics of locations nor make use of the social relationships between users. However, each location in a location recommendation system has geographic features identified by its latitude and longitude, and the geographic features of POIs can strongly influence users' access preferences. In addition, a user's social relationships also affect his check-in behavior: when a user has not decided where to go, he often refers to the historical visit records of his friends in the social network. Therefore, when designing a location recommendation algorithm, these contextual factors must be taken into consideration by mining the geographic features of locations and exploiting the social relationships between users.

Social-based Collaborative Filtering (SCF) is a recommendation method that considers both the personal preferences and the social relationships of users. It is based on the assumption that friends share similar interests and easily influence each other, and that active users are more willing to make decisions based on their friends' experience. In SCF, only the preferences of the active user's friends need to be considered when calculating the predicted rating of an address. The similarity between an active user and a friend can be calculated from the historical scores of the places both have visited, from the geographic distance between their residences, or from the overlap of their friendship networks and check-in histories. In addition, some research is dedicated to mining the geographic features of locations: some techniques use matrix factorization, while more algorithms model geographic influence through a common probability distribution, such as a power-law distribution, a multi-center Gaussian distribution, or Kernel Density Estimation (KDE). When KDE is used to predict the probability that a user visits a new location, the influence of geography on each user's check-in activity is personalized, yielding a more expressive geography-aware recommendation system.
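As an illustration of the KDE approach mentioned above, the following minimal sketch scores candidate locations by a 2-D Gaussian kernel density fitted to one user's check-in coordinates. It assumes a fixed, isotropic bandwidth and treats latitude/longitude as planar coordinates; published KDE recommenders often use adaptive bandwidths and geodesic distances instead. The function name and the sample coordinates are hypothetical.

```python
import numpy as np

def kde_visit_density(visited, candidates, bandwidth):
    """Estimate how likely a user is to visit each candidate location.

    visited:    (m, 2) array of the user's check-in coordinates (lat, lon)
    candidates: (c, 2) array of candidate location coordinates
    bandwidth:  kernel width h, in the same units as the coordinates
    Coordinates are treated as planar (assumption; no great-circle distance).
    """
    visited = np.asarray(visited, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    diff = candidates[:, None, :] - visited[None, :, :]      # (c, m, 2) offsets
    sq_dist = (diff ** 2).sum(axis=2) / bandwidth ** 2       # (c, m) scaled squared distances
    kernel = np.exp(-0.5 * sq_dist) / (2 * np.pi * bandwidth ** 2)
    return kernel.mean(axis=1)                               # average over the user's check-ins

visited = [[32.20, 118.71], [32.21, 118.72]]
candidates = [[32.205, 118.715], [31.23, 121.47]]
density = kde_visit_density(visited, candidates, bandwidth=0.01)
# the nearby candidate gets a much higher density than the distant one
```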

However, most current location recommendation technologies are single-algorithm models, each built on certain theoretical assumptions, so every algorithm has inherent defects and performs well only in specific application scenarios. For example, content-based location recommendation is suitable for dealing with the cold-start problem, but it requires a large amount of structured information about users and locations, which increases the storage and computation costs of the system; the UBCF and IBCF algorithms consider only the neighborhood effect in the rating data, so although they mine user preferences, they ignore the characteristics of item content and limit the diversity of the recommendation results; and the SVD algorithm has high computational complexity and low running speed, and its recommendation accuracy still needs to be improved. To overcome the limitations of a single algorithm, some researchers have focused on how to combine several score prediction methods into a single overall model. Ensemble learning is an effective means of solving this problem. Ensemble learning is a new machine learning paradigm that can effectively improve the generalization ability of a learning system by using multiple weak learners to solve the same problem. Dietterich, an authority in the international machine learning community, has identified ensemble learning as the first of the four major research directions of machine learning (ensemble learning, symbolic learning, statistical learning, and reinforcement learning).
Ensemble learning can outperform a single learning algorithm in several respects: it has better average performance across different domains and data sets; it can find combined solutions that no single learning algorithm can obtain; it is less sensitive to sampling variation, noise, and outliers; and it can obtain solutions by combining multiple distributed data sources or multiple characteristics of a data source, a kind of fusion that is becoming increasingly important in distributed data mining. Owing to its effectiveness, ensemble learning has been widely applied in many fields such as biometric recognition, computer-aided medical diagnosis, text recognition, and Web information filtering.

At present, some recommendation systems apply ensemble learning to personalized recommendation problems and improve the effectiveness and adaptability of information recommendation. However, relevant research shows that existing ensemble-learning-based recommendation systems still have many defects and shortcomings, which can be summarized as follows:

(1) In the fusion process, the number of sub-algorithms considered is fixed, their types are limited, and the integrated model is not very extensible. Most popular ensemble-learning-based location recommendation systems consider only the fusion of two algorithms, and the sub-algorithms are generally a particular collaborative filtering algorithm or a location visit probability estimate, which limits both the application scenarios and the achievable performance gains. Existing integration frameworks cannot support the fusion of an arbitrary number and arbitrary types of recommendation sub-algorithms.

(2) An integrated algorithm needs weighting coefficients to fuse the prediction results of the sub-algorithms into a final prediction score; these coefficients represent the importance of each sub-algorithm. The fusion rule of existing algorithms is usually addition, multiplication, or another simple linear combination, with the same weighting coefficients for all users. In the real world, however, users and items differ in their characteristics, so the optimal sub-algorithm differs between users; that is, the sub-algorithm that best mines and reflects a user's interests varies from person to person. It is therefore necessary to tailor a set of weighting coefficients for each user, so that through personalized weighting the integrated algorithm can "bias" toward different sub-algorithms for different users.

(3) To enhance the user experience, a good recommendation system should be robust. The robustness of a recommendation system has two indispensable factors: accuracy and stability. However, most current research focuses on only one of them. In fact, prediction accuracy determines whether the user likes the recommended locations, and system stability reflects whether the recommendation system can produce consistent recommendations across application scenarios. Ignoring either aspect can reduce user stickiness and the profits of the service provider.

(4) The few existing stability studies limit their application scenarios almost exclusively to malicious attacks, for example, an attacker trying to push a preset item to users. However, besides malicious attacks, the uncertainty caused by data source limitations (such as sparsity and cold start), different data preprocessing methods, and model training can also make recommendation results inconsistent and affect the stability of the system. Research on system stability in non-malicious scenarios is almost nonexistent.

The above shortcomings of existing ensemble-learning-based recommendation technology create significant problems in the design, development, deployment, and operation of e-commerce platforms, and in particular degrade the service quality of recommendation systems on network platforms with massive item information, thereby affecting the sales performance of e-commerce systems.

Disclosure of Invention

In view of the defects of the prior art, the invention provides an ensemble learning-based personalized position recommendation method, aiming to construct a position recommendation system with strong extensibility, high recommendation precision, and stable recommendation results, and systematically provides a technical workflow for an integrated recommendation algorithm. Taking systems theory as its theoretical basis and treating a robustness evaluation system as a necessary component of the recommendation system, the method considers not only the accuracy of the recommendation results but also the diversity of data usage and user behavior in non-malicious scenarios. It innovatively proposes using information gain as a system stability index, quantifying the uncertainty caused by data source limitations (such as sparsity and cold start), different data preprocessing methods, and model training, and thereby improving the stability of the recommendation system's output. In addition, the integrated model applies personalized weights to each sub-model, customizing for each user the integrated recommendation algorithm that best matches his interests and further enhancing the service quality of the recommendation system.

The technical solution adopted by the invention is as follows. The addresses accessed by the active user are divided into a training sub-data set and an evaluation sub-data set according to a certain ratio. A plurality of recommendation sub-algorithms of any type are selected, and each sub-algorithm uses the historical score information in the active user's training sub-data set to calculate pre-scores of the other addresses for the active user. By comparing the historical scores with the pre-scores of the addresses in the evaluation sub-data set, the accuracy and stability of each sub-algorithm are evaluated, and personalized weighting coefficients are generated for the active user from the evaluation results. The weighting coefficients are used to combine the sub-algorithms' pre-scores for the unvisited addresses into the integrated model's final prediction scores for those addresses; all unvisited addresses are sorted by prediction score, and the top-ranked addresses are recommended to the active user (as shown in Fig. 1).

The specific process of the method comprises the following steps:

Step 1: collect and organize the original user check-in data set C, and convert it into a user-location scoring matrix R.

Step 2: select an active user u_a in the location-based social network (LBSN) as the target of the recommendation service. Select recommendation sub-algorithms A₁, A₂, …, A_n of any type and any number. Divide the addresses visited by the active user into a training sub-data set and an evaluation sub-data set according to a certain ratio. Using the visit information in the active user's training sub-data set, calculate pre-scores for the addresses the active user has not visited and for the addresses in the evaluation sub-data set.

Step 3: select the F1 index as the recommendation accuracy evaluation index, and compare the true scores and pre-scores of the addresses in the evaluation sub-data set to evaluate the recommendation accuracy of each sub-model. From the F1 value of each recommendation sub-model, compute a set of precision weight values W_a for the active user u_a.

Step 4: select the information gain IG as the system stability evaluation index, compare the true scores and pre-scores of the addresses in the evaluation sub-data set, and evaluate the system stability of each sub-model in a non-malicious scenario. From the IG value of each recommendation sub-model, compute a set of stability weight values G_a for the active user u_a.

Step 5: considering the overall robustness of the integrated recommendation system and balancing recommendation precision against system stability, calculate the final total weighting coefficient C_a for the active user u_a from the two sets of weighting coefficients. Fuse each recommendation sub-model's pre-scores for the unvisited addresses according to C_a to generate the integrated model's final prediction scores for those addresses. Sort all unvisited addresses by final prediction score and provide the active user with a recommendation list consisting of the top-ranked addresses.
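The fusion and ranking in Step 5 can be sketched as follows. This text does not give the fusion formula or how C_a is derived from W_a and G_a, so the sketch takes C_a as an input and assumes a linear weighted sum of pre-scores with the coefficients normalised to sum to 1; `fuse_and_rank` is a hypothetical name.

```python
import numpy as np

def fuse_and_rank(pre_scores, c_a, candidates, top_n):
    """Fuse sub-model pre-scores with per-user weights and return a top-N list.

    pre_scores : (n, c) array; row x holds sub-model M_x's pre-scores for the
                 c unvisited candidate addresses of the active user
    c_a        : length-n total weighting coefficients for the active user
    candidates : the c address identifiers, in column order
    Assumption: fusion is a weighted sum with coefficients normalised to 1.
    """
    w = np.asarray(c_a, dtype=float)
    w = w / w.sum()
    final = w @ np.asarray(pre_scores, dtype=float)    # (c,) final prediction scores
    order = np.argsort(-final, kind="stable")[:top_n]  # highest score first
    return [candidates[i] for i in order], final

ranked, final = fuse_and_rank([[0.9, 0.1, 0.5],
                               [0.1, 0.9, 0.5]], c_a=[3, 1],
                              candidates=["a", "b", "c"], top_n=2)
# final is approximately [0.7, 0.3, 0.5], so ranked == ["a", "c"]
```

Because the weights are per user, two users with identical pre-scores can still receive different lists if their C_a differ.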

Step 6: evaluate the robustness of each recommendation system with the precision and stability indices, mainly comparing the overall performance of the ensemble learning-based personalized position recommendation algorithm with that of each sub-algorithm before integration, to assess the applicability and effectiveness of the proposed technique.

Advantageous effects:

1. The invention is highly extensible and supports the fusion of recommendation sub-algorithms of any type and any number. In practice, suitable recommendation sub-algorithms can be selected for different application scenarios and data characteristics to obtain higher recommendation quality than any single existing algorithm, improving user stickiness in the location-based social network and helping merchants push advertisements accurately to users, thereby attracting more potential consumers.

2. By analyzing the different behavior characteristics of each user, the invention customizes a set of weighting coefficients for each active user; through personalized weighting, the integrated algorithm "biases" toward the sub-algorithm that best mines each user's interests. This "tailored to each person" integration approach greatly improves user satisfaction with the social network platform, can also help solve other machine learning problems, and is of great practical significance.

3. In the fusion process, the F1 index, which combines precision and recall, is selected as the recommendation accuracy evaluation index, so that the integrated model outperforms each sub-model in recommendation accuracy, ensuring that the recommendation results match the user's preferences and achieving the goal of improving the prediction accuracy of the recommendation algorithm.

4. The method innovatively uses information gain as the stability evaluation index of the recommendation system. It fully considers the uncertainty caused by data source limitations (such as sparsity and cold start), different data preprocessing methods, and model training, can measure multiple factors that cause system instability, and ensures the stability of the recommendation system in non-malicious scenarios.

5. The method considers both prediction accuracy and system stability, robustly improving the service quality of the recommendation system. It has a degree of universality and portability: besides location recommendation systems, it is also applicable to personalized recommendation of other traditional items, and has broad prospects for industrial application.

6. The method aims to construct a location recommendation system with strong extensibility, high recommendation precision, and stable recommendation results. While considering the accuracy of the recommendation results, it also considers the diversity of data usage and user behavior, innovatively proposes information gain as a system stability index, quantifies the uncertainty caused by data source limitations (such as sparsity and cold start), different data preprocessing methods, and model training, and effectively improves the stability of the recommendation system's output.

Drawings

Fig. 1 is a flowchart of a personalized position recommendation method based on ensemble learning according to the present invention.

Fig. 2 is a flowchart of specific steps of the personalized position recommendation method based on ensemble learning according to the present invention.

FIG. 3 is a flow chart of the steps of the present invention for converting raw user check-in records to a user-location scoring matrix.

FIG. 4 is a frequency histogram of the recommendation accuracy index F1 on the evaluation sub-data set for each recommendation sub-algorithm after 100 runs (with a group of target users selected at random in each run) in an embodiment of the present invention.

Fig. 5 is a frequency histogram of the information gain IG on the evaluation sub-data set for each recommendation sub-algorithm after 100 runs (with a group of target users selected at random in each run) in an embodiment of the present invention.

FIG. 6 is a box plot of the precision of the integrated model after 100 runs in an embodiment of the present invention.

FIG. 7 is a box plot of the recall of the integrated model after 100 runs in an embodiment of the present invention.

FIG. 8 is a box plot of the recommendation accuracy index F1 of the integrated model after 100 runs in an embodiment of the present invention.

FIG. 9 is a bar chart comparing the recommendation accuracy index F1 of each sub-model with that of the integrated model in an embodiment of the present invention.

FIG. 10 is a bar chart comparing the information gain IG of the integrated model with that of each sub-model in an embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the attached drawing figures and specific examples.

The specific flow of the design and implementation of the invention is shown in Fig. 2, and the main variables and parameters used are listed in Table 1.

TABLE 1 Functions of the principal variables and parameters

Step 1: collect and organize the original user check-in data set C and convert it into a user-location scoring matrix R. The specific flow is shown in FIG. 3, and the operation steps are as follows:

(1.a) Select the user check-in data set C of the target recommendation system. The data set consists of the historical check-in records of U users at L addresses; information such as the user ID, address ID, visit time, address longitude, and address latitude is extracted from each check-in record.

(1.b) Convert the check-in records into triples (u_i, l_j, n_ij), where u_i is the i-th user (1 ≤ i ≤ U), l_j is the j-th location (1 ≤ j ≤ L), and n_ij is the number of times user u_i visited address l_j.
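Step (1.b) amounts to counting check-ins per (user, location) pair. A minimal sketch, assuming each raw record is a tuple that begins with the user ID and the location ID (the record layout and the function name `to_triples` are hypothetical):

```python
from collections import Counter

def to_triples(checkins):
    """Aggregate raw check-in records into (u_i, l_j, n_ij) triples.

    Each record is assumed to start with (user_id, location_id, ...); the
    remaining fields (visit time, longitude, latitude) are ignored here.
    """
    counts = Counter((rec[0], rec[1]) for rec in checkins)
    return sorted((u, l, n) for (u, l), n in counts.items())

records = [("u1", "l1", "t1", 118.7, 32.2),
           ("u1", "l1", "t2", 118.7, 32.2),
           ("u1", "l2", "t3", 118.8, 32.1),
           ("u2", "l1", "t4", 118.7, 32.2)]
print(to_triples(records))  # [('u1', 'l1', 2), ('u1', 'l2', 1), ('u2', 'l1', 1)]
```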

(1.c) Calculate NC_au_j, the total number of visits to location l_j by all users.

(1.d) Calculate NLC_i, the total number of locations visited by user u_i.

(1.e) Calculate NC_al_i, the total number of visits by user u_i to all locations.

(1.f) Calculate NUC_j, the number of users who have visited location l_j.
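The four statistics of steps (1.c) to (1.f) can all be accumulated in one pass over the triples. A minimal sketch, assuming the triples are unique per (user, location) pair so that a count of triples equals a count of distinct users or locations; the function name is hypothetical, and the sketch stops short of the (1.g) conversion formula itself.

```python
from collections import defaultdict

def checkin_statistics(triples):
    """Compute NC_au_j, NLC_i, NC_al_i and NUC_j from (u_i, l_j, n_ij) triples."""
    nc_au = defaultdict(int)   # NC_au_j: visits to location j by all users
    nlc = defaultdict(int)     # NLC_i: number of distinct locations user i visited
    nc_al = defaultdict(int)   # NC_al_i: total visits of user i to all locations
    nuc = defaultdict(int)     # NUC_j: number of distinct users who visited location j
    for u, l, n in triples:
        nc_au[l] += n
        nlc[u] += 1            # one triple per (u, l) pair, so +1 counts distinct locations
        nc_al[u] += n
        nuc[l] += 1            # likewise counts distinct users
    return dict(nc_au), dict(nlc), dict(nc_al), dict(nuc)

triples = [("u1", "l1", 2), ("u1", "l2", 1), ("u2", "l1", 1)]
nc_au, nlc, nc_al, nuc = checkin_statistics(triples)
# nc_au == {'l1': 3, 'l2': 1}, nlc == {'u1': 2, 'u2': 1}
# nc_al == {'u1': 3, 'u2': 1}, nuc == {'l1': 2, 'l2': 1}
```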

(1.g) Convert the number of check-ins n_ij of user u_i at address l_j into the score r_ij of user u_i for address l_j, as follows:

where r_ij is the score of user u_i for address l_j, n_ij is the number of check-ins of user u_i at address l_j, NC_au_j is the total number of visits to location l_j by all users, L is the total number of addresses, NLC_i is the total number of locations visited by user u_i, NC_al_i is the total number of visits by user u_i to all locations, U is the total number of users, and NUC_j is the number of users who have visited location l_j.

(1.h) Normalize the user scores as follows:

$$ r_{ij} \leftarrow \frac{r_{ij} - \min}{\max - \min} $$

where r_ij is the score of user u_i for address l_j, min is the lowest of all scores in the user-location scoring matrix R, and max is the highest. After normalization, the score r_ij of user u_i for address l_j is mapped to the interval [0, 1]: a value of 1 indicates that the user visits the location frequently and likes it very much, a value of 0 indicates that the user has never visited the location, and in general a higher score indicates a stronger preference for the address.
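The min-max normalization of step (1.h) can be sketched as below. The sketch takes the raw scores r_ij produced by step (1.g) as its input, keyed by (user, location) pairs. Mapping every score to 0 when all scores are equal is an assumption for a degenerate case that the text does not address, and the function name is hypothetical.

```python
def min_max_normalise(scores):
    """Map raw scores {(u_i, l_j): r_ij} into [0, 1] via (r - min) / (max - min)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        return {k: 0.0 for k in scores}   # assumption: all scores equal -> 0
    return {k: (v - lo) / span for k, v in scores.items()}

raw = {("u1", "l1"): 2.0, ("u1", "l2"): 4.0, ("u2", "l1"): 6.0}
print(min_max_normalise(raw))
# {('u1', 'l1'): 0.0, ('u1', 'l2'): 0.5, ('u2', 'l1'): 1.0}
```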

Collect all scores to form the user-location scoring matrix R = {r_ij}, i ∈ [1, U], j ∈ [1, L], where i is the user index, j is the address index, U is the total number of users, L is the total number of addresses, and r_ij is the score of user u_i for address l_j.
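Arranging the normalised scores into the U × L matrix R can be sketched with NumPy, assuming that pairs with no check-in stay at 0 (consistent with "a value of 0 indicates that the user has never visited the location") and that `users` and `locations` fix the row and column order. The function name is hypothetical.

```python
import numpy as np

def build_score_matrix(scores, users, locations):
    """Arrange normalised scores {(u_i, l_j): r_ij} into the U x L matrix R.

    Pairs absent from `scores` (never visited) stay 0.
    """
    u_idx = {u: i for i, u in enumerate(users)}
    l_idx = {l: j for j, l in enumerate(locations)}
    R = np.zeros((len(users), len(locations)))
    for (u, l), r in scores.items():
        R[u_idx[u], l_idx[l]] = r
    return R

R = build_score_matrix({("u1", "l1"): 0.2, ("u1", "l2"): 0.5, ("u2", "l1"): 1.0},
                       users=["u1", "u2"], locations=["l1", "l2"])
# R == [[0.2, 0.5],
#       [1.0, 0.0]]
```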

Step 2: select an active user u_a in the LBSN as the target of the recommendation service. Select recommendation sub-algorithms A₁, A₂, …, A_n of any type and any number, where n is the number of recommendation sub-algorithms. Divide the addresses visited by the active user into a training sub-data set and an evaluation sub-data set according to a certain ratio. Using the visit information in the active user's training sub-data set, each recommendation sub-model calculates pre-scores for the addresses the active user has not visited and for the addresses in the evaluation sub-data set. The operation steps are as follows:

(2.a) Obtain the information of the active user u_a currently served by the recommendation system.

(2.b) According to the application scenario and data characteristics, select a set of recommendation algorithms A₁, A₂, …, A_n (n is the number of recommendation sub-algorithms) as the sub-algorithms of the integration model of the invention. For example, user-based collaborative filtering (UBCF), item-based collaborative filtering (IBCF), social collaborative filtering (SCF), kernel density estimation (KDE), singular value decomposition (SVD), or other existing ensemble algorithms may be selected.

(2.c) Train each algorithm according to its own operating mechanism to obtain the recommendation sub-models M₁, M₂, …, M_n (n is the number of recommendation sub-algorithms).

(2.d) Set a uniform address division ratio p for all active users, and divide the addresses visited by the active user u_a according to this ratio into a training sub-data set Sub1_a and an evaluation sub-data set Sub2_a.
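Step (2.d) can be sketched as a seeded random split. The text does not say whether p is the training or the evaluation fraction, or how addresses are assigned, so the sketch assumes p is the fraction placed in Sub1_a and that assignment is uniformly random; `split_visited` is a hypothetical name.

```python
import random

def split_visited(visited, p, seed=None):
    """Split an active user's visited addresses into (Sub1_a, Sub2_a).

    Assumption: a fraction p goes to the training sub-data set Sub1_a and the
    rest to the evaluation sub-data set Sub2_a, chosen uniformly at random.
    """
    items = sorted(visited)            # deterministic order before shuffling
    rng = random.Random(seed)
    rng.shuffle(items)
    cut = int(round(p * len(items)))
    return set(items[:cut]), set(items[cut:])

sub1, sub2 = split_visited({f"l{j}" for j in range(10)}, p=0.8, seed=42)
# len(sub1) == 8, len(sub2) == 2, and the two sets are disjoint
```

Using the same p for every active user, as (2.d) requires, keeps the evaluation conditions comparable across users.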

(2.e) Using the recommendation sub-models M₁, M₂, …, M_n (n is the number of recommendation sub-algorithms) and the active user u_a's training sub-data set Sub1_a, calculate a pre-score for every address l_k in the set of unvisited addresses NewL_a and in the evaluation sub-data set Sub2_a (l_k ∈ NewL_a ∪ Sub2_a).
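Step (2.e) can be sketched as a driver that hides the evaluation addresses from the active user's row and then asks every sub-model for scores over NewL_a ∪ Sub2_a. The sketch assumes each sub-model is a callable `f(R_train, a)` returning a score for every location; this interface, the function names, and the popularity sub-model are all hypothetical stand-ins for real sub-algorithms such as UBCF or KDE.

```python
import numpy as np

def compute_pre_scores(R, a, sub2, submodels):
    """Pre-score NewL_a ∪ Sub2_a for active user a with every sub-model.

    R         : (users x locations) score matrix, 0 = not visited
    a         : row index of the active user u_a
    sub2      : column indices of the evaluation sub-data set Sub2_a
    submodels : {name: f(R_train, a) -> score vector over all locations}
    """
    R_train = np.array(R, dtype=float)            # copy; the input is left untouched
    R_train[a, sorted(sub2)] = 0.0                # hide Sub2_a so only Sub1_a is visible
    new_l = set(np.flatnonzero(np.asarray(R)[a] == 0).tolist())  # NewL_a
    targets = sorted(new_l | set(sub2))
    result = {}
    for name, model in submodels.items():
        pred = np.asarray(model(R_train, a), dtype=float)
        result[name] = {k: float(pred[k]) for k in targets}
    return result

def popularity(R_train, a):
    """Hypothetical sub-model: mean score of each location over all users."""
    return R_train.mean(axis=0)

R = np.array([[0.5, 0.0, 0.8, 0.0],
              [0.4, 0.9, 0.0, 0.7]])
scores = compute_pre_scores(R, a=0, sub2={2}, submodels={"popularity": popularity})
# scores["popularity"] is approximately {1: 0.45, 2: 0.0, 3: 0.35}
```

Masking Sub2_a before calling the sub-models is what keeps the later accuracy and stability evaluation honest: no sub-model sees the true scores it is evaluated on.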

Step 3: select the F1 index as the recommendation accuracy evaluation index, and compare the true scores and pre-scores of the addresses in the evaluation sub-data set Sub2_a to evaluate the recommendation accuracy of each sub-model. From the F1 value of each recommendation sub-model, compute a set of precision weight values W_a for the active user u_a. The implementation steps are as follows:

(3.a) Denote each recommendation sub-algorithm from Step 2 by A_x (1 ≤ x ≤ n), where n is the number of recommendation sub-algorithms. Collect the pre-scores that each sub-algorithm computed for the active user u_a, sort all addresses in the evaluation sub-data set Sub2_a by pre-score, and take the top M addresses to form the training list TopM_ax.
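Building TopM_ax in step (3.a) is a sort by pre-score. A minimal sketch, where ties are broken by address identifier (an assumption; the text does not specify tie handling) and `top_m` is a hypothetical name:

```python
def top_m(pre_scores_sub2, m):
    """Return TopM_ax: the m addresses of Sub2_a with the highest pre-scores.

    pre_scores_sub2 maps each address in Sub2_a to sub-algorithm A_x's
    pre-score. Ties are broken by address id (assumption).
    """
    ranked = sorted(pre_scores_sub2, key=lambda l: (-pre_scores_sub2[l], l))
    return set(ranked[:m])

print(top_m({"l1": 0.9, "l2": 0.2, "l3": 0.7}, m=2))  # the set {'l1', 'l3'}
```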

(3.b) Collect the evaluation sub-dataset Sub2_a of active user u_a, and put the addresses in Sub2_a whose real score is greater than the threshold goodling into the preference sub-dataset Prefer_a.

(3.c) Calculate the precision (Precision) of each recommendation sub-algorithm A_x as follows:

where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of recommendation sub-algorithms); the training list TopM_ax is the set of the M addresses in the evaluation sub-dataset Sub2_a with the highest pre-scores; the preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose real score is greater than goodling; and M is the number of addresses in TopM_ax.

(3.d) Calculate the recall (Recall) of each recommendation sub-algorithm A_x as follows:

where the symbols u_a, A_x (1 ≤ x ≤ n), TopM_ax, Prefer_a and M have the same meanings as in (3.c).

(3.e) Calculate the comprehensive accuracy index F1 of each recommendation sub-algorithm A_x as follows:

where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), M is the number of addresses in the training list TopM_ax, Precision(u_a, A_x, M) is the precision of sub-algorithm A_x, and Recall(u_a, A_x, M) is its recall.
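The Python sketch below assumes the standard top-M definitions, which agree with the symbol descriptions in (3.c) to (3.e): Precision = |TopM_ax ∩ Prefer_a| / M, Recall = |TopM_ax ∩ Prefer_a| / |Prefer_a|, and F1 = 2 · Precision · Recall / (Precision + Recall). It is an assumption-based illustration; the function name is hypothetical.

```python
def precision_recall_f1(top_list, prefer):
    """Top-M precision, recall and F1 of one sub-algorithm for one user.

    top_list: TopM_ax (M = len(top_list)); prefer: Prefer_a.
    Assumes the standard definitions; returns 0.0 where a ratio is undefined.
    """
    hits = len(set(top_list) & set(prefer))        # |TopM_ax ∩ Prefer_a|
    m = len(top_list)
    precision = hits / m if m else 0.0
    recall = hits / len(prefer) if prefer else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, a 10-address list that contains 3 of the user's 6 preferred addresses gives Precision = 0.3, Recall = 0.5 and F1 = 0.375.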

(3.f) Calculate the precision weight value W_ax (1 ≤ x ≤ n, where n is the number of recommendation sub-algorithms) of each recommendation sub-algorithm A_x when recommending for active user u_a, as follows:

where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), M is the number of addresses in the training list TopM_ax, and F1(u_a, A_x, M) is the recommendation accuracy index F1 of sub-algorithm A_x.
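The following Python sketch assumes one plausible form of W_ax that the text does not confirm: the F1 values of the n sub-algorithms are normalized so that a user's weights sum to 1 (W_ax = F1_x / Σ F1). It falls back to uniform weights when every F1 value is 0. The helper name is hypothetical.

```python
def precision_weights(f1_by_algorithm):
    """Hypothetical W_ax: each sub-algorithm's F1 divided by the sum of all F1 values.

    f1_by_algorithm maps sub-algorithm name -> F1(u_a, A_x, M).
    Falls back to uniform weights when every F1 value is 0.
    """
    total = sum(f1_by_algorithm.values())
    if total == 0:
        return {alg: 1 / len(f1_by_algorithm) for alg in f1_by_algorithm}
    return {alg: f1 / total for alg, f1 in f1_by_algorithm.items()}


w = precision_weights({"UBCF": 0.2, "SVD": 0.2, "SCF": 0.4, "KDE": 0.2})
```

The same normalization could be applied to the IG values in (4.d), again only as an assumption.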

Fourth, select the information gain IG as the index of system stability. Compare the real scores and the pre-scores of the addresses in the evaluation sub-dataset to evaluate the system stability of each sub-model in a non-malicious-attack scenario, and compute the set of stability weight values G_a for active user u_a from the IG value of each recommendation sub-model. The implementation steps are as follows:

(4.a) Calculate the information entropy D(u_a) of the evaluation sub-dataset Sub2_a as follows:

where u_a is the active user currently receiving the recommendation service, Sub2_a is the evaluation sub-dataset obtained by dividing the addresses visited by active user u_a according to the division ratio, and the preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose real score is greater than goodling.

(4.b) Denote each recommendation sub-algorithm of the second step by A_x (1 ≤ x ≤ n), where n is the number of recommendation sub-algorithms. Calculate the conditional entropy T(u_a, A_x, M) of the evaluation sub-dataset Sub2_a when its addresses are classified by the recommendation result of sub-algorithm A_x (into recommended and not recommended), as follows:

where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), Sub2_a is the evaluation sub-dataset obtained by dividing the addresses visited by active user u_a according to the division ratio, the training list TopM_ax is the set of the M addresses in Sub2_a with the highest pre-scores, M is the number of addresses in TopM_ax, TP_ax is the number of addresses in the recommendation list that the user actually likes, and FN_ax is the number of addresses that the user actually likes but that are not in the recommendation list (not recommended).

(4.c) Calculate the information gain obtained when the addresses in the evaluation sub-dataset Sub2_a are classified by the recommendation result of sub-algorithm A_x, as follows:

IG(u_a, A_x, M) = D(u_a) - T(u_a, A_x, M)     (9)

where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), M is the number of addresses in the training list TopM_ax, D(u_a) is the information entropy of the evaluation sub-dataset Sub2_a, and T(u_a, A_x, M) is the conditional entropy of Sub2_a when its addresses are classified by the recommendation result of sub-algorithm A_x (into recommended and not recommended).
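Steps (4.a) to (4.c) can be sketched in Python as below. The sketch assumes the usual binary entropy in bits and the split implied by the symbol descriptions. For D(u_a), Sub2_a is divided into preferred (Prefer_a) and non-preferred addresses. For T(u_a, A_x, M), it is divided into recommended (TopM_ax) and non-recommended addresses, with TP_ax + FN_ax = |Prefer_a|. These formulas are reconstructions under those assumptions, and the function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-class split with class probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(n_sub2, m, tp, fn):
    """IG(u_a, A_x, M) = D(u_a) - T(u_a, A_x, M) for one user and sub-algorithm.

    n_sub2: |Sub2_a|; m: number of recommended addresses (|TopM_ax|);
    tp: TP_ax (liked and recommended); fn: FN_ax (liked but not recommended).
    """
    if n_sub2 <= 0 or not 0 <= m <= n_sub2:
        raise ValueError("need n_sub2 > 0 and 0 <= m <= n_sub2")
    d = binary_entropy((tp + fn) / n_sub2)      # D(u_a): preferred vs. not
    rest = n_sub2 - m                           # non-recommended addresses
    t = 0.0                                     # T(u_a, A_x, M)
    if m:
        t += m / n_sub2 * binary_entropy(tp / m)
    if rest:
        t += rest / n_sub2 * binary_entropy(fn / rest)
    return d - t
```

With 20 evaluation addresses and M = 10, a list that captures every preferred address (TP_ax = 10, FN_ax = 0) gives IG = 1 bit, while a list no better than chance (TP_ax = FN_ax = 5) gives IG = 0.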

(4.d) Calculate the stability weight value G_ax (1 ≤ x ≤ n, where n is the number of recommendation sub-algorithms) of each recommendation sub-algorithm A_x when recommending for active user u_a, as follows:

where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of recommendation sub-algorithms), M is the number of addresses in the training list TopM_ax, and IG(u_a, A_x, M) is the information gain of sub-algorithm A_x.

Fifth, taking the robustness of the integrated recommender system into account and balancing recommendation accuracy against system stability, calculate the final total weighting coefficient C_a for active user u_a from the two sets of weighting coefficients. Fuse the pre-scores that the recommendation sub-models assign to the unvisited addresses according to C_a to obtain the final prediction scores of the integration model for those addresses. Sort all unvisited addresses by final prediction score, and provide the active user with a recommendation list consisting of the top-ranked addresses. The implementation steps are as follows:

(5.a) Calculate the final weighting coefficient C_ax (1 ≤ x ≤ n, where n is the number of recommendation sub-algorithms) of each recommendation sub-algorithm A_x when recommending for active user u_a, as follows:

where C_ax is the final weight applied to the pre-scores of sub-algorithm A_x, W_ax is the precision weight value of A_x when recommending for active user u_a, and G_ax is the stability weight value of A_x.
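The text does not give the rule that combines W_ax and G_ax into C_ax. As a purely hypothetical sketch, the Python function below mixes the two weight sets with a balance parameter lam (0 ≤ lam ≤ 1). If W and G each sum to 1, the result also sums to 1 for each user. The parameter and function names are illustrative and are not taken from the invention.

```python
def total_weights(w, g, lam=0.5):
    """Hypothetical C_ax = lam * W_ax + (1 - lam) * G_ax for every sub-algorithm.

    w and g map sub-algorithm name -> weight; each is assumed to sum to 1.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return {alg: lam * w[alg] + (1 - lam) * g[alg] for alg in w}


c = total_weights({"UBCF": 0.6, "KDE": 0.4}, {"UBCF": 0.2, "KDE": 0.8})
```

With lam = 0.5 and these example weights, UBCF receives 0.4 and KDE receives 0.6.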

(5.b) For each location l_k not visited by active user u_a (l_k ∈ NewL_a), calculate the final prediction score of the integration algorithm as follows:

where C_ax is the final weighting coefficient of sub-algorithm A_x when recommending for active user u_a, and the accompanying pre-score term is the pre-score that sub-algorithm A_x assigns to the unvisited location l_k for active user u_a.

(5.c) Sort all addresses not visited by the active user by the final prediction score of the integration algorithm, form the recommendation list TopNList_a from the N top-ranked locations, and return it to the active user.
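Steps (5.b) and (5.c) can be sketched in Python as a weighted sum of the sub-algorithms' pre-scores followed by a top-N sort. The weighted-sum form is assumed from the description of (5.b). Two further choices are assumptions: a missing pre-score counts as 0, and ties are broken by address ID. The function name is hypothetical.

```python
def recommend_top_n(pre_scores, weights, unvisited, n):
    """Fuse pre-scores with the coefficients C_ax and return TopNList_a.

    pre_scores: sub-algorithm name -> {address: pre-score};
    weights: sub-algorithm name -> C_ax; unvisited: NewL_a.
    """
    final = {
        addr: sum(c * pre_scores[alg].get(addr, 0.0) for alg, c in weights.items())
        for addr in unvisited
    }
    ranked = sorted(final, key=lambda addr: (-final[addr], addr))
    return ranked[:n]


top = recommend_top_n(
    {"UBCF": {"x": 1.0, "y": 0.0}, "KDE": {"x": 0.0, "y": 1.0}},
    {"UBCF": 0.3, "KDE": 0.7},
    {"x", "y"},
    n=1,
)
print(top)  # ['y']
```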

Sixth, evaluate the robustness of each recommender system with the accuracy index and the stability index. The main purpose is to compare the overall performance of the proposed ensemble-learning-based personalized location recommendation algorithm with that of each sub-algorithm before integration, and thereby evaluate the applicability and effectiveness of the proposed technique. The implementation steps are as follows:

(6.a) Randomly select U × 10% users (U being the total number of users) from the target dataset as the active user set AU, and run each recommendation algorithm for every active user in the set to generate a recommendation list.

(6.b) Evaluate the robustness of each recommender system with the accuracy index and the stability index. For a single run of an algorithm over the active user set AU, the values of precision (Precision), recall (Recall), the recommendation accuracy index F1 and the information gain IG are the averages of these indexes over all users in AU.

(6.c) Repeat steps (6.a) and (6.b) Ntimes times, so that every algorithm is run Ntimes times independently.

(6.d) Take the values of Precision, Recall, F1 and IG for the integrated algorithm of the invention and for each sub-recommendation algorithm as the averages over the Ntimes runs.
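The evaluation protocol of (6.a) to (6.d) can be sketched in Python as repeated sampling and averaging. run_algorithm is a hypothetical callback that returns one user's index values (for example Precision, Recall, F1 and IG) for the algorithm under test. The 10% sampling fraction follows (6.a); rounding the sample size to the nearest integer is an assumption.

```python
import random
from statistics import mean


def evaluate(users, run_algorithm, n_times, fraction=0.1, seed=None):
    """Average each index over users within a run, then over n_times runs.

    users: list of user IDs; run_algorithm(user) -> {index name: value}.
    """
    rng = random.Random(seed)
    k = round(len(users) * fraction)          # size of the active user set AU
    if k == 0:
        raise ValueError("sample size rounds to zero")
    run_means = []
    for _ in range(n_times):                  # (6.c): independent runs
        au = rng.sample(users, k)             # (6.a): random active user set
        per_user = [run_algorithm(u) for u in au]
        # (6.b): average over all users in AU for this run
        run_means.append({name: mean(r[name] for r in per_user) for name in per_user[0]})
    # (6.d): average over the runs
    return {name: mean(r[name] for r in run_means) for name in run_means[0]}
```

With the 1233 users of the later embodiment, round(1233 × 0.1) = 123, which matches the 123 active users selected there.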

(6.e) Compare and analyze the results for each index. If the F1 value of the integrated algorithm is greater than the F1 values of all sub-recommendation algorithms, the recommendation accuracy of the integrated algorithm is higher than that of every sub-algorithm. If the IG value of the integrated algorithm is greater than the maximum IG value among the sub-recommendation algorithms, the integrated algorithm is more stable than all sub-algorithms. If both conclusions hold, the technique proposed by the invention is more robust.
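The decision rule of (6.e) is simple enough to state directly in Python. The dictionary layout of the averaged results is an assumption made for illustration, and the numbers in the usage lines are invented placeholders, not results of the invention.

```python
def is_more_robust(ensemble, sub_algorithms):
    """Apply the (6.e) rule to averaged index values.

    ensemble: {"F1": ..., "IG": ...}; sub_algorithms: name -> same layout.
    """
    more_accurate = all(ensemble["F1"] > s["F1"] for s in sub_algorithms.values())
    more_stable = ensemble["IG"] > max(s["IG"] for s in sub_algorithms.values())
    return more_accurate and more_stable


subs = {"UBCF": {"F1": 0.10, "IG": 0.02}, "KDE": {"F1": 0.08, "IG": 0.03}}
print(is_more_robust({"F1": 0.12, "IG": 0.04}, subs))  # True
```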

The following describes in detail how the ensemble-learning-based personalized location recommendation method of the invention works, taking a specific location-based social network as an example.

BrightKite is a location-based social networking service provider on which users share their locations by checking in. The network comprises 58228 users and 693362 locations, with 214078 social relationships among the users. The BrightKite dataset, which contains 4491143 check-ins collected from 2008 to October 2010, has become one of the test datasets most commonly used by recommender-system researchers. The invention uses the data for the Los Angeles area of the BrightKite dataset as its example.

First, collect and organize the original user check-in dataset C and convert it into the user-location scoring matrix R. The specific steps are as follows:

(1.a) Select the check-in data of users in the Los Angeles area of the example BrightKite dataset as the check-in dataset C. It consists of 61710 historical check-in records of 1233 users at 2951 addresses, with 4216 social relationships among the users. On average, each user has 50.05 check-ins and 6.84 friends, and each location is visited 20.91 times. Each check-in record contains the user ID, address ID, access time, address longitude, address latitude and similar information.
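The averages quoted in (1.a) follow from the dataset counts, as the short Python check below shows. It assumes that each of the 4216 social relationships is undirected and therefore counts as a friend for both users. Under that assumption, the friend average reproduces the stated 6.84.

```python
records, users, addresses, relationships = 61710, 1233, 2951, 4216

avg_checkins_per_user = records / users            # check-ins per user
avg_visits_per_location = records / addresses      # visits per location
avg_friends_per_user = 2 * relationships / users   # friends per user (undirected)

print(round(avg_checkins_per_user, 2), round(avg_visits_per_location, 2), round(avg_friends_per_user, 2))
# 50.05 20.91 6.84
```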

(1.b) Convert each check-in record into a triplet (u_i, l_j, n_ij), where u_i is the i-th user (1 ≤ i ≤ 1233), l_j is the j-th address (1 ≤ j ≤ 2951), and n_ij is the number of times user u_i visited address l_j.

(1.c) Calculate the total number of visits NC_au_j made by all users to location l_j.

(1.d) Calculate the total number of locations NLC_i visited by user u_i.

(1.e) Calculate the total number of visits NC_al_i made by user u_i to all locations.

(1.f) Calculate the number of users NUC_j who visited location l_j.

(1.g) Convert the number of check-ins n_ij into the score r_ij, where r_ij is the score of user u_i for address l_j, n_ij is the number of check-ins of user u_i at address l_j, NC_au_j is the total number of visits by all users to location l_j, NLC_i is the total number of locations visited by user u_i, NC_al_i is the total number of visits by user u_i to all locations, and NUC_j is the number of users who visited location l_j.
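Steps (1.b) to (1.f) reduce to counting, as in the Python sketch below. The input format, one (user ID, address ID) pair per check-in record, and the function names are assumptions. The sketch stops at the four aggregates because the conversion formula for r_ij does not appear in the text.

```python
from collections import Counter, defaultdict


def to_triplets(checkins):
    """(1.b): turn (user, address) check-in pairs into triplets (u_i, l_j, n_ij)."""
    return [(u, l, n) for (u, l), n in Counter(checkins).items()]


def checkin_statistics(triplets):
    """(1.c)-(1.f): aggregates used when converting n_ij into the score r_ij.

    Assumes each (user, address) pair occurs once, as produced by to_triplets.
    """
    nc_au = defaultdict(int)   # NC_au_j: visits by all users to location l_j
    nlc = defaultdict(int)     # NLC_i: number of locations visited by user u_i
    nc_al = defaultdict(int)   # NC_al_i: visits by user u_i to all locations
    nuc = defaultdict(int)     # NUC_j: number of users who visited location l_j
    for user, loc, n in triplets:
        nc_au[loc] += n
        nlc[user] += 1
        nc_al[user] += n
        nuc[loc] += 1
    return nc_au, nlc, nc_al, nuc
```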

(1.h) The lowest score in the user-location scoring matrix R is min = 0 and the highest is max = 12.61. Normalize the user scores obtained in the previous step:

where r_ij is the score of user u_i for address l_j.

After normalization, the score r_ij of user u_i for address l_j lies in the interval [0, 1]. A value of 1 indicates that the user visits the location frequently and likes it very much, and a value of 0 indicates that the user has never visited the location. In general, a higher score indicates a stronger preference for the address.
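The Python sketch below reads the normalization of (1.h) as min-max scaling. That reading is consistent with the stated min = 0, max = 12.61 and the resulting [0, 1] range, but it is an assumption because the formula itself is not shown. The function name is hypothetical.

```python
def min_max_normalize(scores, lo=0.0, hi=12.61):
    """Map raw scores r_ij to [0, 1] via (r - min) / (max - min).

    lo and hi default to the min and max reported for the Los Angeles example.
    """
    span = hi - lo
    if span <= 0:
        raise ValueError("max must exceed min")
    return {key: (r - lo) / span for key, r in scores.items()}


print(min_max_normalize({("u1", "a"): 12.61, ("u1", "b"): 0.0}))
# {('u1', 'a'): 1.0, ('u1', 'b'): 0.0}
```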

All the scores together form the user-location scoring matrix R = {r_ij}, i ∈ [1, 1233], j ∈ [1, 2951], where i is the user index and j is the address index.

Second, select an active user u_a in the LBSN as the recommendation service object. Select recommendation sub-algorithms A₁, A₂, …, A_n of any type and number, where n is the number of recommendation sub-algorithms. Divide the addresses visited by the active user into a training sub-dataset and an evaluation sub-dataset according to a given ratio. Then use the visit information in the active user's training sub-dataset to compute pre-scores for the addresses the active user has not visited and for the addresses in the evaluation sub-dataset. The operating steps are as follows:

(2.a) Obtain the personal information, social relationships and historical access records of an active user u_a in the Los Angeles area of the example BrightKite dataset.

(2.b) Select four recommendation algorithms as the sub-algorithms of the integration model of the invention: A₁, user-based collaborative filtering (UBCF); A₂, singular value decomposition (SVD); A₃, social collaborative filtering (SCF); and A₄, kernel density estimation (KDE). The reasons are as follows. UBCF is a typical memory-based collaborative filtering algorithm that can mine a user's personal preferences. However, it cannot make effective recommendations for new items or inactive users, which is the so-called cold-start problem. SVD is a typical matrix-factorization technique among model-based collaborative filtering algorithms. It can mitigate the cold-start problem of UBCF, but it has high computational complexity and runs slowly, and its recommendation accuracy leaves room for improvement. Because social relationships among users are a main characteristic of an LBSN, SCF is selected as a supplement to UBCF. On top of UBCF, it models the influence of social relationships on users' behavior patterns and can produce more accurate recommendations. Like UBCF, however, it still suffers from cold start and from recommendations of a single type. KDE addresses the geographic attributes of locations in an LBSN: it models the influence of geographic position on each user's check-in activity as a personalized probability distribution and so exploits the geographic characteristics of LBSN locations. In addition, unlike the first three sub-algorithms, KDE does not need the access information of other users and is therefore particularly suitable for sparse scoring matrices. Its main drawbacks are low recommendation accuracy and unstable performance.

Given their distinct strengths and weaknesses, the four selected sub-algorithms complement one another.

(2.c) Train each algorithm to obtain the recommendation sub-models M_UBCF, M_SVD, M_SCF and M_KDE.

(2.d) Set a uniform address division ratio p = 0.4 for all active users, and divide the addresses visited by active user u_a, according to this ratio, into the training sub-dataset Sub1_a and the evaluation sub-dataset Sub2_a.

(2.e) Using the recommendation sub-models M_UBCF, M_SVD, M_SCF, M_KDE and the training sub-dataset Sub1_a of active user u_a, let the UBCF, SVD, SCF and KDE algorithms each compute a pre-score for every address l_k (l_k ∈ NewL_a ∪ Sub2_a) in the set of unvisited addresses NewL_a and in the evaluation sub-dataset Sub2_a.

Third, select the F1 measure as the index of recommendation accuracy. Compare the real scores and the pre-scores of the addresses in the evaluation sub-dataset Sub2_a to evaluate the recommendation accuracy of each sub-model, and compute the set of precision weight values W_a for active user u_a from the F1 value of each recommendation sub-model. The implementation steps are as follows:

(3.a) Collect the pre-scores that each recommendation sub-algorithm A_x (1 ≤ x ≤ 4) computed for active user u_a. Sort all addresses l_k (l_k ∈ Sub2_a) in the evaluation sub-dataset Sub2_a by pre-score, and take the top M = 10 addresses to generate the training list Top10_ax for each algorithm A_x.

(3.b) Collect the evaluation sub-dataset Sub2_a of active user u_a, and put the addresses in Sub2_a whose real score is greater than goodling = 0.05 into the preference sub-dataset Prefer_a.

(3.c) Calculate the precision of each recommendation sub-algorithm A_x, where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm, the training list TopM_ax is the set of the 10 addresses in the evaluation sub-dataset Sub2_a with the highest pre-scores, the preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose real score is greater than 0.05, and M is the number of addresses in TopM_ax (M = 10).

(3.d) Calculate the recall of each recommendation sub-algorithm A_x, where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm, the training list TopM_ax is the set of the 10 addresses in the evaluation sub-dataset Sub2_a with the highest pre-scores, the preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose real score is greater than 0.05, and M is the number of addresses in TopM_ax (M = 10).

(3.e) Calculate the recommendation accuracy index F1 of each recommendation sub-algorithm A_x, where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm, M is the number of addresses in the training list TopM_ax (M = 10), Precision(u_a, A_x, M) is the precision of each recommendation sub-algorithm, and Recall(u_a, A_x, M) is its recall.

After the four sub-algorithms are run 100 times (each time on a randomly selected group of target users), the frequency histograms of the recommendation accuracy index F1 on the evaluation dataset are shown in FIG. 4.

(3.f) Calculate the precision weight value W_ax (1 ≤ x ≤ 4) of each recommendation sub-algorithm A_x when recommending for active user u_a, as follows:

where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm, M is the number of addresses in the training list TopM_ax (M = 10), and F1(u_a, A_x, M) is the recommendation accuracy index F1 of sub-algorithm A_x.

Fourth, select the information gain IG as the index of system stability. Compare the real scores and the pre-scores of the addresses in the evaluation sub-dataset to evaluate the system stability of UBCF, SVD, SCF and KDE in a non-malicious-attack scenario, and compute the set of stability weight values G_a for active user u_a from the IG value of each recommendation sub-model. The implementation steps are as follows:

(4.a) Calculate the information entropy D(u_a) of the evaluation sub-dataset Sub2_a, where u_a is the active user currently receiving the recommendation service, Sub2_a is the evaluation sub-dataset obtained by dividing the addresses visited by active user u_a according to the division ratio, and the preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose real score is greater than 0.05.

(4.b) Calculate the conditional entropy of the evaluation sub-dataset Sub2_a when its addresses are classified by the recommendation result of sub-algorithm A_x (1 ≤ x ≤ 4) into recommended and not recommended, as follows:

where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm, Sub2_a is the evaluation sub-dataset obtained by dividing the addresses visited by active user u_a according to the division ratio, the training list TopM_ax is the set of the 10 addresses in Sub2_a with the highest pre-scores, M is the number of addresses in TopM_ax (M = 10), TP_ax is the number of addresses in the recommendation list that the user actually likes, and FN_ax is the number of addresses that the user actually likes but that are not in the recommendation list (not recommended).

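(4.c) Calculate the information gain obtained when the addresses in Sub2_a are classified by the recommendation result of sub-algorithm A_x:

IG(u_a, A_x, M) = D(u_a) - T(u_a, A_x, M)     (21)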

where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm, M is the number of addresses in the training list TopM_ax (M = 10), D(u_a) is the information entropy of the evaluation sub-dataset Sub2_a, and T(u_a, A_x, M) is the conditional entropy of Sub2_a when its addresses are classified by the recommendation result of sub-algorithm A_x (into recommended and not recommended).

After the four sub-algorithms are run 100 times (each time on a randomly selected group of target users), the frequency histograms of the information gain IG on the evaluation sub-dataset Sub2_a are shown in FIG. 5.

(4.d) Calculate the stability weight value G_ax (1 ≤ x ≤ 4) of each recommendation sub-algorithm A_x when recommending for active user u_a, as follows:

where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm, M is the number of addresses in the training list TopM_ax (M = 10), and IG(u_a, A_x, M) is the information gain of sub-algorithm A_x.

(5.a) Calculate the final weighting coefficient C_ax (1 ≤ x ≤ 4) of each recommendation sub-algorithm A_x when recommending for active user u_a, as follows:

(5.c) Sort all addresses not visited by the active user by the final prediction score of the integration algorithm, form the recommendation list TopNList_a from the N top-ranked locations, and return it to the active user (N is usually a multiple of 5 with 5 ≤ N ≤ 50).

Sixth, evaluate the robustness of each recommender system with the accuracy index and the stability index. The main purpose is to compare the overall performance of the proposed ensemble-learning-based personalized location recommendation algorithm with that of the four sub-algorithms before integration, and thereby evaluate the applicability and effectiveness of the proposed technique. The implementation steps are as follows:

(6.a) Randomly select 123 users (about 10% of the 1233 users) from the target dataset as the active user set AU, and run the integrated recommendation algorithm and the four sub-algorithms for every active user in the set to generate recommendation lists.

(6.b) Evaluate the robustness of each recommender system with the accuracy index and the stability index. For a single run of an algorithm over the active user set AU, the values of Precision, Recall, the recommendation accuracy index F1 and the information gain IG are the averages of these indexes over all users in AU.

(6.c) Repeat steps (6.a) and (6.b) 100 times, so that every algorithm is run 100 times independently.

Box plots of the 100 values of Precision, Recall and F1 produced during the 100 runs of the proposed integrated model are shown in FIG. 6, FIG. 7 and FIG. 8, respectively.

(6.d) Take the values of Precision, Recall, F1 and IG for the integrated algorithm of the invention and for the four sub-recommendation algorithms as the averages of the 100 runs. The Precision, Recall, F1 and IG results of each recommendation algorithm for different values of N are shown in Tables 2, 3, 4 and 5, respectively:

TABLE 2 Precision values of different recommendation algorithms

TABLE 3 Recall values of different recommendation algorithms

TABLE 4 Recommendation accuracy F1 values of different recommendation algorithms

TABLE 5 Information gain IG values of different recommendation algorithms

In this case, the histogram comparing the recommendation accuracy index F1 of the integrated model with that of each sub-model is shown in FIG. 9, and the histogram comparing the information gain IG is shown in FIG. 10.

(6.e) Compare and analyze the results for each index. The Precision, Recall and F1 of the integrated algorithm are all greater than the corresponding values of every sub-recommendation algorithm, so the recommendation accuracy of the integrated algorithm is higher than that of all sub-algorithms. The IG of the integrated algorithm is greater than the maximum IG among the sub-recommendation algorithms, which shows that the integrated algorithm is more stable than all sub-algorithms. Together, these two conclusions demonstrate the robustness of the proposed technique.

Unlike conventional ensemble algorithms, the method aims to build a location recommender system that is highly extensible, highly accurate and stable in its results. It considers both the accuracy of the recommendation results and the diversity of data usage and user behavior. It also introduces information gain as an index of system stability, which quantifies the uncertainty caused by limitations of the data source (such as sparsity and cold start), by different data preprocessing methods and by model training, and thereby improves the stability of the recommender's output. In addition, a set of weighting coefficients is customized for each user, so that through personalized weighting the integration algorithm can favor different sub-algorithms for different users. The proposed technique helps improve the robustness and service quality of recommender systems and has broad application prospects in the market for location-based social networks.

The process described above is only a preferred embodiment of the invention and does not represent all details of the invention. Any modification, equivalent replacement or improvement made by those skilled in the art within the technical scope, spirit and principle of the present disclosure shall fall within its scope of protection.

Claims

1. A personalized position recommendation method based on ensemble learning, characterized by comprising the following steps:

step 1, collecting and organizing an original user check-in dataset C and converting it into a user-location scoring matrix R;

step 2, selecting an active user u_a in the location-based social network LBSN as the recommendation service object; selecting recommendation sub-algorithms A₁, A₂, …, A_n of any type and number; dividing the addresses visited by the active user into a training sub-dataset and an evaluation sub-dataset; and, using the visit information in the active user's training sub-dataset, having each recommendation sub-algorithm compute pre-scores for the addresses not visited by the active user and for the addresses in the evaluation sub-dataset;

step 3, selecting the F1 measure as the index of recommendation accuracy, comparing the real scores and pre-scores of the addresses in the evaluation sub-dataset to evaluate the recommendation accuracy of each recommendation sub-algorithm, and computing a set of precision weight values W_a for active user u_a from the F1 value of each recommendation sub-algorithm;

step 4, selecting the information gain IG as the index of system stability, comparing the real scores and pre-scores of the addresses in the evaluation sub-dataset to evaluate the system stability of each recommendation sub-algorithm in a non-malicious-attack scenario, and computing a set of stability weight values G_a for active user u_a from the IG value of each recommendation sub-algorithm;

step 5, calculating a final total weighting coefficient C_a for active user u_a on the basis of the precision weight value set W_a and the stability weight value set G_a; fusing the pre-scores that the recommendation sub-models assign to the unvisited addresses according to C_a to generate the final prediction scores of the integrated model for those addresses; sorting all unvisited addresses by final prediction score; and providing the active user with a recommendation list consisting of the top-ranked addresses;

step 6: compare the comprehensive performance of the proposed ensemble-learning-based personalized position recommendation algorithm with that of each sub-algorithm before integration, to evaluate the applicability and effectiveness of the method.

2. The method for personalized position recommendation based on ensemble learning according to claim 1, wherein step 1 of the method comprises:

step 11: select the user check-in dataset C of the target recommendation system, consisting of the historical check-in records of U users at L addresses, and extract the user ID, address ID, access time, address longitude, and address latitude from each check-in record;

step 12: convert each check-in record into a triplet (u_i, l_j, n_ij), where u_i is the i-th user, 1 ≤ i ≤ U; l_j is the j-th address, 1 ≤ j ≤ L; and n_ij is the number of times user u_i visited address l_j;

step 13: calculate the total number of visits NC_au_j made by all users to location l_j;

step 14: calculate the total number of distinct locations NLC_i visited by user u_i;

step 15: calculate the total number of visits NC_al_i made by user u_i to all locations;

step 16: calculate the number of distinct users NUC_j who have visited location l_j;
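Steps 12 to 16 are simple counting passes over the check-in records. The following is a minimal sketch, assuming each record has already been reduced to a (user ID, address ID) pair; the time, longitude, and latitude fields extracted in step 11 are not needed for these counts. The function name `checkin_statistics` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def checkin_statistics(checkins: Iterable[Tuple[str, str]]):
    """checkins: one (user_id, location_id) pair per check-in record."""
    n = Counter(checkins)   # step 12: n_ij for each (u_i, l_j)
    nc_au = Counter()       # step 13: visits to l_j by all users
    nlc = Counter()         # step 14: distinct locations visited by u_i
    nc_al = Counter()       # step 15: visits by u_i to all locations
    nuc = Counter()         # step 16: distinct users who visited l_j
    for (user, loc), count in n.items():
        nc_au[loc] += count
        nc_al[user] += count
        # Each (user, loc) key occurs once in n, so these two count distinct pairs.
        nlc[user] += 1
        nuc[loc] += 1
    return n, nc_au, nlc, nc_al, nuc


records = [("u1", "l1"), ("u1", "l1"), ("u1", "l2"), ("u2", "l1")]
n, nc_au, nlc, nc_al, nuc = checkin_statistics(records)
```

Aggregating the triplets first means each statistic is computed in a single pass over the distinct (user, location) pairs rather than over every raw record.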

step 17: convert the number of check-ins n_ij of user u_i at address l_j into the score r_ij of user u_i for address l_j, as follows:

where r_ij is the score of user u_i for address l_j; n_ij is the number of check-ins of user u_i at address l_j; NC_au_j is the total number of visits by all users to location l_j; L is the total number of addresses; NLC_i is the total number of locations visited by user u_i; NC_al_i is the total number of visits by user u_i to all locations; U is the total number of users; and NUC_j is the number of users who have visited location l_j;

step 18: normalize the user scores, as follows:

where r_ij is the score of user u_i for address l_j, min is the lowest of all scores in the user-location scoring matrix R, and max is the highest of all scores;

all scores together form the user-location scoring matrix R = {r_ij}, i ∈ [1, U], j ∈ [1, L].
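The normalization formula itself is not reproduced in this text. Since it is defined through the minimum and maximum of all scores, the following sketch assumes standard min-max scaling, r' = (r − min) / (max − min); that formula, the `None` marker for unvisited cells, and the name `build_score_matrix` are all assumptions.

```python
from typing import Dict, List, Optional, Tuple


def build_score_matrix(
    scores: Dict[Tuple[int, int], float], num_users: int, num_locs: int
) -> List[List[Optional[float]]]:
    """scores: raw r_ij keyed by 0-based (i, j).

    ASSUMPTION: min-max normalization r' = (r - min) / (max - min).
    Unvisited cells are None so they are not confused with a normalized 0.0.
    """
    lo = min(scores.values())
    hi = max(scores.values())
    span = hi - lo
    R: List[List[Optional[float]]] = [[None] * num_locs for _ in range(num_users)]
    for (i, j), r in scores.items():
        # If every score is identical, map them all to 1.0 (a design choice).
        R[i][j] = (r - lo) / span if span > 0 else 1.0
    return R


R = build_score_matrix({(0, 0): 2.0, (0, 1): 4.0, (1, 1): 3.0}, num_users=2, num_locs=2)
```

Keeping unvisited cells as `None` matters here because min-max scaling maps the lowest observed score to exactly 0.0.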

3. The method for personalized position recommendation based on ensemble learning according to claim 1, wherein step 2 of the method comprises:

step 21: obtain the information of the active user u_a currently served by the recommendation system;

step 22: according to the application scenario and data characteristics, select a group of recommendation algorithms A₁, A₂, …, A_n as the sub-algorithms of the integration model;

step 23: train each algorithm according to its own operating mechanism to obtain the trained recommendation sub-models M₁, M₂, …, M_n;

step 24: set a uniform address division ratio p for all active users, and divide the addresses visited by the active user u_a in that ratio into sub-dataset Sub1_a and sub-dataset Sub2_a;
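The split in step 24 can be sketched as a random partition. This assumes that a fraction p of the visited addresses (rounded down) goes to the training subset Sub1_a and the remainder to the evaluation subset Sub2_a, and that the split is random; the claim does not say which side p refers to or how addresses are chosen. `split_visited` and its `seed` parameter are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_visited(
    visited: Sequence[str], p: float, seed: Optional[int] = None
) -> Tuple[List[str], List[str]]:
    """ASSUMPTION: a fraction p (rounded down) goes to Sub1_a, the rest to Sub2_a."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    items = list(visited)
    random.Random(seed).shuffle(items)  # private RNG, so the split is reproducible
    cut = int(len(items) * p)
    return items[:cut], items[cut:]


visited_a = [f"l{i}" for i in range(10)]
sub1_a, sub2_a = split_visited(visited_a, 0.8, seed=1)
```

Because the same p is used for every active user, each user's evaluation subset is proportional to that user's own visit history.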

step 25: using the recommendation sub-models M₁, M₂, …, M_n and the information in the active user u_a's sub-dataset Sub1_a, calculate a pre-score for each address l_k in the set of unvisited addresses NewL_a and in sub-dataset Sub2_a, l_k ∈ NewL_a ∪ Sub2_a, as
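Step 25 applies every sub-model to the same candidate set. The following is a minimal sketch, assuming each trained sub-model (already fitted using Sub1_a) can be called as a function of (user, location) that returns a score; that interface and the name `pre_scores` are hypothetical.

```python
from typing import Callable, Dict, Set

# Hypothetical interface: a trained sub-model M_x is any callable (user, location) -> float.
SubModel = Callable[[str, str], float]


def pre_scores(
    models: Dict[str, SubModel], user: str, new_l: Set[str], sub2: Set[str]
) -> Dict[str, Dict[str, float]]:
    """Return {model_name: {l_k: pre-score}} for every l_k in NewL_a ∪ Sub2_a."""
    candidates = sorted(new_l | sub2)
    return {name: {loc: model(user, loc) for loc in candidates} for name, model in models.items()}


models = {"M1": lambda u, l: 1.0, "M2": lambda u, l: float(len(l))}
res = pre_scores(models, "u_a", {"a"}, {"bb"})
```

Scoring Sub2_a alongside NewL_a is what later allows each sub-model's pre-scores to be checked against the held-out real scores.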

4. The ensemble learning-based personalized location recommendation method according to claim 1, wherein step 5 of the method comprises:

step 51: calculate the final weighting coefficient C_ax of recommendation sub-algorithm A_x, 1 ≤ x ≤ n, used when recommending for the active user u_a, as follows:

where C_ax is the final weighting coefficient of sub-algorithm A_x when recommending for the active user u_a; W_ax is the precision weight value of the pre-scores of recommendation sub-algorithm A_x when recommending for the active user u_a; and G_ax is the stability weight value of the pre-scores of recommendation sub-algorithm A_x;
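The formula that combines W_ax and G_ax is not reproduced in this text. A minimal sketch follows, assuming a normalized product, C_ax = W_ax · G_ax / Σ_y (W_ay · G_ay), so that the coefficients for one user sum to 1. This is only one plausible combination, not necessarily the patented formula, and `total_weights` is a hypothetical name.

```python
from typing import Dict


def total_weights(w: Dict[str, float], g: Dict[str, float]) -> Dict[str, float]:
    """ASSUMPTION: C_ax = W_ax * G_ax / sum_y (W_ay * G_ay)."""
    raw = {x: w[x] * g[x] for x in w}
    total = sum(raw.values())
    if total == 0:
        # Degenerate case: fall back to equal coefficients.
        return {x: 1.0 / len(raw) for x in raw}
    return {x: v / total for x, v in raw.items()}


c_a = total_weights({"A1": 0.6, "A2": 0.4}, {"A1": 0.5, "A2": 0.5})
```

With a product, a sub-algorithm must score well on both precision and stability to receive a large coefficient.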

step 52: for each location l_k not visited by the active user u_a, l_k ∈ NewL_a, the integration algorithm calculates the final prediction score as follows:

is the pre-score given by recommendation sub-algorithm A_x to location l_k not visited by the active user u_a;

step 53: sort all addresses in the set NewL_a by the integration algorithm's final prediction scores, form the recommendation list TopNList_a from the N top-ranked positions, and return it to the active user.
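Steps 52 and 53 can be sketched as a weighted sum followed by a sort. This assumes the final prediction score is Σ_x C_ax times sub-algorithm A_x's pre-score for l_k, since the exact formula is not reproduced here. Ties are broken by location ID only to make the output deterministic. `fuse_and_rank` is a hypothetical name.

```python
from typing import Dict, List


def fuse_and_rank(
    c: Dict[str, float], pre: Dict[str, Dict[str, float]], new_l: List[str], top_n: int
) -> List[str]:
    """ASSUMPTION: final score = sum_x C_ax * pre-score_x(l_k). Returns TopNList_a."""
    final = {loc: sum(c[x] * pre[x][loc] for x in c) for loc in new_l}
    # Sort by descending final score; ties broken by location ID.
    ranked = sorted(final, key=lambda loc: (-final[loc], loc))
    return ranked[:top_n]


c_a = {"A1": 0.75, "A2": 0.25}
pre = {
    "A1": {"a": 1.0, "b": 0.0, "c": 0.5},
    "A2": {"a": 0.0, "b": 1.0, "c": 0.5},
}
top_n_list = fuse_and_rank(c_a, pre, ["a", "b", "c"], top_n=2)
```

Here the final scores are a = 0.75, c = 0.5, b = 0.25, so the top two are a and c.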

5. The method for recommending personalized positions based on ensemble learning according to claim 1, wherein said step 6 comprises:

step 61: randomly select U × 10% of the users in the target dataset as the active user set AU, and run each recommendation algorithm for each active user in the set to generate a recommendation list;

step 62: evaluate the robustness of each recommendation system with the precision indices and the stability index, where the values of the precision indices Precision, Recall, and F1 and of the stability index IG for one run of each algorithm on the active user set AU are the averages of those indices over all users in AU;
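The per-user averaging in step 62 can be sketched for the three precision indices. This assumes a user's relevant set is the group of addresses held out for that user, which the claim does not specify. The IG stability index is not covered because its computation is not given here. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def prf_for_user(rec: List[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Precision, Recall and F1 of one recommendation list against a relevant set."""
    hits = len(set(rec) & relevant)
    p = hits / len(rec) if rec else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1


def average_over_users(recs: Dict[str, List[str]], truth: Dict[str, Set[str]]) -> Tuple[float, ...]:
    """Average Precision, Recall and F1 over all users in the active set AU."""
    rows = [prf_for_user(recs[u], truth.get(u, set())) for u in recs]
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


avg = average_over_users({"u1": ["a", "b"], "u2": ["x"]}, {"u1": {"a", "b"}, "u2": {"y"}})
```

Averaging per user, rather than pooling all hits, gives each active user equal influence regardless of list or history length.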

step 63: repeat steps 61 and 62 Ntimes times, i.e., run all algorithms independently Ntimes times;

step 64: take the Precision, Recall, F1, and IG values of the integration algorithm and of each sub-recommendation algorithm as the averages of the Ntimes run results;

step 65: compare and analyze the results of all indices: if the F1 value of the integration algorithm is larger than the F1 values of all sub-recommendation algorithms, the recommendation precision of the integration algorithm is higher than that of all sub-algorithms; if the IG value of the integration algorithm is larger than the largest IG value among the sub-recommendation algorithms, the integration algorithm is more stable than all sub-algorithms; if both conclusions hold, the integration algorithm is more robust.
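Steps 64 and 65 reduce to averaging each index over the Ntimes runs and applying two comparisons. A minimal sketch follows, assuming each run's results are stored as a dictionary keyed by index name ("F1", "IG", and so on). The function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def average_runs(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Step 64: average each index (e.g. 'F1', 'IG') over Ntimes independent runs."""
    return {k: mean(run[k] for run in runs) for k in runs[0]}


def is_more_robust(ensemble: Dict[str, float], subs: List[Dict[str, float]]) -> bool:
    """Step 65: the ensemble must beat every sub-algorithm on both F1 and IG."""
    better_f1 = all(ensemble["F1"] > s["F1"] for s in subs)
    better_ig = ensemble["IG"] > max(s["IG"] for s in subs)
    return better_f1 and better_ig


ens = average_runs([{"F1": 0.2, "IG": 0.1}, {"F1": 0.4, "IG": 0.3}])
```

Requiring strict superiority on both indices means a gain in precision alone is not reported as improved robustness.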

6. The personalized position recommendation method based on ensemble learning according to claim 1, wherein the method divides the addresses visited by an active user into a training sub-dataset and an evaluation sub-dataset in a given proportion; selects several recommendation sub-algorithms of any type; has each sub-algorithm use the historical score information in the active user's training sub-dataset to calculate pre-scores of the other addresses for the active user; compares the historical scores with the pre-scores of the addresses in the evaluation sub-dataset to evaluate the accuracy and stability of each sub-algorithm; generates a personalized weighting coefficient for the active user from the evaluation results; uses this weighting coefficient to combine the sub-algorithms' pre-scores of the unvisited addresses into the ensemble model's final prediction scores for the active user's unvisited addresses; ranks all unvisited addresses by prediction score; and recommends the top-ranked addresses to the active user.