CN111475744B - Personalized position recommendation method based on ensemble learning - Google Patents


Info

Publication number
CN111475744B
Authority
CN
China
Prior art keywords
recommendation
sub
user
algorithm
addresses
Legal status
Active
Application number
CN202010257793.0A
Other languages
Chinese (zh)
Other versions
CN111475744A (en)
Inventor
朱俊
韩立新
勾智楠
杨忆
袁晓峰
李树
李景仙
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Application filed by Nanjing University of Information Science and Technology
Priority to CN202010257793.0A
Publication of CN111475744A
Application granted
Publication of CN111475744B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a personalized location recommendation method based on ensemble learning, comprising the following steps. First, a check-in data set is converted into a scoring matrix. Second, multiple recommendation sub-algorithms are selected, and the addresses visited by the active user are divided into a training sub-data set and an evaluation sub-data set; using the training sub-data set, each sub-algorithm computes pre-scores for the addresses in the evaluation sub-data set and for the addresses the user has not visited. Third, the recommendation accuracy F1 of each sub-model is computed on the evaluation sub-data set to generate a set of precision weight values. Fourth, information gain IG is chosen as the stability index, the stability of each sub-model is evaluated, and a set of stability weight values is computed. Fifth, a final total weighting coefficient is computed for the active user, and the ensemble model fuses the sub-algorithms' pre-scores for the unvisited addresses according to this coefficient to generate the final prediction scores. Sixth, the overall performance of the method is compared with that of each sub-algorithm before integration to evaluate the effectiveness of the method.

Description

Personalized position recommendation method based on ensemble learning
Technical Field
The invention relates to a personalized location recommendation method for social networks based on ensemble learning, and belongs to the technical field of artificial intelligence and machine learning.
Background
Location-based social networks (LBSNs) arose from the gradual convergence of online social networks and location-based services, and they provide a platform that closely connects the online virtual network with the offline real world. In recent years, with the widespread popularity of mobile devices and the rapid development of positioning technologies, a large number of LBSNs have emerged. In an LBSN, users can establish complex social relationships with one another, such as friendships, coworker relationships, and family ties. Users can also use the geographic information added to the social network to browse points of interest (POIs) such as restaurants, shops, and movie theaters; check in with a mobile device when visiting a POI; publish the POI's geographic location; and share suggestions and comments about it. LBSNs help merchants learn more about the real users behind the network, so that they can tailor personalized services to the needs of different users.
As the number of registered LBSN users grows, LBSNs store and accumulate a wealth of information, and its sheer volume prevents users from finding what they need quickly and effectively within limited time. Recommendation systems, which address this "information overload" problem, are therefore receiving increasing attention from researchers. For example, Amazon uses a recommendation system to recommend products to users, improving click-through rates and sales for merchants, and the movie website Netflix drew many research teams to the problem of improving recommendation accuracy by holding a recommendation system competition. As a special kind of information filtering system, a recommendation system does not require the user to supply specific keywords; instead, it models the user's interests by analyzing the user's historical behavior, mines the user's latent preferences, and then proactively recommends products, services, and the like that meet the user's needs. Building on the large amount of user, friend, and location information in LBSNs, researchers have developed applications such as friend recommendation, expert discovery, location recommendation, activity recommendation, and route recommendation. Among these, location recommendation is currently a research hotspot.
The recommendation algorithm is the core technical component of a recommendation system, and its efficiency largely determines both the operating efficiency of the system and the accuracy of its results. By design strategy, recommendation algorithms fall mainly into collaborative filtering algorithms, content-based recommendation algorithms, and hybrid recommendation algorithms. Collaborative filtering algorithms in turn include memory-based methods (e.g., user-based collaborative filtering (UBCF) and item-based collaborative filtering (IBCF)) and model-based methods (e.g., singular value decomposition (SVD), clustering models, and probabilistic latent semantic analysis (PLSA)). In content-based location recommendation, features such as tags, categories, and user comments are extracted from each location; the user's preferences are extracted from the user's profile and then matched against the location profiles to produce recommendations. The UBCF algorithm converts users' check-in behavior into a user-location scoring matrix, uses the data set to find users similar to the current active user, predicts the active user's scores for places he or she has not checked in at from those similar users' preferences, and recommends the locations with the highest predicted scores. The IBCF algorithm rests on the assumption that a user tends to prefer addresses highly similar to the items he or she liked before. It therefore first computes the similarity between locations and then recommends to the active user the addresses most similar to the POIs the user has already visited (i.e., those with the highest predicted scores). The SVD algorithm is a classic matrix factorization technique whose main task is to produce a low-rank approximation of the scoring matrix.
The low-dimensional orthogonal matrices produced by SVD reduce the noise present in the original matrix and reveal latent associations between users and items more effectively. SVD assumes that items share certain common features and that a user likes an item because he or she values its features highly. By decomposing a user's scores onto these features with linear algebra, the user's preference for an unvisited address can be predicted from the user's affinity for each feature.
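To make the UBCF and SVD descriptions above concrete, the following Python sketch implements a minimal user-based collaborative filter (cosine similarity, then a similarity-weighted average over the k most similar users who visited the location) and a latent-factor model fitted by stochastic gradient descent. The latter is the "SVD" commonly used in recommender systems (often called Funk SVD), not an exact singular value decomposition; it is swapped in here because it handles missing scores directly. The toy matrix `R_EXAMPLE`, the function names, and the hyperparameters are hypothetical, and a score of 0.0 is assumed to mean "not visited".

```python
import math
import random

Matrix = list[list[float]]  # rows: users, columns: locations; 0.0 = not visited

# Hypothetical toy user-location score matrix.
R_EXAMPLE: Matrix = [
    [1.0, 0.5, 0.0, 0.0],
    [0.9, 0.4, 0.8, 0.0],
    [0.0, 0.1, 0.2, 0.3],
]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two users' score vectors."""
    num = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / norm if norm else 0.0


def ubcf_predict(R: Matrix, a: int, j: int, k: int = 2) -> float:
    """UBCF: similarity-weighted average of the scores given to location j
    by the k users most similar to user a who have visited j."""
    neighbours = [(cosine(R[a], R[v]), v) for v in range(len(R)) if v != a and R[v][j] > 0]
    top = sorted(neighbours, reverse=True)[:k]
    weight = sum(abs(s) for s, _ in top)
    return sum(s * R[v][j] for s, v in top) / weight if weight else 0.0


def dot(x: list[float], y: list[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def train_latent_factors(R: Matrix, k: int = 2, lr: float = 0.05, reg: float = 0.02,
                         epochs: int = 500, seed: int = 0) -> tuple[Matrix, Matrix]:
    """Latent-factor ("Funk SVD") model: fit R[i][j] ~ P[i].Q[j] on the
    visited entries only, with L2-regularised stochastic gradient descent."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in R]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in R[0]]
    observed = [(i, j, r) for i, row in enumerate(R) for j, r in enumerate(row) if r > 0]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j, r in observed:
            err = r - dot(P[i], Q[j])
            for f in range(k):
                p, q = P[i][f], Q[j][f]
                P[i][f] += lr * (err * q - reg * p)
                Q[j][f] += lr * (err * p - reg * q)
    return P, Q


def rmse_on_visited(R: Matrix, P: Matrix, Q: Matrix) -> float:
    """Root-mean-square error of the factor model on the visited entries."""
    errs = [(r - dot(P[i], Q[j])) ** 2
            for i, row in enumerate(R) for j, r in enumerate(row) if r > 0]
    return math.sqrt(sum(errs) / len(errs))
```

Under the latent-factor model, a user a's predicted score for an unvisited location j is simply `dot(P[a], Q[j])`, which is how the "preference for the characteristics" in the text turns into a score.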
None of the conventional recommendation techniques above considers the geographic characteristics of locations or exploits the social relationships between users. However, every location in a location recommendation system has geographic features given by its latitude and longitude, and these features of POIs can strongly influence users' visiting preferences. In addition, users' social relationships also affect their check-in behavior: when a user has not decided where to go, he or she often consults friends' historical visit records in the social network. A location recommendation algorithm should therefore take these contextual factors into account, mining the geographic features of locations and exploiting the social relationships between users.
Social-based collaborative filtering (SCF) is a recommendation method that considers both users' personal preferences and their social relationships. It assumes that friends share similar interests and easily influence one another, and that active users are more willing to make decisions based on their friends' experience. In SCF, only the preferences of the active user's friends are considered when computing the predicted score for an address. The similarity between the active user and a friend can be computed from the scores both have given to places they have visited, from the geographic distance between their homes, or from the overlap of their friendship networks and check-in histories. Other research is devoted to mining the geographic features of locations: some techniques use matrix factorization, while more algorithms model geographic influence with common probability distributions such as the power-law distribution, the multi-center Gaussian distribution, and kernel density estimation (KDE). When KDE is used to predict the probability that a user visits a new location, the influence of geography on each user's check-in activity is personalized, yielding a more expressive geography-aware recommendation system.
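As an illustration of the KDE idea, here is a Python sketch of a per-user geographic score: a 2-D Gaussian kernel density estimate over the locations the user has already visited, with great-circle (haversine) distance standing in for planar distance. The Gaussian kernel, the 1 km default bandwidth, and the function names are assumptions for illustration; the text does not specify how KDE is configured.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p: tuple[float, float], q: tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def kde_visit_density(history: list[tuple[float, float]],
                      candidate: tuple[float, float],
                      bandwidth_km: float = 1.0) -> float:
    """Per-user 2-D Gaussian kernel density (per km^2) at `candidate`,
    estimated from the locations this user has already checked in at."""
    if not history:
        return 0.0
    h2 = bandwidth_km ** 2
    kernel_sum = sum(math.exp(-haversine_km(candidate, v) ** 2 / (2 * h2)) for v in history)
    return kernel_sum / (len(history) * 2 * math.pi * h2)
```

Because each user's density is built only from that user's own check-ins, two users can receive very different geographic scores for the same candidate POI, which is the personalization the paragraph describes.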
However, most current location recommendation techniques are single-algorithm models. Each model rests on certain theoretical assumptions, so each algorithm has inherent weaknesses and performs well only in specific application scenarios. For example, content-based location recommendation handles the cold-start problem well, but it requires a large amount of structured information about users and locations, which increases the system's storage and computation costs; the UBCF and IBCF algorithms consider only neighborhood effects in the rating data, so while they mine user preferences, they ignore the characteristics of item content and limit the diversity of the recommendation results; and the SVD algorithm has high computational complexity and runs slowly, and its recommendation accuracy still leaves room for improvement. To overcome the limitations of a single algorithm, some researchers have studied how to combine several score-prediction methods into a single overall model, and ensemble learning is an effective means of doing so. Ensemble learning is a machine learning paradigm that uses multiple weak learners to solve the same problem and can effectively improve the generalization ability of a learning system. Dietterich, a leading authority in machine learning, has ranked ensemble learning first among the four major research directions of machine learning (ensemble learning, symbolic learning, statistical learning, and reinforcement learning).
Ensemble learning can outperform a single learning algorithm in several respects: it achieves better average performance across different domains and data sets; it can find combined solutions that no single learning algorithm can obtain; it is less sensitive to sampling variation, noise, and outliers; and it can obtain solutions by combining multiple distributed data sources or multiple feature sets of a data source, a kind of fusion that is becoming increasingly important in distributed data mining. Because of its effectiveness, ensemble learning has been widely applied in fields such as biometric recognition, computer-aided medical diagnosis, text recognition, and Web information filtering.
Some recommendation systems already apply ensemble learning to personalized recommendation, improving the effectiveness and adaptability of information recommendation. However, research shows that existing ensemble-based recommendation systems still have many shortcomings, which can be summarized as follows:
(1) The number of sub-algorithms considered in the fusion process is fixed and their types are limited, so the ensemble model is not very extensible. Most popular ensemble-based location recommendation systems fuse only two algorithms, typically a collaborative filtering algorithm or a location-visit probability estimate, which limits both their application scenarios and the achievable performance gains. Existing ensemble frameworks cannot fuse an arbitrary number of recommendation sub-algorithms of arbitrary types.
(2) An ensemble algorithm needs weighting coefficients to fuse the predictions of its sub-algorithms into a final prediction score; these coefficients represent the importance of each sub-algorithm. Existing algorithms usually fuse by addition, multiplication, or some other simple linear combination, with the same weighting coefficients for all users. In the real world, however, every user and every item has different characteristics, so the best sub-algorithm differs from user to user; that is, the sub-algorithm that best mines and reflects a user's interests varies from person to person. It is therefore necessary to tailor a set of weighting coefficients to each user, so that through personalized weighting the ensemble can "lean toward" different sub-algorithms for different users.
(3) To improve the user experience, a good recommendation system should be robust. Robustness has two indispensable components, accuracy and stability, yet most current research focuses on only one of them. Prediction accuracy determines whether the user likes the recommended locations, while system stability reflects whether the recommendation system produces consistent recommendations across application scenarios. Neglecting either one can reduce user stickiness and the profits of the service provider.
(4) The few existing stability studies restrict their scenarios almost entirely to malicious attacks, for example an attacker trying to push a preset item to users. Yet beyond malicious attacks, uncertainty arising from data-source limitations (such as sparsity and cold start), different data preprocessing methods, and model training can also make recommendation results inconsistent and undermine system stability. Research on system stability in non-malicious scenarios is almost nonexistent.
These shortcomings of existing ensemble-based recommendation technology create major difficulties in the design, development, deployment, and operation of e-commerce platforms. In particular, they degrade the service quality of recommendation systems on platforms with massive amounts of item information, which in turn hurts the sales performance of e-commerce systems.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a personalized location recommendation method based on ensemble learning. Its goal is a location recommendation system that is highly extensible, highly accurate, and stable in its results, and it systematically sets out the technical workflow of an ensemble recommendation algorithm. Taking systems theory as its theoretical basis, the invention treats robustness evaluation as a necessary component of the recommendation system: it considers not only the accuracy of the recommendation results but also the diversity of data usage and user behavior in non-malicious scenarios. It introduces information gain as an index of system stability, which quantifies the uncertainty caused by data-source limitations (such as sparsity and cold start), different data preprocessing methods, and model training, and thereby improves the stability of the system's output. In addition, the ensemble model weights each sub-model per user, customizing for each user the ensemble recommendation algorithm that best matches his or her interests and further improving the service quality of the recommendation system.
The technical solution adopted by the invention is as follows. The addresses visited by the active user are divided into a training sub-data set and an evaluation sub-data set according to a given ratio. Any number of recommendation sub-algorithms of any type are selected, and each sub-algorithm uses the historical scores in the active user's training sub-data set to compute pre-scores for the active user's other addresses. By comparing the historical scores with the pre-scores of the addresses in the evaluation sub-data set, the accuracy and stability of each sub-algorithm are evaluated, and personalized weighting coefficients are generated for the active user from the evaluation results. The weighting coefficients are used to combine the sub-algorithms' pre-scores for the unvisited addresses into the ensemble model's final prediction scores for those addresses; all unvisited addresses are sorted by prediction score, and the top-ranked addresses are recommended to the active user (as shown in FIG. 1).
The specific process of the method comprises the following steps:
Step 1: Collect and organize the raw user check-in data set C and convert it into a user-location scoring matrix R.
Step 2: Select an active user u_a in the location-based social network (LBSN) as the target of the recommendation service. Select recommendation sub-algorithms A_1, A_2, …, A_n of any type and number. Divide the addresses visited by the active user into a training sub-data set and an evaluation sub-data set according to a given ratio. Using the visit information in the active user's training sub-data set, compute pre-scores for the addresses the active user has not visited and for the addresses in the evaluation sub-data set.
Step 3: Select the F1 score as the evaluation index of recommendation accuracy, and evaluate the recommendation accuracy of each sub-model by comparing the true scores and pre-scores of the addresses in the evaluation sub-data set. From the F1 value of each recommendation sub-model, compute a set of precision weight values W_a for the active user u_a.
Step 4: Select the information gain IG as the evaluation index of system stability, and evaluate the stability of each sub-model in a non-malicious scenario by comparing the true scores and pre-scores of the addresses in the evaluation sub-data set. From the IG value of each recommendation sub-model, compute a set of stability weight values G_a for the active user u_a.
Step 5: Taking the overall robustness of the ensemble recommendation system into account and balancing recommendation accuracy against system stability, compute the final total weighting coefficient C_a for the active user u_a from the two sets of weights. Fuse the recommendation sub-models' pre-scores for the unvisited addresses according to C_a to generate the ensemble model's final prediction scores for those addresses. Sort all unvisited addresses by final prediction score and provide the active user with a recommendation list consisting of the top-ranked addresses.
Step 6: Evaluate the robustness of each recommendation system with the accuracy index and the stability index, chiefly by comparing the overall performance of the ensemble-based personalized location recommendation algorithm with that of each sub-algorithm before integration, to assess the applicability and effectiveness of the proposed technique.
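The text above does not give the formulas that turn F1 and IG values into W_a, G_a and C_a, so the following Python sketch fills them in with hypothetical choices purely to illustrate the flow of Steps 3 to 5: F1 is computed from each sub-model's top-n list over its scored addresses (evaluation plus unvisited) against the evaluation sub-data set; W_a and G_a are obtained by normalizing per-sub-model values to sum to 1; the stability values are supplied precomputed and assumed to be oriented so that higher means more stable; and C_a is a convex combination controlled by a hypothetical balance parameter `lam`. None of these choices should be read as the patented formulas.

```python
from typing import Dict, List, Sequence, Set, Tuple

Scores = Dict[int, float]  # address id -> pre-score produced by one sub-model


def top_n(scores: Scores, n: int) -> List[int]:
    """Address ids with the n highest scores."""
    return sorted(scores, key=scores.__getitem__, reverse=True)[:n]


def f1_at_n(scores: Scores, held_out: Set[int], n: int) -> float:
    """F1 of a sub-model's top-n list against the evaluation sub-data set."""
    hits = len(set(top_n(scores, n)) & held_out)
    if hits == 0:
        return 0.0
    precision, recall = hits / n, hits / len(held_out)
    return 2 * precision * recall / (precision + recall)


def normalise(values: Sequence[float]) -> List[float]:
    """Scale non-negative values to sum to 1 (uniform if all are zero)."""
    total = sum(values)
    return [v / total for v in values] if total > 0 else [1 / len(values)] * len(values)


def personalised_fusion(pre_scores: Sequence[Scores], held_out: Set[int],
                        unvisited: Sequence[int], stability: Sequence[float],
                        lam: float = 0.5, n: int = 2) -> Tuple[List[float], List[int]]:
    """Return the per-user total weights (C_a) and the top-n unvisited addresses."""
    W = normalise([f1_at_n(s, held_out, n) for s in pre_scores])  # precision weights W_a
    G = normalise(stability)                                       # stability weights G_a
    C = [lam * w + (1 - lam) * g for w, g in zip(W, G)]           # total weights C_a
    final = {j: sum(c * s[j] for c, s in zip(C, pre_scores)) for j in unvisited}
    return C, top_n(final, n)
```

Because W_a is recomputed from each active user's own evaluation sub-data set, a sub-model that ranks this user's held-out addresses well receives more weight for this user only, which is the per-user "lean" the method calls for.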
Beneficial effects:
1. The invention is highly extensible and supports the fusion of any number of recommendation sub-algorithms of any type. In practice, suitable sub-algorithms can be selected for different application scenarios and data characteristics, building on any existing algorithm to obtain higher recommendation quality, increasing user stickiness in location-based social networks, and helping merchants target advertisements precisely so as to attract more potential customers.
2. By analyzing each user's behavioral characteristics, the invention customizes a set of weighting coefficients for each active user, so that through personalized weighting the ensemble "leans toward" the sub-algorithms that best mine each user's interests. This "tailored to the individual" integration greatly improves users' satisfaction with the social network platform, can also help solve other machine learning problems, and is of great practical significance.
3. In the fusion process, the F1 score, which combines precision and recall, is chosen as the index of recommendation accuracy. This makes the ensemble model more accurate than each sub-model, ensures that the recommendations match user preferences, and achieves the goal of improving the prediction accuracy of the recommendation algorithm.
4. The invention introduces information gain as the index of recommendation system stability. It fully accounts for the uncertainty caused by data-source limitations (such as sparsity and cold start), different data preprocessing methods, and model training, can measure multiple factors that destabilize the system, and ensures the stability of the recommendation system in non-malicious scenarios.
5. The invention considers both prediction accuracy and system stability and robustly improves the service quality of the recommendation system. The method is general and portable: it applies to location recommendation systems and also to personalized recommendation of other, traditional kinds of items, and it has broad industrial application prospects.
6. Aiming at a location recommendation system that is highly extensible, highly accurate, and stable in its results, the invention considers the diversity of data usage and user behavior alongside the accuracy of the results, introduces information gain as an index of system stability, quantifies the uncertainty caused by data-source limitations (such as sparsity and cold start), different data preprocessing methods, and model training, and effectively improves the stability of the system's output.
Drawings
Fig. 1 is a flowchart of a personalized position recommendation method based on ensemble learning according to the present invention.
Fig. 2 is a flowchart of specific steps of the personalized position recommendation method based on ensemble learning according to the present invention.
FIG. 3 is a flow chart of the steps of the present invention for converting raw user check-in records to a user-location scoring matrix.
FIG. 4 is a frequency histogram of recommendation accuracy indicators F1 on the evaluation sub data set after each recommendation sub-algorithm has been run 100 times (each time a group of target users is randomly selected) in an embodiment of the present invention.
FIG. 5 is a frequency histogram of the information gain IG on the evaluation sub-data set for each recommendation sub-algorithm after 100 runs (a group of target users is randomly selected each time) in an embodiment of the present invention.
FIG. 6 is a box plot of the precision of the integrated model after 100 runs in an embodiment of the present invention.
FIG. 7 is a box plot of the recall after 100 runs of the integrated model in an embodiment of the present invention.
FIG. 8 is a box plot of the recommendation accuracy index F1 of the integrated model after 100 runs in an embodiment of the invention.
FIG. 9 is a histogram comparing the recommendation accuracy index F1 of each sub-model with that of the integrated model in an embodiment of the present invention.
FIG. 10 is a histogram comparing the information gain IG of the integrated model with that of each sub-model in an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the attached drawing figures and specific examples.
The specific flow of the design and implementation of the invention is shown in fig. 2, and the main variables and parameters in the process are shown in table 1.
TABLE 1 Functions of the principal variables and parameters
[Table 1 image]
Step 1: Collect and organize the raw user check-in data set C and convert it into a user-location scoring matrix R. The flow is shown in FIG. 3, and the operations are as follows:
(1.a) Select the user check-in data set C of the target recommendation system. The data set consists of the historical check-in records of U users at L addresses; from each record, information such as the user ID, address ID, visit time, address longitude, and address latitude is extracted.
(1.b) Convert each check-in record into a triple (u_i, l_j, n_ij), where u_i is the i-th user (1 ≤ i ≤ U), l_j is the j-th address (1 ≤ j ≤ L), and n_ij is the number of times user u_i visited address l_j.
(1.c) Compute NC_au_j, the total number of visits to location l_j by all users.
(1.d) Compute NLC_i, the total number of locations visited by user u_i.
(1.e) Compute NC_al_i, the total number of visits by user u_i to all locations.
(1.f) Compute NUC_j, the number of users who visited location l_j.
(1.g) Convert the number of check-ins n_ij of user u_i at address l_j into the score r_ij of user u_i for address l_j, as follows:
[Equation image: scoring formula for r_ij]
where r_ij is the score of user u_i for address l_j; n_ij is the number of check-ins of user u_i at address l_j; NC_au_j is the total number of visits to location l_j by all users; L is the total number of addresses; NLC_i is the total number of locations visited by user u_i; NC_al_i is the total number of visits by user u_i to all locations; U is the total number of users; and NUC_j is the number of users who visited location l_j.
(1.h) Normalize the user scores as follows:
$$ r_{ij} \leftarrow \frac{r_{ij} - \min}{\max - \min} $$
where r_ij is the score of user u_i for address l_j, min is the lowest of all scores in the user-location scoring matrix R, and max is the highest. After normalization, the score r_ij of user u_i for address l_j lies in the interval [0, 1]: a value of 1 indicates that the user visits the location frequently and likes it very much, a value of 0 indicates that the user has never visited it, and in general a higher score indicates a stronger preference for the address.
Summing all scores to form a user-location score matrix R ═ Rij},i∈[1,U],j∈[1,L]Where i denotes a user number, j denotes an address number, U denotes a total number of users, L denotes a total number of addresses, rijRepresenting user uiFor address ljThe score of (1).
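The counting in steps (1.c) to (1.f) and the min-max normalization in step (1.h) can be sketched in Python as below. This is a minimal illustration, not the patented implementation. It assumes one triplet (u_i, l_j, n_ij) per distinct user-location pair. It omits equation (1), whose exact form is not reproduced in this text, so the normalization simply operates on whatever raw score matrix that equation yields. All function names are hypothetical.

```python
from collections import defaultdict


def checkin_statistics(triplets):
    """Steps (1.c)-(1.f): aggregate (u_i, l_j, n_ij) triplets.

    Assumes one triplet per distinct user-location pair.
    """
    nc_au = defaultdict(int)  # NC_au_j: visits to location j by all users
    nlc = defaultdict(int)    # NLC_i: distinct locations visited by user i
    nc_al = defaultdict(int)  # NC_al_i: visits by user i to all locations
    nuc = defaultdict(int)    # NUC_j: distinct users who visited location j
    for user, loc, n in triplets:
        nc_au[loc] += n
        nlc[user] += 1
        nc_al[user] += n
        nuc[loc] += 1
    return dict(nc_au), dict(nlc), dict(nc_al), dict(nuc)


def min_max_normalize(matrix):
    """Step (1.h): map every entry of a dense U x L score matrix into [0, 1]."""
    lo = min(min(row) for row in matrix)
    hi = max(max(row) for row in matrix)
    span = hi - lo
    # A constant matrix has no spread; map it to zeros instead of dividing by 0.
    return [[(r - lo) / span if span else 0.0 for r in row] for row in matrix]
```

The matrix passed to `min_max_normalize` should include the zero entries of unvisited pairs, because min is taken over the whole matrix R.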
Second, select an active user u_a in the LBSN as the recommendation service target. Select recommendation sub-algorithms A_1, A_2, …, A_n of any type and number, where n is the number of sub-algorithms. Divide the addresses visited by the active user into a training sub-dataset and an evaluation sub-dataset according to a given ratio. Using the visit information in the active user's training sub-dataset, each recommendation sub-model computes pre-scores for the addresses the active user has not visited and for the addresses in the evaluation sub-dataset. The steps are as follows:
(2.a) Obtain the information of an active user u_a currently served by the recommendation system.
(2.b) Select a set of recommendation algorithms A_1, A_2, …, A_n (n is the number of sub-algorithms) according to the application scenario and data characteristics. These serve as the sub-algorithms of the ensemble model of the invention. Candidates include user-based collaborative filtering (UBCF), item-based collaborative filtering (IBCF), social collaborative filtering (SCF), kernel density estimation (KDE), singular value decomposition (SVD), and other existing ensemble algorithms.
(2.c) Train each algorithm according to its own operating mechanism to obtain the recommendation sub-models M_1, M_2, …, M_n (n is the number of sub-algorithms).
(2.d) Set a uniform address division ratio p for all active users. Divide the addresses visited by active user u_a in this ratio into a training sub-dataset Sub1_a and an evaluation sub-dataset Sub2_a.
(2.e) Using the recommendation sub-models M_1, M_2, …, M_n (n is the number of sub-algorithms) and the training sub-dataset Sub1_a of active user u_a, calculate a pre-score for each address l_k (l_k ∈ NewL_a ∪ Sub2_a). Here NewL_a is the set of unvisited addresses and Sub2_a is the evaluation sub-dataset. The pre-scores are denoted as:
[Formula: pre-scores of sub-models M_1, …, M_n for address l_k]
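A minimal sketch of the split in step (2.d) and of the candidate set in step (2.e) follows. It assumes that a fraction p of u_a's visited addresses, rounded to the nearest integer, goes to the evaluation sub-dataset Sub2_a and the rest goes to Sub1_a. The text does not say which part receives p. The function name and the `seed` parameter are hypothetical.

```python
import random


def split_visited(visited, p, all_locations, seed=None):
    """Steps (2.d)-(2.e): split u_a's visited addresses and build the candidate set.

    Assumption: round(p * |visited|) addresses go to Sub2_a (evaluation);
    the source does not say which part receives the fraction p.
    """
    rng = random.Random(seed)
    shuffled = sorted(visited)  # sort first so a fixed seed gives a fixed split
    rng.shuffle(shuffled)
    n_eval = round(len(shuffled) * p)
    sub2 = set(shuffled[:n_eval])              # Sub2_a: evaluation sub-dataset
    sub1 = set(shuffled[n_eval:])              # Sub1_a: training sub-dataset
    new_l = set(all_locations) - set(visited)  # NewL_a: never-visited addresses
    candidates = new_l | sub2                  # addresses that need a pre-score
    return sub1, sub2, new_l, candidates
```

Each sub-model, trained on Sub1_a only, would then produce one pre-score for every address in the returned candidate set.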
Third, select the recommendation accuracy index F1 as the accuracy evaluation index. Evaluate the recommendation accuracy of each sub-model by comparing the true scores and pre-scores of the addresses in the evaluation sub-dataset Sub2_a. Based on the F1 value of each recommendation sub-model, compute the set of precision weights W_a for active user u_a. The steps are as follows:
(3.a) Denote each recommendation sub-algorithm from the second step by A_x (1 ≤ x ≤ n), where n is the number of sub-algorithms. Collect the pre-scores each sub-algorithm computed for active user u_a and sort all addresses in the evaluation sub-dataset Sub2_a by pre-score. Take the top M addresses to form the training list TopM_ax.
(3.b) Collect the evaluation sub-dataset Sub2_a of active user u_a. Put the addresses in Sub2_a whose true score is greater than the threshold goodling into the preference sub-dataset Prefer_a.
(3.c) Calculate the precision (Precision) of each recommendation sub-algorithm A_x as follows:
$$\mathrm{Precision}(u_a, A_x, M) = \frac{|\mathrm{Top}M_{ax} \cap \mathrm{Prefer}_a|}{M} \tag{3}$$
where u_a is the active user currently receiving the recommendation service, and A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of sub-algorithms). The training list TopM_ax is the set of the M addresses in the evaluation sub-dataset Sub2_a with the highest pre-scores. The preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose true score is greater than goodling. M is the number of addresses in TopM_ax.
(3.d) Calculate the recall (Recall) of each recommendation sub-algorithm A_x as follows:
$$\mathrm{Recall}(u_a, A_x, M) = \frac{|\mathrm{Top}M_{ax} \cap \mathrm{Prefer}_a|}{|\mathrm{Prefer}_a|} \tag{4}$$
where u_a is the active user currently receiving the recommendation service, and A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of sub-algorithms). The training list TopM_ax is the set of the M addresses in the evaluation sub-dataset Sub2_a with the highest pre-scores. The preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose true score is greater than goodling. M is the number of addresses in TopM_ax.
(3.e) Calculate the comprehensive accuracy index F1 of each recommendation sub-algorithm A_x as follows:
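$$F1(u_a, A_x, M) = \frac{2 \cdot \mathrm{Precision}(u_a, A_x, M) \cdot \mathrm{Recall}(u_a, A_x, M)}{\mathrm{Precision}(u_a, A_x, M) + \mathrm{Recall}(u_a, A_x, M)} \tag{5}$$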
where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of sub-algorithms), and M is the number of addresses in the training list TopM_ax. Precision(u_a, A_x, M) is the precision of sub-algorithm A_x, and Recall(u_a, A_x, M) is its recall.
(3.f) Calculate the precision weight W_ax (1 ≤ x ≤ n) of each recommendation sub-algorithm A_x when recommending for active user u_a, where n is the number of sub-algorithms, as follows:
[Equation (6)]
where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of sub-algorithms), and M is the number of addresses in the training list TopM_ax. F1(u_a, A_x, M) is the recommendation accuracy index F1 of sub-algorithm A_x.
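Steps (3.a) to (3.f) can be sketched as follows. Precision, recall and F1 use their standard top-M definitions. The precision weight W_ax of step (3.f) is assumed to be each sub-algorithm's F1 divided by the sum of all F1 values. This is a plausible normalization that the text does not confirm. Function names are hypothetical, and ties in pre-score are broken by address ID only so that the ranking is deterministic.

```python
def prefer_set(true_scores, sub2, goodling):
    """Step (3.b): addresses in Sub2_a whose true score exceeds goodling."""
    return {loc for loc in sub2 if true_scores[loc] > goodling}


def top_m(pre_scores, sub2, m):
    """Step (3.a): rank Sub2_a by one sub-model's pre-scores and keep the first M."""
    ranked = sorted(sub2, key=lambda loc: (-pre_scores[loc], loc))  # ties by address ID
    return ranked[:m]


def precision_recall_f1(top_list, prefer, m):
    """Steps (3.c)-(3.e): standard top-M precision, recall and F1."""
    hits = len(set(top_list) & set(prefer))
    precision = hits / m
    recall = hits / len(prefer) if prefer else 0.0
    # With no hits both measures are 0, so F1 is defined as 0 to avoid 0 / 0.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def precision_weights(f1_by_algo):
    """Step (3.f), assumed form: W_ax = F1_x / sum of all F1 values."""
    total = sum(f1_by_algo.values())
    n = len(f1_by_algo)
    return {a: (f / total if total else 1.0 / n) for a, f in f1_by_algo.items()}
```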
Fourth, select the information gain IG as the system stability evaluation index. Compare the true scores and pre-scores of the addresses in the evaluation sub-dataset to evaluate the stability of each sub-model in a non-malicious-attack scenario. Based on the IG value of each recommendation sub-model, compute the set of stability weights G_a for active user u_a. The steps are as follows:
(4.a) Calculate the information entropy of the evaluation sub-dataset Sub2_a as follows:
[Equation (7)]
where u_a is the active user currently receiving the recommendation service, and Sub2_a is the evaluation sub-dataset obtained by dividing the addresses visited by u_a in the given ratio. The preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose true score is greater than goodling.
(4.b) Denote each recommendation sub-algorithm from the second step by A_x (1 ≤ x ≤ n), where n is the number of sub-algorithms. Classify the addresses in Sub2_a by the recommendation result of sub-algorithm A_x (recommended or not recommended), and calculate the conditional entropy of this classification as follows:
[Equation (8)]
where u_a is the active user currently receiving the recommendation service, and A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of sub-algorithms). Sub2_a is the evaluation sub-dataset obtained by dividing the addresses visited by u_a in the given ratio. The training list TopM_ax is the set of the M addresses in Sub2_a with the highest pre-scores, and M is the number of addresses in TopM_ax. TP_ax is the number of addresses in the recommendation list that the user truly likes. FN_ax is the number of addresses the user truly likes that are not in the recommendation list (that is, not recommended).
(4.c) Calculate the information gain obtained when the addresses in Sub2_a are classified by the recommendation result of sub-algorithm A_x, as follows:
$$\mathrm{IG}(u_a, A_x, M) = D(u_a) - T(u_a, A_x, M) \tag{9}$$
where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of sub-algorithms), and M is the number of addresses in the training list TopM_ax. D(u_a) is the information entropy of Sub2_a. T(u_a, A_x, M) is the conditional entropy of the addresses in Sub2_a when they are classified by the recommendation result of sub-algorithm A_x (recommended or not recommended).
(4.d) Calculate the stability weight G_ax (1 ≤ x ≤ n) of each recommendation sub-algorithm A_x when recommending for active user u_a, where n is the number of sub-algorithms, as follows:
[Equation (10)]
where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of sub-algorithms), and M is the number of addresses in the training list TopM_ax. IG(u_a, A_x, M) is the information gain of sub-algorithm A_x.
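A sketch of steps (4.a) to (4.d) follows. The exact entropy formulas are not reproduced in this text, so the code assumes the usual decision-tree form. D is taken as the base-2 entropy of Sub2_a split into liked (Prefer_a) and not-liked addresses. T is taken as the size-weighted entropy after Sub2_a is split into the top-M recommended addresses and the rest, using TP_ax and FN_ax. The stability weight G_ax is likewise assumed to be IG normalized to sum to 1. These forms and all function names are assumptions, not the invention's confirmed definitions.

```python
import math


def binary_entropy(pos, total):
    """Base-2 entropy of a set with `pos` liked items out of `total`."""
    if total == 0 or pos in (0, total):
        return 0.0
    p = pos / total
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def information_gain(sub2_size, prefer_size, m, tp):
    """Steps (4.a)-(4.c) under an assumed decision-tree style definition.

    D: entropy of Sub2_a split into liked (Prefer_a) and not liked.
    T: size-weighted entropy after splitting Sub2_a into the top-M list
       (TP_ax liked) and the remaining addresses (FN_ax liked).
    """
    if sub2_size == 0:
        return 0.0
    m = min(m, sub2_size)  # the top-M list cannot exceed |Sub2_a|
    fn = prefer_size - tp  # FN_ax: liked but not recommended
    rest = sub2_size - m
    d = binary_entropy(prefer_size, sub2_size)
    t = (m / sub2_size) * binary_entropy(tp, m) + (rest / sub2_size) * binary_entropy(fn, rest)
    return d - t


def stability_weights(ig_by_algo):
    """Step (4.d), assumed form: G_ax = IG_x / sum of all IG values."""
    clipped = {a: max(v, 0.0) for a, v in ig_by_algo.items()}  # guard rounding below 0
    total = sum(clipped.values())
    n = len(clipped)
    return {a: (v / total if total else 1.0 / n) for a, v in clipped.items()}
```

Under this form, a sub-model whose top-M list contains exactly the liked addresses removes all uncertainty, so its IG equals D.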
Fifth, compute the final total weighting coefficients C_a for active user u_a from the two sets of weights. This step accounts for the robustness of the ensemble recommendation system and balances recommendation accuracy against system stability. Fuse the pre-scores that each recommendation sub-model gives to the unvisited addresses according to C_a, producing the ensemble model's final prediction scores for those addresses. Sort all unvisited addresses by final prediction score, and provide the active user with a recommendation list of the top-ranked addresses. The steps are as follows:
(5.a) Calculate the final weighting coefficient C_ax of each recommendation sub-algorithm A_x (1 ≤ x ≤ n) when recommending for active user u_a, where n is the number of sub-algorithms, as follows:
[Equation (11)]
where C_ax is the final weight of sub-algorithm A_x's pre-scores, and W_ax is the precision weight of sub-algorithm A_x's pre-scores when recommending for active user u_a. G_ax is the stability weight of sub-algorithm A_x's pre-scores.
(5.b) For each location l_k (l_k ∈ NewL_a) not visited by active user u_a, compute the ensemble algorithm's final prediction score as follows:
[Equation (12)]
where C_ax is the final weighting coefficient of sub-algorithm A_x when recommending for active user u_a. The other term in the formula is the pre-score of recommendation sub-algorithm A_x for the location l_k not visited by active user u_a.
(5.c) Sort all addresses not visited by the active user by the ensemble algorithm's final prediction score. Form the recommendation list TopNList_a from the N top-ranked locations and return it to the active user.
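Steps (5.a) to (5.c) can be sketched as below. The combination rule of step (5.a) is not reproduced here, so the code uses a hypothetical convex mix C_ax = alpha·W_ax + (1 − alpha)·G_ax with alpha = 0.5. It also assumes that step (5.b) is a C-weighted sum of the sub-models' pre-scores. Both choices, and the function names, are illustrative only.

```python
def total_weights(w, g, alpha=0.5):
    """Step (5.a), hypothetical form: C_ax = alpha * W_ax + (1 - alpha) * G_ax."""
    return {a: alpha * w[a] + (1 - alpha) * g[a] for a in w}


def fuse_and_rank(pre_scores, c, new_l, n):
    """Steps (5.b)-(5.c), assumed form: C-weighted sum of pre-scores, then Top-N.

    pre_scores: {algorithm: {location: pre-score}} for every l_k in NewL_a.
    """
    final = {loc: sum(c[a] * pre_scores[a][loc] for a in c) for loc in new_l}
    ranked = sorted(final, key=lambda loc: (-final[loc], loc))  # ties by address ID
    return ranked[:n], final  # TopNList_a and all final scores
```

If both weight sets sum to 1, the mixed weights C_a also sum to 1. In that case, pre-scores in [0, 1] produce fused scores in [0, 1] as well.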
Sixth, evaluate the robustness of each recommendation system using the accuracy index and the stability index. The main comparison is between the overall performance of the ensemble-learning-based personalized location recommendation algorithm and that of each sub-algorithm before integration. This assesses the applicability and effectiveness of the proposed technique. The steps are as follows:
(6.a) Randomly select U × 10% users from the target dataset as the active user set AU. Run each recommendation algorithm for every active user in the set to generate recommendation lists.
(6.b) Evaluate the robustness of each recommendation system with the accuracy index and the stability index. For a single run of an algorithm on the active user set AU, the values of Precision, Recall, the recommendation accuracy index F1 and the information gain IG are the averages of these indexes over all users in AU.
(6.c) Repeat steps (6.a) and (6.b) Ntimes times, i.e., run every algorithm Ntimes times independently.
(6.d) Take the values of Precision, Recall, F1 and IG of the proposed ensemble algorithm and of each sub-algorithm to be the averages over the Ntimes runs.
(6.e) Compare the results of each index. If the ensemble algorithm's F1 is greater than the F1 of every sub-algorithm, its recommendation accuracy is higher than that of all sub-algorithms. If the ensemble algorithm's IG is greater than the maximum IG among the sub-algorithms, it is more stable than all sub-algorithms. If both conclusions hold, the proposed technique is more robust.
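The evaluation protocol of steps (6.a) to (6.e) is sketched below. Each algorithm is represented by a hypothetical callable that returns per-user metric values. The sketch samples 10% of the users in each run, averages each metric over users and then over runs, and applies the two comparison rules of step (6.e).

```python
import random
from statistics import mean


def evaluate(algorithms, users, n_runs, frac=0.1, seed=0):
    """Steps (6.a)-(6.d): average metrics over sampled users, then over runs.

    algorithms: {name: callable(user) -> {"F1": ..., "IG": ...}}, a hypothetical
    interface standing in for running one recommender for one active user.
    users: a list (random.Random.sample needs a sequence).
    """
    rng = random.Random(seed)
    k = max(1, int(len(users) * frac))  # U x 10% active users per run
    per_run = {name: [] for name in algorithms}
    for _ in range(n_runs):
        active = rng.sample(users, k)  # a fresh active user set AU each run
        for name, run_for_user in algorithms.items():
            results = [run_for_user(u) for u in active]
            per_run[name].append({m: mean(r[m] for r in results) for m in results[0]})
    return {name: {m: mean(run[m] for run in runs) for m in runs[0]}
            for name, runs in per_run.items()}


def ensemble_is_more_robust(summary, ensemble, subs):
    """Step (6.e): the ensemble must beat every sub-algorithm on F1 and on IG."""
    better_f1 = all(summary[ensemble]["F1"] > summary[s]["F1"] for s in subs)
    better_ig = summary[ensemble]["IG"] > max(summary[s]["IG"] for s in subs)
    return better_f1 and better_ig
```

For the Los Angeles example below, `int(1233 * 0.1)` yields the 123 active users used in its step (6.a).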
The following example shows in detail how the ensemble-learning-based personalized location recommendation method of the invention works on a specific location-based social network.
Brightkite is a location-based social networking service provider whose users share their locations by checking in. The network contains 58,228 users and 693,362 locations, with 214,078 social relationships among users. The Brightkite dataset contains 4,491,143 check-ins collected from April 2008 to October 2010 and has become one of the test datasets most commonly used by recommender-system researchers. The invention is instantiated here using the Los Angeles area data from the Brightkite dataset.
First, collect and organize the original user check-in dataset C and convert it into the user-location score matrix R. The steps are as follows:
(1.a) Select the check-in dataset C of users in the Los Angeles area of the example dataset Brightkite. It consists of 61,710 historical check-in records of 1,233 users at 2,951 addresses, with 4,216 social relationships among the users. On average, each user has 50.05 check-ins and 6.84 friends, and each location is visited 20.91 times. Each check-in record contains information such as the user ID, address ID, access time, address longitude and address latitude.
(1.b) Convert each check-in record into a triplet (u_i, l_j, n_ij), where u_i is the i-th user (1 ≤ i ≤ 1233), l_j is the j-th address (1 ≤ j ≤ 2951), and n_ij is the number of times user u_i visited address l_j.
(1.c) Calculate NC_au_j, the total number of visits by all users to location l_j.
(1.d) Calculate NLC_i, the total number of distinct locations visited by user u_i.
(1.e) Calculate NC_al_i, the total number of visits by user u_i to all locations.
(1.f) Calculate NUC_j, the number of users who visited location l_j.
(1.g) Convert the number of check-ins n_ij of user u_i at address l_j into the score r_ij of user u_i for address l_j as follows:
[Equation (13)]
where r_ij is the score of user u_i for address l_j, n_ij is the number of check-ins of user u_i at address l_j, NC_au_j is the total number of visits by all users to location l_j, NLC_i is the total number of locations visited by user u_i, NC_al_i is the total number of visits by user u_i to all locations, and NUC_j is the number of users who visited location l_j.
(1.h) The lowest value min of all scores in the user-location score matrix R is 0, and the highest value max is 12.61. Normalize the user scores obtained in the previous step:
$$r_{ij} \leftarrow \frac{r_{ij} - 0}{12.61 - 0} = \frac{r_{ij}}{12.61} \tag{14}$$
where r_ij is the score of user u_i for address l_j.
After normalization, the score r_ij of user u_i for address l_j lies in the interval [0, 1]. A value of 1 means the user visits the location frequently and likes it very much, and a value of 0 means the user has never visited the location. In general, a higher score means the user prefers the address more.
All scores together form the user-location score matrix R = {r_ij}, i ∈ [1, 1233], j ∈ [1, 2951], where i is the user index and j is the address index.
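The following is a quick consistency check of the dataset statistics in step (1.a) and of the normalization in step (1.h). It assumes that each of the 4,216 social relationships is undirected and counted once. The raw score 3.00 in the second line is a hypothetical value chosen for illustration.

$$\frac{61710}{1233} \approx 50.05, \qquad \frac{2 \times 4216}{1233} \approx 6.84, \qquad \frac{61710}{2951} \approx 20.91$$

$$\min = 0,\ \max = 12.61: \qquad r_{ij} \leftarrow \frac{r_{ij}}{12.61}, \qquad \text{e.g.}\ \frac{3.00}{12.61} \approx 0.238$$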
Second, select an active user u_a in the LBSN as the recommendation service target. Select recommendation sub-algorithms A_1, A_2, …, A_n of any type and number, where n is the number of sub-algorithms. Divide the addresses visited by the active user into a training sub-dataset and an evaluation sub-dataset according to a given ratio. Using the visit information in the active user's training sub-dataset, compute pre-scores for the addresses the active user has not visited and for the addresses in the evaluation sub-dataset. The steps are as follows:
(2.a) Obtain the personal information, social relationships and historical visit records of an active user u_a in the Los Angeles area of the example dataset Brightkite.
(2.b) Select four recommendation algorithms as the sub-algorithms of the ensemble model of the invention:
- A_1: user-based collaborative filtering (UBCF)
- A_2: singular value decomposition (SVD)
- A_3: social collaborative filtering (SCF)
- A_4: kernel density estimation (KDE)
The reasons for this choice are as follows. UBCF is a typical memory-based collaborative filtering algorithm. It can mine a user's personal preferences, but it cannot make effective recommendations for new items or inactive users, the so-called cold-start problem. SVD is a typical matrix-factorization technique among model-based collaborative filtering algorithms. It can alleviate the cold-start problem of UBCF and improves recommendation accuracy, but it has high computational complexity and runs slowly. Social relationships among users are a main characteristic of an LBSN, so SCF is selected as a complement to UBCF. SCF adds the influence of social relationships on users' behavior patterns to UBCF and can produce more accurate recommendations. Like UBCF, however, it still suffers from cold start and a limited variety of recommended results. Locations in an LBSN also have geographic attributes. The KDE algorithm models the influence of geographic position on each user's check-in activity as a personalized probability distribution, thereby making reasonable use of the geographic features of locations. In addition, unlike the first three sub-algorithms, KDE does not need other users' visit information, so it is particularly suitable for sparse score matrices. Its main drawbacks are low recommendation accuracy and unstable performance.
Given these distinct strengths and weaknesses, the four sub-algorithms selected by the invention complement one another.
(2.c) Train each algorithm to obtain the recommendation sub-models M_UBCF, M_SVD, M_SCF and M_KDE.
(2.d) Set a uniform address division ratio p = 0.4 for all active users. Divide the addresses visited by active user u_a in this ratio into a training sub-dataset Sub1_a and an evaluation sub-dataset Sub2_a.
(2.e) The UBCF, SVD, SCF and KDE algorithms use the sub-models M_UBCF, M_SVD, M_SCF, M_KDE and the training sub-dataset Sub1_a of active user u_a to calculate a pre-score for each address l_k (l_k ∈ NewL_a ∪ Sub2_a). Here NewL_a is the set of unvisited addresses and Sub2_a is the evaluation sub-dataset. The pre-scores are denoted respectively as:
[Formula: pre-scores of UBCF, SVD, SCF and KDE for address l_k]
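The sketch below makes the KDE sub-algorithm described in step (2.b) concrete. It scores a candidate location with an isotropic two-dimensional Gaussian kernel density estimate over the user's own past check-in coordinates. This is one common way to build such a personalized distribution, not necessarily the one used in the invention. The sketch assumes coordinates already projected onto a plane (for example, kilometres) and a hand-chosen bandwidth, and the function name is hypothetical. It uses only the active user's own check-ins, which matches the remark that KDE does not need other users' visit information.

```python
import math


def kde_pre_score(visited_coords, candidate, bandwidth):
    """Mean isotropic 2-D Gaussian kernel density at `candidate`.

    visited_coords: the user's own past check-ins as planar (x, y) points,
    one entry per check-in; bandwidth: kernel width in the same unit.
    """
    if not visited_coords:
        return 0.0
    h2 = bandwidth ** 2
    cx, cy = candidate
    total = 0.0
    for x, y in visited_coords:
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        total += math.exp(-d2 / (2 * h2)) / (2 * math.pi * h2)
    return total / len(visited_coords)
```

Listing a coordinate once per check-in gives frequently visited places more weight in the density.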
Third, select the recommendation accuracy index F1 as the accuracy evaluation index. Evaluate the recommendation accuracy of each sub-model by comparing the true scores and pre-scores of the addresses in the evaluation sub-dataset Sub2_a. Based on the F1 value of each recommendation sub-model, compute the set of precision weights W_a for active user u_a. The steps are as follows:
(3.a) Collect the pre-scores computed by each recommendation sub-algorithm A_x (1 ≤ x ≤ 4) for active user u_a. Sort all addresses l_k (l_k ∈ Sub2_a) in the evaluation sub-dataset Sub2_a by pre-score, and take the top M = 10 addresses to generate the training list Top10_ax for each algorithm A_x.
(3.b) Collect the evaluation sub-dataset Sub2_a of active user u_a. Put the addresses in Sub2_a whose true score is greater than goodling = 0.05 into the preference sub-dataset Prefer_a.
(3.c) Calculate the precision (Precision) of each recommendation sub-algorithm A_x as follows:
$$\mathrm{Precision}(u_a, A_x, M) = \frac{|\mathrm{Top}M_{ax} \cap \mathrm{Prefer}_a|}{M} \tag{15}$$
where u_a is the active user currently receiving the recommendation service, and A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm. The training list TopM_ax is the set of the top 10 addresses in Sub2_a with the highest pre-scores. The preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose true score is greater than 0.05. M is the number of addresses in TopM_ax (M = 10).
(3.d) Calculate the recall (Recall) of each recommendation sub-algorithm A_x as follows:
$$\mathrm{Recall}(u_a, A_x, M) = \frac{|\mathrm{Top}M_{ax} \cap \mathrm{Prefer}_a|}{|\mathrm{Prefer}_a|} \tag{16}$$
where u_a is the active user currently receiving the recommendation service, and A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm. The training list TopM_ax is the set of the top 10 addresses in Sub2_a with the highest pre-scores. The preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose true score is greater than 0.05. M is the number of addresses in TopM_ax (M = 10).
(3.e) Calculate the comprehensive accuracy index F1 of each recommendation sub-algorithm A_x as follows:
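$$F1(u_a, A_x, M) = \frac{2 \cdot \mathrm{Precision}(u_a, A_x, M) \cdot \mathrm{Recall}(u_a, A_x, M)}{\mathrm{Precision}(u_a, A_x, M) + \mathrm{Recall}(u_a, A_x, M)} \tag{17}$$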
where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm, and M is the number of addresses in the training list TopM_ax (M = 10). Precision(u_a, A_x, M) is the precision of each sub-algorithm, and Recall(u_a, A_x, M) is its recall.
After the four sub-algorithms are run 100 times, with a group of target users randomly selected each time, the frequency histogram of the recommendation accuracy index F1 on the evaluation dataset is shown in FIG. 4.
(3.f) Calculate the precision weight W_ax (1 ≤ x ≤ 4) of each recommendation sub-algorithm A_x when recommending for active user u_a, as follows:
[Equation (18)]
where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm, and M is the number of addresses in the training list TopM_ax (M = 10). F1(u_a, A_x, M) is the recommendation accuracy of sub-algorithm A_x (1 ≤ x ≤ 4).
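The following is a worked instance of steps (3.c) to (3.e) with M = 10, using hypothetical counts rather than experimental results. Suppose Top10_ax contains 3 addresses from Prefer_a and |Prefer_a| = 6. Then:

$$\mathrm{Precision} = \frac{3}{10} = 0.3, \qquad \mathrm{Recall} = \frac{3}{6} = 0.5, \qquad F1 = \frac{2 \times 0.3 \times 0.5}{0.3 + 0.5} = 0.375$$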
Fourth, select the information gain IG as the system stability evaluation index. Compare the true scores and pre-scores of the addresses in the evaluation sub-dataset to evaluate the stability of UBCF, SVD, SCF and KDE in a non-malicious-attack scenario. Based on the IG value of each recommendation sub-model, compute the set of stability weights G_a for active user u_a. The steps are as follows:
(4.a) Calculate the information entropy of the evaluation sub-dataset Sub2_a as follows:
[Equation (19)]
where u_a is the active user currently receiving the recommendation service, and Sub2_a is the evaluation sub-dataset obtained by dividing the addresses visited by u_a in the given ratio. The preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose true score is greater than 0.05.
(4.b) Classify the addresses in Sub2_a by the recommendation result of sub-algorithm A_x (1 ≤ x ≤ 4), as recommended or not recommended. Calculate the conditional entropy of this classification as follows:
[Equation (20)]
where u_a is the active user currently receiving the recommendation service, and A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm. Sub2_a is the evaluation sub-dataset obtained by dividing the addresses visited by u_a in the given ratio. The training list TopM_ax is the set of the top 10 addresses in Sub2_a with the highest pre-scores, and M is the number of addresses in TopM_ax (M = 10). TP_ax is the number of addresses in the recommendation list that the user truly likes. FN_ax is the number of addresses the user truly likes that are not in the recommendation list (that is, not recommended).
(4.c) Calculate the information gain obtained when the addresses in Sub2_a are classified by the recommendation result of sub-algorithm A_x, as follows:
$$\mathrm{IG}(u_a, A_x, M) = D(u_a) - T(u_a, A_x, M) \tag{21}$$
where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm, and M is the number of addresses in the training list TopM_ax (M = 10). D(u_a) is the information entropy of Sub2_a. T(u_a, A_x, M) is the conditional entropy of the addresses in Sub2_a when they are classified by the recommendation result of sub-algorithm A_x (recommended or not recommended).
After the four sub-algorithms are run 100 times, with a group of target users randomly selected each time, the frequency histogram of the information gain IG on the evaluation sub-dataset Sub2_a is shown in FIG. 5.
(4.d) Calculate the stability weight G_ax (1 ≤ x ≤ 4) of each recommendation sub-algorithm A_x when recommending for active user u_a, as follows:
[Equation (22)]
where u_a is the active user currently receiving the recommendation service, A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm, and M is the number of addresses in the training list TopM_ax (M = 10). IG(u_a, A_x, M) is the information gain of sub-algorithm A_x (1 ≤ x ≤ 4).
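The following worked instance assumes that the entropies in steps (4.a) and (4.b) take the usual base-2 binary-split form. This is an assumption, since the original formulas are not reproduced in this text. The counts are hypothetical: |Sub2_a| = 20, |Prefer_a| = 5, M = 10, TP_ax = 4 and FN_ax = 1. With H(p) = −p log2 p − (1 − p) log2(1 − p), the calculation gives:

$$D(u_a) = H\!\left(\tfrac{5}{20}\right) \approx 0.811$$

$$T(u_a, A_x, 10) = \tfrac{10}{20}\, H\!\left(\tfrac{4}{10}\right) + \tfrac{10}{20}\, H\!\left(\tfrac{1}{10}\right) \approx 0.5 \times 0.971 + 0.5 \times 0.469 = 0.720$$

$$\mathrm{IG}(u_a, A_x, 10) \approx 0.811 - 0.720 = 0.091$$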
Fifth, compute the final total weighting coefficients C_a for active user u_a from the two sets of weights. This step accounts for the robustness of the ensemble recommendation system and balances recommendation accuracy against system stability. Fuse the pre-scores that each recommendation sub-model gives to the unvisited addresses according to C_a, producing the ensemble model's final prediction scores for those addresses. Sort all unvisited addresses by final prediction score, and provide the active user with a recommendation list of the top-ranked addresses. The steps are as follows:
(5.a) Calculate the final weighting coefficient C_ax of each recommendation sub-algorithm A_x (1 ≤ x ≤ 4) when recommending for active user u_a, as follows:
[Equation (23)]
where C_ax is the final weight of sub-algorithm A_x's pre-scores, and W_ax is the precision weight of sub-algorithm A_x's pre-scores when recommending for active user u_a. G_ax is the stability weight of sub-algorithm A_x's pre-scores.
(5.b) For each location l_k (l_k ∈ NewL_a) not visited by active user u_a, compute the ensemble algorithm's final prediction score as follows:
[Equation (24)]
wherein, CaxIs an active user uaThe final weighting factor at the time of recommendation,
Figure BDA0002438068960000182
is the recommendation sub-algorithm AxFor active user uaLocation i not visitedkPre-scoring of (2).
(5, c) sorting all addresses which are not visited by the active user according to the final prediction score of the integration algorithm, forming a recommendation list by N positions with top ranking, and forming the recommendation list by TopNListaAnd returning to the active users (N can be a multiple of 5, and N is more than or equal to 5 and less than or equal to 50 under the normal condition).
And sixthly, evaluating the robustness of each recommendation system by using the precision index and the stability index, mainly comparing the comprehensive performance of the personalized position recommendation algorithm based on the ensemble learning and the integrated first four sub-algorithms, and evaluating the applicability and the effectiveness of the proposed technology. The realization steps are as follows:
(6.a) Randomly select 123 users from the target data set as the active user set AU, and run the integrated recommendation algorithm and the four sub-algorithms for each active user in the set to generate recommendation lists.
(6.b) Evaluate the robustness of each recommendation system with the precision indices and the stability index. For a single run of an algorithm over the active user set AU, the values of Precision, Recall, the recommendation precision index F1, and the information gain IG are the averages of the corresponding per-user values over all users in AU.
(6.c) Repeat steps (6.a) and (6.b) 100 times, i.e., run every algorithm 100 times independently.
Box plots of the 100 Precision, Recall, and F1 values produced over the 100 runs of the proposed integrated model are shown in FIG. 6, FIG. 7, and FIG. 8, respectively.
(6.d) Take the Precision, Recall, F1, and IG values of the proposed integrated algorithm and of the four sub-recommendation algorithms to be the averages over the 100 runs. The Precision, Recall, F1, and IG results of each recommendation algorithm for different values of N are shown in Tables 2, 3, 4, and 5, respectively:
TABLE 2 Precision values of the different recommendation algorithms
[Table image not reproduced in the text]
TABLE 3 Recall values of the different recommendation algorithms
[Table image not reproduced in the text]
TABLE 4 Recommendation precision index F1 values of the different recommendation algorithms
[Table image not reproduced in the text]
TABLE 5 Information gain IG values of the different recommendation algorithms
[Table image not reproduced in the text]
A bar chart comparing the F1 index of the integrated model with that of each sub-model is shown in FIG. 9, and a bar chart comparing the information gain IG index is shown in FIG. 10.
(6.e) Compare and analyze the results for each index. The Precision, Recall, and F1 values of the integrated algorithm all exceed the corresponding values of every sub-recommendation algorithm, so the integrated algorithm achieves higher recommendation precision than all sub-algorithms. Its information gain IG also exceeds the largest IG among the sub-recommendation algorithms, which shows that the integrated algorithm is more stable than all sub-algorithms. Together, these two conclusions demonstrate the robustness of the proposed technique.
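The per-user precision metrics and the repetition protocol of steps (6.a) to (6.d) can be sketched in Python as follows. This is a sketch under stated assumptions: a user's "relevant" locations are assumed to be the held-out visited locations, `run_once` is a hypothetical callback standing in for one full pass of an algorithm over AU, and IG is omitted because its formula is not given in this section.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Set, Tuple


def precision_recall_f1(recommended: Sequence[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Per-user top-N metrics; `relevant` is ASSUMED to be the user's held-out visited locations."""
    hits = len(set(recommended) & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1


def repeated_evaluation(
    users: Sequence[str],
    sample_size: int,
    runs: int,
    run_once: Callable[[List[str]], Dict[str, float]],
    seed: int = 0,
) -> Dict[str, float]:
    """(6.a)-(6.d): draw a fresh active-user set AU per run, then average each metric over runs.

    `run_once(au)` is a HYPOTHETICAL callback returning AU-averaged metrics for one run.
    """
    rng = random.Random(seed)
    history: Dict[str, List[float]] = {}
    for _ in range(runs):
        au = rng.sample(list(users), sample_size)  # (6.a) new random AU each repetition
        for name, value in run_once(au).items():    # (6.b) metrics already averaged over AU
            history.setdefault(name, []).append(value)
    return {name: mean(values) for name, values in history.items()}  # (6.d) mean over runs
```

In the experiment described above, `sample_size` would be 123 and `runs` 100; claim 5 instead uses U × 10% users and Ntimes runs.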
Unlike conventional ensemble algorithms, the method aims to build a position recommendation system that is highly extensible, precise, and stable in its results. It takes into account the accuracy of the recommendations as well as the diversity of data usage and user behavior, and it introduces information gain as an evaluation index of system stability. This index quantifies the uncertainty arising from limitations of the data source (such as sparsity and cold start), from differing data preprocessing methods, and from model training, and thereby improves the stability of the system's output. In addition, a set of weighting coefficients is customized for each user, so that through personalized weighting the integrated algorithm can favor different sub-algorithms for different users. The proposed technique helps improve the robustness and service quality of recommendation systems, has broad application prospects, and is expected to find wide use in the location-based social network market.
The process flow described above is only a preferred embodiment of the present invention and does not represent all of its details. Any modification, equivalent replacement, or improvement made by those skilled in the art within the technical scope, spirit, and principle of the present disclosure shall fall within the protection scope of the present disclosure.

Claims (6)

1. A personalized position recommendation method based on ensemble learning, characterized by comprising the following steps:
Step 1: collect and organize the original user check-in data set C, and convert it into a user-position scoring matrix R;
Step 2: select an active user u_a in the location-based social network (LBSN) N as the recommendation service object; select recommendation sub-algorithms A_1, A_2, …, A_n of any type and any number; divide the addresses visited by the active user into a training sub-data set and an evaluation sub-data set; and, using the visit information in the active user's training sub-data set, have each recommendation sub-algorithm compute pre-scores for the addresses the active user has not visited and for the addresses in the evaluation sub-data set;
Step 3: select the recommendation precision index F1 as the precision evaluation index; compare the true scores with the pre-scores of the addresses in the evaluation sub-data set to evaluate the recommendation precision of each sub-algorithm; and, from each sub-algorithm's F1 value, compute a set of precision weight values W_a for active user u_a;
Step 4: select information gain IG as the system stability evaluation index; compare the true scores with the pre-scores of the addresses in the evaluation sub-data set to evaluate the system stability of each sub-algorithm in a non-malicious-attack scenario; and, from each sub-algorithm's IG value, compute a set of stability weight values G_a for active user u_a;
Step 5: on the basis of the precision weight set W_a and the stability weight set G_a, compute the final total weighting coefficient C_a for active user u_a; fuse the sub-models' pre-scores for the unvisited addresses using C_a to generate the integrated model's final prediction scores for those addresses; sort all unvisited addresses by final prediction score; and provide the active user with a recommendation list of the top-ranked addresses;
Step 6: compare the overall performance of the proposed ensemble-learning-based personalized position recommendation algorithm with that of each sub-algorithm before integration, and evaluate the applicability and effectiveness of the method.
2. The method for personalized position recommendation based on ensemble learning according to claim 1, wherein step 1 of the method comprises:
Step 11: select the user check-in data set C of the target recommendation system, consisting of the historical check-in records of U users at L addresses, and extract the user ID, address ID, visit time, address longitude, and address latitude from each check-in record;
Step 12: convert the check-in records into triplets (u_i, l_j, n_ij), where u_i is the i-th user (1 ≤ i ≤ U), l_j is the j-th address (1 ≤ j ≤ L), and n_ij is the number of times user u_i visited address l_j;
Step 13: calculate the total number of visits NC_au_j by all users to location l_j;
Step 14: calculate the total number of locations NLC_i visited by user u_i;
Step 15: calculate the total number of visits NC_al_i by user u_i to all locations;
Step 16: calculate the number of users NUC_j who visited location l_j;
Step 17: convert the check-in count n_ij of user u_i at address l_j into the score r_ij of user u_i for address l_j as follows:
[Formula image: r_ij in terms of n_ij, NC_au_j, L, NLC_i, NC_al_i, U, and NUC_j; not reproduced in the text]
where r_ij is user u_i's score for address l_j, n_ij is the number of check-ins of u_i at l_j, NC_au_j is the total number of visits by all users to location l_j, L is the total number of addresses, NLC_i is the total number of locations visited by u_i, NC_al_i is the total number of visits by u_i to all locations, U is the total number of users, and NUC_j is the number of users who visited location l_j;
Step 18: normalize the user scores as follows:
[Formula image: normalization of r_ij using min and max; not reproduced in the text]
where r_ij is user u_i's score for address l_j, min is the lowest of all scores in the user-position scoring matrix R, and max is the highest of all scores;
collect all scores to form the user-position scoring matrix R = {r_ij}, i ∈ [1, U], j ∈ [1, L].
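Steps 12 to 16 and the normalization of step 18 can be sketched in Python as below. The scoring formula of step 17 is not reproduced in the text, so `normalize` takes the raw scores r_ij as input. It also assumes that step 18 is ordinary min-max scaling to [0, 1] (its formula image is missing too) and that unvisited cells are left empty (`None`). Function names and the 0-based indexing are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def to_triplets(checkins: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Step 12: collapse (user_id, location_id) check-ins into (u_i, l_j, n_ij)."""
    counts = Counter(checkins)  # counts each (user, location) pair
    return [(u, l, n) for (u, l), n in counts.items()]


def aggregate_stats(triplets: Iterable[Tuple[str, str, int]]):
    """Steps 13-16, assuming each (u_i, l_j) pair occurs once in `triplets`."""
    nc_au: Dict[str, int] = defaultdict(int)  # NC_au_j: visits to l_j by all users
    nlc: Dict[str, int] = defaultdict(int)    # NLC_i: locations visited by u_i
    nc_al: Dict[str, int] = defaultdict(int)  # NC_al_i: visits by u_i to all locations
    nuc: Dict[str, int] = defaultdict(int)    # NUC_j: users who visited l_j
    for u, l, n in triplets:
        nc_au[l] += n
        nc_al[u] += n
        nlc[u] += 1
        nuc[l] += 1
    return dict(nc_au), dict(nlc), dict(nc_al), dict(nuc)


def normalize(raw: Dict[Tuple[int, int], float], num_users: int,
              num_locations: int) -> List[List[Optional[float]]]:
    """Step 18, ASSUMING min-max scaling to [0, 1]; unvisited cells stay None."""
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    R: List[List[Optional[float]]] = [[None] * num_locations for _ in range(num_users)]
    for (i, j), r in raw.items():
        # If all scores are equal, map them to 1.0 (an arbitrary choice for this sketch).
        R[i][j] = (r - lo) / span if span > 0 else 1.0
    return R
```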
3. The method for personalized position recommendation based on ensemble learning according to claim 1, wherein step 2 of the method comprises:
Step 21: obtain the information of the active user u_a currently served by the recommendation system;
Step 22: according to the application scenario and data characteristics, select a group of recommendation algorithms A_1, A_2, …, A_n as the sub-algorithms of the integrated model;
Step 23: train each algorithm according to its own operating mechanism to obtain the trained recommendation sub-models M_1, M_2, …, M_n;
Step 24: set a uniform address division ratio p for all active users, and divide the addresses visited by active user u_a in this proportion into sub-data sets Sub1_a and Sub2_a;
Step 25: using the recommendation sub-models M_1, M_2, …, M_n and the information in active user u_a's sub-data set Sub1_a, compute a pre-score for every address l_k in the set of unvisited addresses NewL_a and in the sub-data set Sub2_a, l_k ∈ NewL_a ∪ Sub2_a, denoted
[Formula image: pre-score notation; not reproduced in the text]
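A minimal Python sketch of step 24 and of the target set in step 25. The claim fixes only the ratio p; the random shuffle, the seed, the rounding of the cut point, and the helper name `split_visited` are assumptions made for illustration.

```python
import random
from typing import Iterable, List, Set, Tuple


def split_visited(visited: Iterable[str], all_locations: Iterable[str], p: float,
                  seed: int = 0) -> Tuple[List[str], List[str], Set[str]]:
    """Return (Sub1_a, Sub2_a, targets), where targets = NewL_a | Sub2_a."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    items = sorted(set(visited))           # deterministic base order
    random.Random(seed).shuffle(items)     # ASSUMPTION: the split is random
    cut = int(round(p * len(items)))
    sub1, sub2 = items[:cut], items[cut:]  # training / evaluation sub-data sets
    new_l = set(all_locations) - set(items)  # NewL_a: addresses never visited by u_a
    return sub1, sub2, new_l | set(sub2)     # step 25 pre-scores every address here
```

Each sub-model trained on Sub1_a would then score every address in the returned target set.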
4. The ensemble learning-based personalized location recommendation method according to claim 1, wherein step 5 of the method comprises:
Step 51: calculate the final weighting coefficient C_ax of each recommendation sub-algorithm A_x, 1 ≤ x ≤ n, when recommending for active user u_a, as follows:
[Formula image: C_ax in terms of W_ax and G_ax; not reproduced in the text]
where C_ax is the final weighting coefficient when recommending for active user u_a, W_ax is the precision weight of recommendation sub-algorithm A_x's pre-scores when recommending for u_a, and G_ax is the stability weight of A_x's pre-scores;
Step 52: for each location l_k not visited by active user u_a, l_k ∈ NewL_a, the integrated algorithm computes a final prediction score as follows:
[Formula image: final prediction score of l_k obtained by fusing the sub-algorithm pre-scores with the weights C_ax; not reproduced in the text]
where C_ax is the final weighting coefficient when recommending for active user u_a, and the pre-score term (an image in the original) is the pre-score that recommendation sub-algorithm A_x assigns to location l_k not visited by u_a;
Step 53: sort all addresses in the set NewL_a by the integrated algorithm's final prediction score, form the recommendation list TopNList_a from the N top-ranked positions, and return it to the active user.
5. The method for recommending personalized positions based on ensemble learning according to claim 1, wherein said step 6 comprises:
Step 61: randomly select U × 10% of the users in the target data set as the active user set AU, and run each recommendation algorithm for every active user in the set to generate recommendation lists;
Step 62: evaluate the robustness of each recommendation system with the precision indices and the stability index; for one run of an algorithm over the active user set AU, the values of the precision indices Precision, Recall, and F1 and of the stability index IG are the averages of the per-user values over all users in AU;
Step 63: repeat steps 61 and 62 Ntimes times, i.e., run all algorithms independently Ntimes times;
Step 64: take the Precision, Recall, F1, and IG values of the integrated algorithm and of each sub-recommendation algorithm to be the averages over the Ntimes runs;
Step 65: compare and analyze the results for all indices: if the F1 value of the integrated algorithm exceeds the F1 values of all sub-recommendation algorithms, the integrated algorithm has higher recommendation precision than all sub-algorithms; if its IG exceeds the largest IG among the sub-recommendation algorithms, the integrated algorithm is more stable than all sub-algorithms; if both conclusions hold, the integrated algorithm is more robust.
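The decision rule of step 65 reduces to two comparisons. The sketch below assumes each algorithm's averaged results are stored in a dictionary with `"F1"` and `"IG"` keys; the function name and data layout are hypothetical.

```python
from typing import Dict, List


def is_more_robust(ensemble: Dict[str, float], subs: List[Dict[str, float]]) -> bool:
    """Step 65: True only if the ensemble beats every sub-algorithm on both F1 and IG."""
    if not subs:
        raise ValueError("need at least one sub-algorithm")
    more_precise = all(ensemble["F1"] > s["F1"] for s in subs)  # F1 above every sub-algorithm
    more_stable = ensemble["IG"] > max(s["IG"] for s in subs)   # IG above the best sub-algorithm
    return more_precise and more_stable
```

Step (6.e) of the description additionally compares Precision and Recall; extending the check to those keys only requires more `all(...)` clauses.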
6. The personalized position recommendation method based on ensemble learning according to claim 1, wherein the method divides the addresses visited by an active user into a training sub-data set and an evaluation sub-data set in a given proportion; selects several recommendation sub-algorithms of any type; uses the historical score information in the active user's training sub-data set to have each sub-algorithm compute pre-scores of the other addresses for the active user; compares the historical scores with the pre-scores of the addresses in the evaluation sub-data set to evaluate the accuracy and stability of each sub-algorithm; generates personalized weighting coefficients for the active user from the evaluation results; combines the sub-algorithms' pre-scores for the unvisited addresses using these coefficients, so that the integrated model produces final prediction scores for the active user's unvisited addresses; ranks all unvisited addresses by prediction score; and recommends the top-ranked addresses to the active user.
CN202010257793.0A 2020-04-03 2020-04-03 Personalized position recommendation method based on ensemble learning Active CN111475744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010257793.0A CN111475744B (en) 2020-04-03 2020-04-03 Personalized position recommendation method based on ensemble learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010257793.0A CN111475744B (en) 2020-04-03 2020-04-03 Personalized position recommendation method based on ensemble learning

Publications (2)

Publication Number Publication Date
CN111475744A CN111475744A (en) 2020-07-31
CN111475744B true CN111475744B (en) 2022-06-14

Family

ID=71750449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010257793.0A Active CN111475744B (en) 2020-04-03 2020-04-03 Personalized position recommendation method based on ensemble learning

Country Status (1)

Country Link
CN (1) CN111475744B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036987B (en) * 2020-09-11 2024-04-02 杭州海康威视数字技术股份有限公司 Method and device for determining recommended commodity
CN114881689A (en) * 2022-04-26 2022-08-09 驰众信息技术(上海)有限公司 Building recommendation method and system based on matrix decomposition
CN115687801B (en) * 2022-09-27 2024-01-19 南京工业职业技术大学 Position recommendation method based on position aging characteristics and time perception dynamic similarity

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120229624A1 (en) * 2011-03-08 2012-09-13 Bank Of America Corporation Real-time image analysis for providing health related information
CN106776982A (en) * 2016-12-02 2017-05-31 深圳市唯特视科技有限公司 A kind of social media sentiment analysis method of use machine learning
US11238544B2 (en) * 2017-07-07 2022-02-01 Msm Holdings Pte System and method for evaluating the true reach of social media influencers
CN107633444B (en) * 2017-08-29 2021-03-19 南京理工大学紫金学院 Recommendation system noise filtering method based on information entropy and fuzzy C-means clustering
CN109241227B (en) * 2018-09-03 2023-05-30 成都卡普数据服务有限责任公司 Spatiotemporal data prediction modeling method based on stacking integrated learning algorithm
CN109543109B (en) * 2018-11-27 2021-06-22 山东建筑大学 Recommendation algorithm integrating time window technology and scoring prediction model

Also Published As

Publication number Publication date
CN111475744A (en) 2020-07-31

Similar Documents

Publication Publication Date Title
Christensen et al. Social group recommendation in the tourism domain
Xu et al. A novel POI recommendation method based on trust relationship and spatial–temporal factors
Guo et al. Combining geographical and social influences with deep learning for personalized point-of-interest recommendation
CN111475744B (en) Personalized position recommendation method based on ensemble learning
Xu et al. Integrated collaborative filtering recommendation in social cyber-physical systems
Li et al. Next and next new POI recommendation via latent behavior pattern inference
US20120185481A1 (en) Method and Apparatus for Executing a Recommendation
Eliyas et al. Recommendation systems: Content-based filtering vs collaborative filtering
Wang et al. Group recommendation based on a bidirectional tensor factorization model
CN114036376A (en) Time-aware self-adaptive interest point recommendation method based on K-means clustering
Liang et al. Collaborative filtering based on information-theoretic co-clustering
CN114528480A (en) Time-sensing self-adaptive interest point recommendation method based on K-means clustering
Li et al. From reputation perspective: a hybrid matrix factorization for qos prediction in location‐aware mobile service recommendation system
KR20150122307A (en) Method and server apparatus for advertising
Kanaujia et al. A framework for development of recommender system for financial data analysis
Linda et al. Effective context-aware recommendations based on context weighting using genetic algorithm and alleviating data sparsity
Gu et al. CAMF: context aware matrix factorization for social recommendation
Haydar et al. Hybridising collaborative filtering and trust-aware recommender systems
Jamil et al. Collaborative item recommendations based on friendship strength in social network
Lu Personalized Recommendation Algorithm of Smart Tourism Based on Cross‐Media Big Data and Neural Network
Gao et al. [Retracted] Construction of Digital Marketing Recommendation Model Based on Random Forest Algorithm
Jia et al. Dynamic group recommendation algorithm based on member activity level
Sun Music Individualization Recommendation System Based on Big Data Analysis
Wasid et al. Context similarity measurement based on genetic algorithm for improved recommendations
Liu et al. Effective similarity measures of collaborative filtering recommendations based on user ratings habits

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant