CN111475744A - Personalized position recommendation method based on ensemble learning - Google Patents


Publication number
CN111475744A
Authority
CN
China
Prior art keywords
recommendation
sub
algorithm
addresses
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010257793.0A
Other languages
Chinese (zh)
Other versions
CN111475744B (en)
Inventor
朱俊
韩立新
勾智楠
杨忆
袁晓峰
李树
李景仙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202010257793.0A priority Critical patent/CN111475744B/en
Publication of CN111475744A publication Critical patent/CN111475744A/en
Application granted granted Critical
Publication of CN111475744B publication Critical patent/CN111475744B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a personalized position recommendation method based on ensemble learning, comprising the following steps. First, a check-in data set is converted into a scoring matrix. Second, a plurality of recommendation sub-algorithms are selected, and the addresses visited by the active user are divided into a training sub-data set and an evaluation sub-data set; using the training sub-data set, each sub-algorithm calculates pre-scores for the addresses in the evaluation sub-data set and for the unvisited addresses. Third, the recommendation precision F1 of each sub-model is calculated on the evaluation sub-data set to generate a set of precision weight values. Fourth, information gain (IG) is selected as a stability index, the stability of each sub-model is evaluated, and a set of stability weight values is calculated. Fifth, a final total weighting coefficient is calculated for the active user, and the ensemble model fuses the sub-algorithms' pre-scores for the unvisited addresses according to the total weighting coefficient to generate a final prediction score. Sixth, the overall performance of the method is compared with that of each sub-algorithm before integration to evaluate the effectiveness of the method.

Description

Personalized position recommendation method based on ensemble learning
Technical Field
The invention relates to a personalized position recommendation method based on ensemble learning for social networks, and belongs to the technical field of artificial intelligence and machine learning.
Background
In location-based social networks (LBSNs), users can establish complex social relationships, such as friend, coworker, and family relationships. Users can also use the geographic information added to the social network to view points of interest (POIs), such as restaurants, shops, and movie theaters, check in with their mobile devices when visiting these points of interest, and publish their geographic location information. The suggestions that users leave in LBSNs help businesses learn more about the real services behind the network, so that those services meet users' needs.
As the number of users registered in LBSNs keeps growing, LBSNs store and accumulate a wealth of information, and this abundance makes it difficult for users to find the information they need quickly and effectively within a limited time. Recommendation systems, which aim to solve this information overload problem, have therefore attracted growing attention from researchers. For example, Amazon uses a recommendation system to recommend products to users, improving click-through rates and sales for merchants, and the movie recommendation website Netflix attracted many research teams to work on improving recommendation accuracy by holding a large recommendation system competition.
According to their design strategy, recommendation algorithms mainly comprise collaborative filtering algorithms, content-based recommendation algorithms, and hybrid recommendation algorithms. Collaborative filtering algorithms include memory-based collaborative filtering (such as user-based collaborative filtering (UBCF) and item-based collaborative filtering (IBCF)) and model-based collaborative filtering (such as singular value decomposition (SVD), clustering models, and probabilistic latent semantic analysis (PLSA)). In content-based location recommendation, a number of features, such as tags, categories, and user comments, can be extracted from a location.
The conventional recommendation techniques above neither take into account the influence of a location's geographic characteristics nor take advantage of the social relationships between users. However, each location in a location recommendation system has geographic features identified by latitude and longitude, and the geographic features of POIs can have a significant impact on users' visiting preferences. In addition, a user's social relationships also affect the user's check-in behavior: when users have not decided where they want to go, they often refer to the historical visit records of friends in the social network. Therefore, when designing a location recommendation algorithm, it is necessary to take these contextual factors into account, mine the geographic features of locations, and exploit the social relationships between users.
Social-based collaborative filtering (SCF) is a recommendation method that considers both the personal preferences and the social relationships of users. It is based on the assumption that friends share the same interest preferences and easily influence each other, and that active users are more willing to make decisions based on their friends' experience. In SCF, only the preferences of the active user's friends need to be considered when calculating the predicted rating of an address. When calculating the similarity between an active user and a friend, one can use their historical scores for places both have visited, the geographic distance between their residences, or the overlap of their friendship networks together with the similarity of their check-in histories. In addition, some research is dedicated to mining the geographic features of locations: some techniques use matrix factorization, while more algorithms model geographic influence with a common probability distribution, such as a power law distribution, a multi-center Gaussian distribution, or kernel density estimation (KDE). When KDE is used to predict the probability that a user visits a new location, the influence of geographic position on each user's check-in activity is personalized, yielding a more expressive geography-aware recommendation system.
However, most current location recommendation technologies are single-algorithm models, and each model rests on certain theoretical assumptions, so every algorithm has inherent defects and performs well only in specific application scenarios. For example, content-based location recommendation is suited to the cold start problem, but it requires a large amount of structured information about users and locations, which increases the storage and computation costs of the system; the UBCF and IBCF algorithms consider only the neighborhood effect in the rating data, so although they mine user preferences, they ignore the features of item content, which limits the diversity of the recommendation results; the SVD algorithm has high computational complexity and runs slowly, and its recommendation accuracy still needs improvement. To overcome the limitations of a single algorithm, some researchers have focused on how to combine several score prediction methods into a single overall model, and ensemble learning is an effective means of doing so. Ensemble learning is a machine learning paradigm that uses multiple weak learners to solve the same problem and can effectively improve the generalization ability of a learning system. Dietterich, an authority in the international machine learning community, has pointed out that ensemble learning is the first of the four major research directions of machine learning (ensemble learning, symbolic learning, statistical learning, and reinforcement learning).
Ensemble learning can surpass a single learning algorithm in several respects: it has better average performance across different domains and data sets; it can find combined solutions that no single learning algorithm can obtain; it is less sensitive to sampling variation, noise, and outliers; and it can obtain solutions by combining multiple distributed data sources or multiple features of a data source, a kind of fusion that is becoming increasingly important in distributed data mining. Because of its effectiveness, ensemble learning has been widely applied in fields such as biometric recognition, computer-aided medical diagnosis, text recognition, and Web information filtering.
At present, some recommendation systems apply ensemble learning to personalized recommendation problems, improving the effectiveness and adaptability of information recommendation. However, related research shows that existing recommendation systems based on ensemble learning still have many shortcomings, summarized as follows:
(1) in the fusion process, the number of considered sub-algorithms is fixed, the types of the considered sub-algorithms are limited, and the expandability of the integrated model is not strong. At present, most popular position recommendation systems based on ensemble learning only consider the fusion of two algorithms, and sub-algorithms are generally a certain collaborative filtering algorithm or position access probability estimation, so that the improvement range of the application scene and the system performance is limited to a certain extent. The existing integration framework cannot support the fusion of any number and any kind of recommendation sub-algorithms.
(2) The integration algorithm needs to set some weighting coefficients to fuse the prediction results of each sub-algorithm into a final prediction score, and the weighting coefficients represent the importance of each sub-algorithm. The fusion rule of existing algorithms is usually addition or multiplication or other simple linear combination, whose weighting coefficients are consistent for all users. However, in the real world, since the characteristics of each user and each item are different, the optimal sub-algorithms are not consistent for different users, that is, the algorithms most capable of mining and reflecting the user interests in the sub-algorithms are different from person to person. It follows that it is necessary to tailor a set of weighting coefficients for each user, ensuring that the integration algorithm can "bias" different sub-algorithms for different users by way of personalized weighting.
(3) In order to enhance the user experience, a good recommendation system should have the feature of robustness. The robustness of the recommendation system contains two indispensable factors of accuracy and stability. However, most of the current research is focused on only one of these aspects. In fact, the accuracy of the prediction determines whether the user likes the recommended location, and the stability of the system reflects whether the recommendation system can produce consistent recommendations in various application scenarios. Ignoring any of these aspects can affect the user's stickiness and reduce the profits of the service provider.
(4) At present, the few existing stability studies almost all limit the application scenario to malicious attacks, in which, for example, an attacker tries to get a preset item recommended to users. However, besides malicious attacks, inconsistent recommendation results may also be caused by uncertainty arising from data source limitations (such as sparsity and cold start), different data preprocessing methods, and model training, all of which affect system stability. Research on system stability in non-malicious-attack scenarios is almost nonexistent.
The above-mentioned disadvantages of the existing recommendation system technology based on ensemble learning bring about major disadvantages in the design, development, deployment and operation of different e-commerce platforms, and especially cause the service quality of the recommendation system to be reduced on the network platform of massive project information, thereby affecting the sales performance of the e-commerce system.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a personalized position recommendation method based on ensemble learning. It aims to construct a location recommendation system with strong extensibility, high recommendation precision, and stable recommendation results, and it systematically provides a technical workflow for an ensemble recommendation algorithm. Taking systems theory as its theoretical basis, the invention treats robustness evaluation as a necessary component of the recommendation system. It considers not only the accuracy of the recommendation results but also the diversity of data usage and user behavior under non-malicious attack, proposes using information gain as a system stability index, quantifies the uncertainty caused by data source limitations (such as sparsity and cold start), different data preprocessing methods, and model training, and improves the stability of the recommendation system's output. In addition, each sub-model in the ensemble model is given a personalized weight, so that an ensemble recommendation algorithm best matching the user's interests is customized for that user, further enhancing the service quality of the recommendation system.
The technical scheme adopted by the invention to solve these technical problems is as follows. The addresses visited by the active user are divided into a training sub-data set and an evaluation sub-data set according to a certain proportion. A plurality of recommendation sub-algorithms of any type are selected, and each sub-algorithm uses the historical score information in the active user's training sub-data set to calculate pre-scores of the other addresses for the active user. The historical scores and pre-scores of the addresses in the evaluation sub-data set are compared, each sub-algorithm is evaluated for accuracy and stability, and personalized weighting coefficients are generated for the active user according to the evaluation results. The sub-algorithms' pre-scores for the unvisited addresses are combined using the weighting coefficients to generate the ensemble model's final prediction scores for the active user's unvisited addresses; the prediction scores of all unvisited addresses are sorted, and the top-ranked addresses are recommended to the active user (as shown in FIG. 1).
The specific process of the method comprises the following steps:
Step 1: Collect and organize an original user check-in data set C and convert it into a user-location scoring matrix R.
Step 2: Select an active user u_a in the location-based social network (LBSN) as the target of the recommendation service. Select recommendation sub-algorithms A_1, A_2, …, A_n of any type and in any number. Divide the addresses visited by the active user into a training sub-data set and an evaluation sub-data set according to a certain proportion. Using the visit information in the active user's training sub-data set, calculate pre-scores for the addresses the active user has not visited and for the addresses in the evaluation sub-data set.
Step 3: Select F1 as the evaluation index of recommendation accuracy, and compare the true scores and pre-scores of the addresses in the evaluation sub-data set to evaluate the recommendation accuracy of each sub-model. According to the F1 value of each recommendation sub-model, compute a set of precision weight values W_a for the active user u_a.
Step 4: Select information gain (IG) as the evaluation index of system stability, and compare the true scores and pre-scores of the addresses in the evaluation sub-data set to evaluate the system stability of each sub-model in a non-malicious-attack scenario. According to the IG value of each recommendation sub-model, compute a set of stability weight values G_a for the active user u_a.
Step 5: Considering the robustness of the ensemble recommendation system as a whole and balancing recommendation precision against system stability, compute the final total weighting coefficient C_a for the active user u_a on the basis of the two sets of weighting coefficients. Fuse the recommendation sub-models' pre-scores for the unvisited addresses according to the total weighting coefficient C_a to generate the ensemble model's final prediction scores for the unvisited addresses. Sort all unvisited addresses by final prediction score, and provide the active user with a recommendation list consisting of the top-ranked addresses.
Step 6: Evaluate the robustness of each recommendation system using the precision index and the stability index, mainly comparing the overall performance of the ensemble-learning-based personalized position recommendation algorithm with that of each sub-algorithm before integration, and evaluate the applicability and effectiveness of the proposed technique.
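The fusion and ranking in Step 5 can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: it assumes the total weighting coefficient C_a is already available as one non-negative weight per sub-model and that fusion is a normalized weighted sum. The function names `fuse_pre_scores` and `top_n`, the dictionary layout, and the sample values are hypothetical.

```python
from typing import Dict, List


def fuse_pre_scores(pre_scores: Dict[str, Dict[str, float]],
                    weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted sum of each sub-model's pre-scores, per unvisited address.

    pre_scores maps sub-model name -> {address: pre-score}.
    weights maps sub-model name -> total weighting coefficient (assumed given).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused: Dict[str, float] = {}
    for model, scores in pre_scores.items():
        w = weights[model] / total  # normalize so the weights sum to 1
        for address, score in scores.items():
            fused[address] = fused.get(address, 0.0) + w * score
    return fused


def top_n(fused: Dict[str, float], n: int) -> List[str]:
    """Return the n addresses with the highest fused score (ties by name)."""
    return sorted(fused, key=lambda a: (-fused[a], a))[:n]


# Hypothetical pre-scores from two sub-models for two unvisited addresses.
pre_scores = {
    "UBCF": {"l1": 0.2, "l2": 0.8},
    "KDE": {"l1": 0.6, "l2": 0.4},
}
weights = {"UBCF": 3.0, "KDE": 1.0}
fused = fuse_pre_scores(pre_scores, weights)
recommended = top_n(fused, 1)
```

Because the weights are computed per active user, the same pre-scores can yield different rankings for different users, which is the personalization the method relies on.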
Advantageous effects:
1. The invention is highly extensible and supports the fusion of recommendation sub-algorithms of any type and in any number. In practical applications, suitable recommendation sub-algorithms can be selected according to the application scenario and the data characteristics, higher recommendation quality can be obtained on top of any existing algorithm, user stickiness in the location-based social network is improved, and merchants are helped to push advertisements to users accurately, thereby attracting more potential consumers.
2. By analyzing the different behavior characteristics of each user, the invention customizes a set of weight coefficients for each active user; through personalized weighting, the ensemble algorithm can be "biased", for each user, toward the sub-algorithms that best mine that user's interests. This "tailored to each person" integration approach greatly improves users' satisfaction with the social network platform, is also helpful for solving other machine learning problems, and is of great significance in practical applications.
3. In the fusion process, the F1 value, which combines precision and recall, is selected as the evaluation index of recommendation accuracy, so that the ensemble model outperforms each sub-model in recommendation accuracy, ensuring that users like the recommended results and achieving the goal of improving the prediction accuracy of the recommendation algorithm.
4. The method innovatively uses the information gain as the evaluation index of the stability of the recommendation system, fully considers the uncertainty caused by data source limitation (such as sparsity and cold start), different data preprocessing modes and model training, can measure a plurality of factors causing the instability of the system, and ensures the system stability of the recommendation system in a non-malicious attack scene.
5. The method comprehensively considers the prediction accuracy and the system stability, and robustly improves the service quality of the recommendation system. The method has certain universality and portability, can be applied to a position recommendation system, is also suitable for the personalized recommendation field of other traditional projects, and has wide industrial application prospect.
6. The method and the device aim at constructing the position recommendation system with strong expandability, high recommendation precision and stable recommendation result, take the diversity characteristics of data utilization and user behaviors into consideration while considering the accuracy of the recommendation result, innovatively propose an evaluation mode using information gain as a system stability index, quantify uncertainties caused by data source limitation (such as sparsity and cold start), different data preprocessing modes and model training, and well improve the stability of the output result of the recommendation system.
Drawings
Fig. 1 is a flowchart of a personalized position recommendation method based on ensemble learning according to the present invention.
Fig. 2 is a flowchart of specific steps of the personalized position recommendation method based on ensemble learning according to the present invention.
FIG. 3 is a flow chart of the steps of the present invention for converting raw user check-in records to a user-location scoring matrix.
FIG. 4 is a frequency histogram of recommendation accuracy indicators F1 on the evaluation sub data set after each recommendation sub-algorithm has been run 100 times (each time a group of target users is randomly selected) in an embodiment of the present invention.
Fig. 5 is a frequency histogram of the information gain IG on the evaluation sub-data set after each recommendation sub-algorithm has been run 100 times (each time a group of target users is randomly selected) in an embodiment of the present invention.
FIG. 6 is a box plot of the precision of the ensemble model after 100 runs in an embodiment of the present invention.
FIG. 7 is a box plot of the recall of the ensemble model after 100 runs in an embodiment of the present invention.
FIG. 8 is a box plot of the recommendation accuracy index F1 of the ensemble model after 100 runs in an embodiment of the present invention.
FIG. 9 is a histogram comparing the recommendation accuracy index F1 of the ensemble model with that of each sub-model in an embodiment of the present invention.
FIG. 10 is a histogram comparing the information gain IG of the ensemble model with that of each sub-model in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific examples.
The specific flow of the design and implementation of the invention is shown in fig. 2, and the main variables and parameters in the process are shown in table 1.
TABLE 1 Functions of the principal variables and parameters
[Table 1 is provided as images in the original publication: Figure BDA0002438068960000061, Figure BDA0002438068960000071]
First, collect and organize an original user check-in data set C and convert it into a user-location scoring matrix R. The specific flow is shown in FIG. 3, and the operation steps are as follows:
(1.a) Select the user check-in data set C of the target recommendation system. The data set consists of the historical check-in records of U users at L addresses; information such as user ID, address ID, access time, address longitude, and address latitude is extracted from each check-in record.
(1.b) Convert each check-in record into a triplet (u_i, l_j, n_ij), where u_i is the i-th user (1 ≤ i ≤ U), l_j is the j-th address (1 ≤ j ≤ L), and n_ij is the number of times user u_i visited address l_j.
(1.c) Calculate the total number of visits NC_au_j by all users to location l_j.
(1.d) Calculate the total number of locations NLC_i visited by user u_i.
(1.e) Calculate the total number of visits NC_al_i by user u_i to all locations.
(1.f) Calculate the number of users NUC_j who have visited location l_j.
(1.g) Convert the number of check-ins n_ij of user u_i at address l_j into the score r_ij of user u_i for address l_j, as follows:
[Equation provided as an image in the original publication: Figure BDA0002438068960000081]
where r_ij is the score of user u_i for address l_j, n_ij is the number of check-ins of user u_i at address l_j, NC_au_j is the total number of visits by all users to location l_j, L is the total number of addresses, NLC_i is the total number of locations visited by user u_i, NC_al_i is the total number of visits by user u_i to all locations, U is the total number of users, and NUC_j is the number of users who have visited location l_j.
(1.h) Normalize the user scores, as follows:
$$r_{ij} \leftarrow \frac{r_{ij} - \min}{\max - \min}$$
where r_ij is the score of user u_i for address l_j, min is the lowest of all scores in the user-location scoring matrix R, and max is the highest of all scores. After normalization, the score r_ij of user u_i for address l_j is mapped to the interval [0, 1]: a value of 1 indicates that the user visits the location frequently and likes it very much, a value of 0 indicates that the user has never visited the location, and a higher score indicates a stronger preference for the address.
Collect all scores to form the user-location scoring matrix R = {r_ij}, i ∈ [1, U], j ∈ [1, L], where i is the user index, j is the address index, U is the total number of users, L is the total number of addresses, and r_ij is the score of user u_i for address l_j.
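The bookkeeping in steps (1.b) and (1.h) can be sketched as below. The count-to-score conversion of step (1.g) is available only as an image, so this sketch does not reproduce it and instead normalizes the raw check-in counts n_ij as a stand-in; that substitution, the function names, and the sample records are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def build_count_matrix(records: Sequence[Tuple[str, str]],
                       users: Sequence[str],
                       addresses: Sequence[str]) -> List[List[int]]:
    """Aggregate (user, address) check-in records into counts n_ij."""
    u_idx = {u: i for i, u in enumerate(users)}
    l_idx = {l: j for j, l in enumerate(addresses)}
    counts = [[0] * len(addresses) for _ in users]
    for user, address in records:
        counts[u_idx[user]][l_idx[address]] += 1
    return counts


def min_max_normalize(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map every entry to [0, 1] using the global min and max of the matrix."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # degenerate case: all entries equal
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


# Hypothetical check-in log: u1 visits l1 four times and l2 once; u2 visits l2 once.
records = [("u1", "l1"), ("u1", "l1"), ("u1", "l2"),
           ("u2", "l2"), ("u1", "l1"), ("u1", "l1")]
counts = build_count_matrix(records, users=["u1", "u2"], addresses=["l1", "l2"])
R = min_max_normalize(counts)  # stand-in for the normalized scoring matrix
```

Unvisited user-address pairs stay at 0 after normalization as long as at least one pair in the matrix is unvisited, which matches the interpretation of a 0 score given in the text.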
Second, select an active user u_a in the LBSN as the target of the recommendation service. Select recommendation sub-algorithms A_1, A_2, …, A_n of any type and in any number, where n is the number of recommendation sub-algorithms. Divide the addresses visited by the active user into a training sub-data set and an evaluation sub-data set according to a certain proportion. Using the visit information in the active user's training sub-data set, calculate pre-scores for the addresses the active user has not visited and for the addresses in the evaluation sub-data set. The operation steps are as follows:
(2.a) Obtain the information of an active user u_a currently served by the recommendation system.
(2.b) According to the application scenario and the data characteristics, select a set of recommendation algorithms A_1, A_2, …, A_n (n is the number of recommendation sub-algorithms) as the sub-algorithms of the ensemble model of the invention. For example, user-based collaborative filtering (UBCF), item-based collaborative filtering (IBCF), social-based collaborative filtering (SCF), kernel density estimation (KDE), singular value decomposition (SVD), or other existing ensemble algorithms can be selected.
(2.c) Train a model for each algorithm according to the operating mechanism of that recommendation sub-algorithm to obtain the recommendation sub-models M_1, M_2, …, M_n (n is the number of recommendation sub-algorithms).
(2.d) Set a uniform address division ratio p for all active users, and divide the addresses visited by active user u_a according to this ratio into a training sub-data set Sub1_a and an evaluation sub-data set Sub2_a.
(2.e) Using the recommendation sub-models M_1, M_2, …, M_n (n is the number of recommendation sub-algorithms) and the training sub-data set Sub1_a of active user u_a, calculate a pre-score for each address l_k (l_k ∈ NewL_a ∪ Sub2_a) in the set of unvisited addresses NewL_a and the evaluation sub-data set Sub2_a, denoted
[Pre-score notation provided as an image in the original publication: Figure BDA0002438068960000091]
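Steps (2.d) and (2.e) can be illustrated with the short sketch below. The text does not say how the split is drawn or which side the ratio p refers to, so the sketch assumes a seeded random shuffle with p as the fraction assigned to the training sub-data set Sub1_a. The function names and sample addresses are hypothetical.

```python
import random
from typing import Iterable, List, Sequence, Set, Tuple


def split_visited(visited: Sequence[str], p: float,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split visited addresses into (Sub1_a, Sub2_a); p is the training share."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    shuffled = list(visited)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    cut = int(round(p * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def candidate_addresses(all_addresses: Iterable[str],
                        visited: Iterable[str],
                        sub2: Iterable[str]) -> Set[str]:
    """Addresses that receive a pre-score: NewL_a united with Sub2_a."""
    new_l = set(all_addresses) - set(visited)  # NewL_a: never visited
    return new_l | set(sub2)


visited = [f"l{i}" for i in range(10)]
all_addresses = visited + ["l10", "l11"]
sub1, sub2 = split_visited(visited, p=0.8, seed=1)
candidates = candidate_addresses(all_addresses, visited, sub2)
```

Keeping Sub2_a out of training is what later allows each sub-model's pre-scores to be compared with true scores that the sub-model never saw.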
Thirdly, select the F1 measure as the recommendation accuracy index. Compare the true scores and the pre-scores of the addresses in the evaluation sub-dataset Sub2_a to evaluate the recommendation accuracy of each sub-model. Based on the F1 value of each recommendation sub-model, compute a set of accuracy weights W_a for the active user u_a. The implementation steps are as follows:
(3.a) Each recommendation sub-algorithm of the second step is denoted A_x (1 ≤ x ≤ n), where n is the number of recommendation sub-algorithms. Collect the pre-scores computed by each sub-algorithm for the active user u_a, sort all addresses in the evaluation sub-dataset Sub2_a by pre-score, and take the top M addresses to form the training list set TopM_ax.
(3.b) Collect the true scores of the active user u_a for the addresses in the evaluation sub-dataset Sub2_a, and put the addresses in Sub2_a whose true score is greater than goodling into the preference sub-dataset Prefer_a.
(3.c) Compute the precision (Precision) of each recommendation sub-algorithm A_x as follows:
$$\mathrm{Precision}(u_a, A_x, M) = \frac{|\mathrm{TopM}_{ax} \cap \mathrm{Prefer}_a|}{M} \qquad (3)$$
where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of recommendation sub-algorithms); the training list TopM_ax is the set of the M addresses in the evaluation sub-dataset Sub2_a with the highest pre-scores; the preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose true score is greater than goodling; and M is the number of addresses in TopM_ax.
(3.d) Compute the recall (Recall) of each recommendation sub-algorithm A_x as follows:
$$\mathrm{Recall}(u_a, A_x, M) = \frac{|\mathrm{TopM}_{ax} \cap \mathrm{Prefer}_a|}{|\mathrm{Prefer}_a|} \qquad (4)$$
where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of recommendation sub-algorithms); the training list TopM_ax is the set of the M addresses in the evaluation sub-dataset Sub2_a with the highest pre-scores; the preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose true score is greater than goodling; and M is the number of addresses in TopM_ax.
(3.e) Compute the combined accuracy index F1 of each recommendation sub-algorithm A_x as follows:
$$F1(u_a, A_x, M) = \frac{2 \cdot \mathrm{Precision}(u_a, A_x, M) \cdot \mathrm{Recall}(u_a, A_x, M)}{\mathrm{Precision}(u_a, A_x, M) + \mathrm{Recall}(u_a, A_x, M)} \qquad (5)$$
where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of recommendation sub-algorithms); M is the number of addresses in the training list TopM_ax; Precision(u_a, A_x, M) is the precision of recommendation sub-algorithm A_x; and Recall(u_a, A_x, M) is the recall of recommendation sub-algorithm A_x.
(3.f) Compute the accuracy weight W_ax (1 ≤ x ≤ n) of each recommendation sub-algorithm A_x when recommending for the active user u_a, where n is the number of recommendation sub-algorithms, as follows:
Figure BDA0002438068960000102
where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of recommendation sub-algorithms); M is the number of addresses in the training list TopM_ax; and F1(u_a, A_x, M) is the recommendation accuracy of recommendation sub-algorithm A_x.
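Steps (3.a) to (3.f) can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, and because the weight formula appears only as an image in the source, the sketch assumes that each accuracy weight is the sub-algorithm's F1 divided by the sum of all F1 values.

```python
from typing import Dict, List, Set, Tuple


def top_m(pre_scores: Dict[str, float], m: int) -> List[str]:
    """Return the m addresses with the highest pre-score (ties broken by address id)."""
    ranked = sorted(pre_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [addr for addr, _ in ranked[:m]]


def precision_recall_f1(top_list: List[str], prefer: Set[str], m: int) -> Tuple[float, float, float]:
    """Precision, Recall and F1 of one sub-algorithm for one active user."""
    hits = len(set(top_list) & prefer)          # |TopM_ax ∩ Prefer_a|
    precision = hits / m if m else 0.0
    recall = hits / len(prefer) if prefer else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy_weights(f1_by_algo: Dict[str, float]) -> Dict[str, float]:
    """ASSUMPTION: weights are F1 values normalized to sum to 1 (uniform if all F1 are 0)."""
    total = sum(f1_by_algo.values())
    if total == 0:
        return {name: 1.0 / len(f1_by_algo) for name in f1_by_algo}
    return {name: f1 / total for name, f1 in f1_by_algo.items()}


# Toy data: true scores and pre-scores of the addresses in Sub2_a.
true_scores = {"l1": 0.9, "l2": 0.01, "l3": 0.2, "l4": 0.0}
pre_scores = {"l1": 0.8, "l2": 0.7, "l3": 0.1, "l4": 0.05}
prefer_a = {addr for addr, s in true_scores.items() if s > 0.05}  # threshold 0.05
top2 = top_m(pre_scores, 2)
p, r, f1 = precision_recall_f1(top2, prefer_a, 2)
weights = accuracy_weights({"UBCF": 0.5, "SVD": 0.25, "SCF": 0.25})
```

Normalizing the F1 values keeps the weights comparable across users, which matters because a separate set of weights is computed for every active user.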
Fourthly, select the information gain IG as the system stability index. Compare the true scores and the pre-scores of the addresses in the evaluation sub-dataset to evaluate the system stability of each sub-model in a scenario without malicious attacks. Based on the information gain IG of each recommendation sub-model, compute a set of stability weights G_a for the active user u_a. The implementation steps are as follows:
(4.a) Compute the information entropy of the evaluation sub-dataset Sub2_a as follows:
Figure BDA0002438068960000103
where u_a is the active user currently receiving the recommendation service; Sub2_a is the evaluation sub-dataset obtained by dividing the addresses visited by the active user u_a in the given proportion; and the preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose true score is greater than goodling.
(4.b) Each recommendation sub-algorithm of the second step is denoted A_x (1 ≤ x ≤ n), where n is the number of recommendation sub-algorithms. Compute the conditional entropy obtained when the addresses in the evaluation sub-dataset Sub2_a are classified (into recommended and not recommended) according to the recommendation result of sub-algorithm A_x, as follows:
Figure BDA0002438068960000111
where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of recommendation sub-algorithms); Sub2_a is the evaluation sub-dataset obtained by dividing the addresses visited by the active user u_a in the given proportion; the training list TopM_ax is the set of the M addresses in Sub2_a with the highest pre-scores; M is the number of addresses in TopM_ax; TP_ax is the number of addresses in the recommendation list that the user really likes; and FN_ax is the number of addresses that the user really likes but that are not in the recommendation list (not recommended).
(4.c) Compute the information gain obtained when the addresses in the evaluation sub-dataset Sub2_a are classified according to the recommendation result of sub-algorithm A_x, as follows:
$$IG(u_a, A_x, M) = D(u_a) - T(u_a, A_x, M) \qquad (9)$$
where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of recommendation sub-algorithms); M is the number of addresses in the training list TopM_ax; D(u_a) is the information entropy of the evaluation sub-dataset Sub2_a; and T(u_a, A_x, M) is the conditional entropy obtained when the addresses in Sub2_a are classified (into recommended and not recommended) according to the recommendation result of sub-algorithm A_x.
(4.d) Compute the stability weight G_ax (1 ≤ x ≤ n) of each recommendation sub-algorithm A_x when recommending for the active user u_a, where n is the number of recommendation sub-algorithms, as follows:
Figure BDA0002438068960000112
where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ n) is a recommendation sub-algorithm (n is the number of recommendation sub-algorithms); M is the number of addresses in the training list TopM_ax; and IG(u_a, A_x, M) is the information gain of recommendation sub-algorithm A_x.
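The stability computation of steps (4.a) to (4.d) can be sketched as below. The entropy and conditional-entropy formulas are images in the source, so this sketch assumes the standard base-2 Shannon definitions for a binary "liked / not liked" label split by "recommended / not recommended"; the normalization used for the stability weights is likewise an assumption, and all function names are hypothetical.

```python
import math
from typing import Dict, Set


def binary_entropy(positives: int, total: int) -> float:
    """Base-2 Shannon entropy of a set with `positives` liked addresses out of `total`."""
    if total == 0 or positives == 0 or positives == total:
        return 0.0
    p = positives / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(sub2: Set[str], prefer: Set[str], top_list: Set[str]) -> float:
    """IG = D - T, where T is the entropy after splitting Sub2_a by the recommendation result."""
    d = binary_entropy(len(sub2 & prefer), len(sub2))           # D(u_a)
    t = 0.0                                                     # T(u_a, A_x, M)
    for part in (sub2 & top_list, sub2 - top_list):             # recommended / not recommended
        if part:
            t += len(part) / len(sub2) * binary_entropy(len(part & prefer), len(part))
    return d - t


def stability_weights(ig_by_algo: Dict[str, float]) -> Dict[str, float]:
    """ASSUMPTION: weights are IG values normalized to sum to 1 (uniform if all IG are 0)."""
    total = sum(ig_by_algo.values())
    if total == 0:
        return {name: 1.0 / len(ig_by_algo) for name in ig_by_algo}
    return {name: ig / total for name, ig in ig_by_algo.items()}


sub2_a = {"a", "b", "c", "d"}
prefer_a = {"a", "b"}
ig_perfect = information_gain(sub2_a, prefer_a, {"a", "b"})  # list separates liked from disliked
ig_useless = information_gain(sub2_a, prefer_a, {"a", "c"})  # list is uninformative
g_a = stability_weights({"UBCF": ig_perfect, "SVD": ig_useless})
```

Under these definitions a recommendation list that perfectly separates liked from disliked addresses removes all uncertainty, while a list that mixes them evenly yields zero gain.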
Fifthly, taking the robustness of the ensemble recommendation system into overall consideration and balancing recommendation accuracy against system stability, compute the final total weighting coefficients C_a for the active user u_a on the basis of the two sets of weights. Fuse the pre-scores that the recommendation sub-models assign to the unvisited addresses according to the total weighting coefficients C_a, producing the final prediction scores of the ensemble model for the unvisited addresses. Sort all unvisited addresses by final prediction score and provide the active user with a recommendation list consisting of the top-ranked addresses. The implementation steps are as follows:
(5.a) Compute the final weighting coefficient C_ax of each recommendation sub-algorithm A_x (1 ≤ x ≤ n) when recommending for the active user u_a, where n is the number of recommendation sub-algorithms, as follows:
Figure BDA0002438068960000121
where C_ax is the final weight of the pre-scores of recommendation sub-algorithm A_x, W_ax is the accuracy weight of the pre-scores of A_x when recommending for the active user u_a, and G_ax is the stability weight of the pre-scores of A_x.
(5.b) For each location l_k not visited by the active user u_a (l_k ∈ NewL_a), the ensemble algorithm computes a final prediction score as follows:
Figure BDA0002438068960000122
where C_ax is the final weighting coefficient of A_x when recommending for the active user u_a, and
Figure BDA0002438068960000123
is the pre-score of recommendation sub-algorithm A_x for the location l_k not visited by the active user u_a.
(5.c) Sort all addresses not visited by the active user by the final prediction score of the ensemble algorithm, form a recommendation list TopNList_a from the N top-ranked locations, and return it to the active user.
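Steps (5.b) and (5.c) can be illustrated with the short sketch below. The fusion formula is an image in the source, so the sketch assumes a plain weighted sum of the sub-model pre-scores; the coefficients C_ax are taken as given inputs because the way W_ax and G_ax are combined is not recoverable from the text. Function names and the toy numbers are hypothetical.

```python
from typing import Dict, List


def fuse_scores(pre_scores_by_algo: Dict[str, Dict[str, float]],
                c_a: Dict[str, float]) -> Dict[str, float]:
    """ASSUMPTION: final score of l_k is the sum over x of C_ax * pre-score_x(l_k)."""
    fused: Dict[str, float] = {}
    for algo, scores in pre_scores_by_algo.items():
        for addr, score in scores.items():
            fused[addr] = fused.get(addr, 0.0) + c_a[algo] * score
    return fused


def top_n_list(fused: Dict[str, float], n: int) -> List[str]:
    """Return the n unvisited addresses with the highest fused score."""
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return [addr for addr, _ in ranked[:n]]


# Toy pre-scores of two sub-models for two unvisited addresses.
pre = {"UBCF": {"l1": 0.2, "l2": 0.8}, "KDE": {"l1": 0.9, "l2": 0.1}}
c_a = {"UBCF": 0.25, "KDE": 0.75}
final_scores = fuse_scores(pre, c_a)
recommendation = top_n_list(final_scores, 1)
```

Because the coefficients are computed per user, the same sub-model pre-scores can lead to different rankings for different active users.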
Sixthly, evaluate the robustness of each recommendation system using the accuracy index and the stability index, mainly by comparing the overall performance of the personalized location recommendation algorithm based on ensemble learning with that of each sub-algorithm before integration, and thereby assess the applicability and effectiveness of the proposed technique. The implementation steps are as follows:
(6.a) Randomly select U × 10% of the users from the target dataset as the active user set AU, and run each recommendation algorithm for each active user in the set to generate a recommendation list.
(6.b) Evaluate the robustness of each recommendation system using the accuracy index and the stability index: the Precision, Recall, recommendation accuracy index F1 and information gain IG of each algorithm for one run over the active user set AU are the averages of the respective indexes over all users in AU.
(6.c) Repeat steps (6.a) and (6.b) Ntimes times, i.e., all algorithms are run independently Ntimes times.
(6.d) Set the Precision, Recall, recommendation accuracy index F1 and information gain IG of the ensemble algorithm proposed by the invention and of each recommendation sub-algorithm to the averages of the results of the Ntimes runs.
(6.e) Compare and analyze the results for each index: if the recommendation accuracy index F1 of the ensemble algorithm is greater than the F1 values of all recommendation sub-algorithms, the recommendation accuracy of the ensemble algorithm is higher than that of every sub-algorithm; if the information gain IG of the ensemble algorithm is greater than the maximum information gain IG among the recommendation sub-algorithms, the ensemble algorithm is more stable than every sub-algorithm; if both conclusions hold, the technique proposed by the invention is more robust.
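The evaluation protocol of steps (6.a) to (6.e) can be sketched as follows. The per-user evaluation is passed in as a caller-supplied function, since it depends on the algorithm under test; the function names, the fixed random seed and the metric dictionary layout are assumptions made for illustration only.

```python
import random
from statistics import mean
from typing import Callable, Dict, Sequence

Metrics = Dict[str, float]  # e.g. {"Precision": ..., "Recall": ..., "F1": ..., "IG": ...}


def run_protocol(users: Sequence[str],
                 evaluate_user: Callable[[str], Metrics],
                 n_times: int,
                 fraction: float = 0.10,
                 seed: int = 0) -> Metrics:
    """Average each metric over the sampled users of a run, then over n_times runs."""
    rng = random.Random(seed)
    sample_size = max(1, int(len(users) * fraction))   # U x 10% active users
    per_run = []
    for _ in range(n_times):
        active_users = rng.sample(list(users), sample_size)       # step (6.a)
        results = [evaluate_user(u) for u in active_users]
        per_run.append({k: mean(r[k] for r in results) for k in results[0]})  # step (6.b)
    return {k: mean(run[k] for run in per_run) for k in per_run[0]}           # step (6.d)


def is_more_robust(ensemble: Metrics, subs: Dict[str, Metrics]) -> bool:
    """Step (6.e): higher F1 than every sub-algorithm and higher IG than the best one."""
    more_accurate = all(ensemble["F1"] > m["F1"] for m in subs.values())
    more_stable = ensemble["IG"] > max(m["IG"] for m in subs.values())
    return more_accurate and more_stable


# Illustrative numbers only; they are not results reported in the document.
users = [f"u{i}" for i in range(1233)]
averaged = run_protocol(users, lambda u: {"F1": 0.5, "IG": 0.2}, n_times=3)
verdict = is_more_robust({"F1": 0.30, "IG": 0.20},
                         {"A": {"F1": 0.20, "IG": 0.10}, "B": {"F1": 0.25, "IG": 0.15}})
```

Drawing a fresh sample of active users in every run is what makes the runs independent, so the averaged metrics are less sensitive to any single choice of users.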
The following describes in detail how the personalized location recommendation method based on ensemble learning according to the invention works, taking a specific location-based social network as an example.
Brightkite is a location-based social networking service provider where users share their locations by checking in. The social network comprises 58228 users and 693362 locations, with 214078 social relationships among the users. The Brightkite dataset, which contains 4491143 check-ins collected between 2008 and October 2010, has become one of the test datasets most commonly used by recommender-system researchers. The invention uses the data for the Los Angeles area in the Brightkite dataset as the worked example.
Firstly, collect and organize the original user check-in dataset C and convert it into a user-location scoring matrix R. The specific operation steps are as follows:
(1.a) Select the check-in records of users in the Los Angeles area from the example dataset Brightkite as the check-in dataset C. The dataset consists of 61710 historical check-in records made by 1233 users at 2951 addresses, with 4216 social relationships among the users; on average, each user has 50.05 check-ins and 6.84 friends, and each location is visited 20.91 times. Each check-in record contains information such as the user ID, address ID, visit time, address longitude and address latitude.
(1.b) Convert the check-in records into triples (u_i, l_j, n_ij), where u_i is the i-th user (1 ≤ i ≤ 1233), l_j is the j-th address (1 ≤ j ≤ 2951), and n_ij is the number of times user u_i visited address l_j.
(1.c) Compute NC_au_j, the total number of visits to location l_j by all users.
(1.d) Compute NLC_i, the total number of locations visited by user u_i.
(1.e) Compute NC_al_i, the total number of visits by user u_i to all locations.
(1.f) Compute NUC_j, the number of all users who have visited location l_j.
(1.g) Convert the number of check-ins n_ij of user u_i at address l_j into the score r_ij of user u_i for address l_j, as follows:
Figure BDA0002438068960000131
where r_ij is the score of user u_i for address l_j, n_ij is the number of check-ins of user u_i at address l_j, NC_au_j is the total number of visits to location l_j by all users, NLC_i is the total number of locations visited by user u_i, NC_al_i is the total number of visits by user u_i to all locations, and NUC_j is the number of all users who have visited location l_j.
(1.h) The lowest of all scores in the user-location scoring matrix R is found to be min = 0 and the highest to be max = 12.61. Normalize the user scores obtained in the previous step:
Figure BDA0002438068960000132
where r_ij is the score of user u_i for address l_j.
After normalization, the score r_ij of user u_i for address l_j is mapped into the interval [0, 1]. A value of 1 indicates that the user visits the location frequently and likes it very much; a value of 0 indicates that the user has never visited the location; the higher the score, the more the user prefers the address.
Collect all scores to form the user-location scoring matrix R = {r_ij}, i ∈ [1, 1233], j ∈ [1, 2951], where i is the user index and j is the address index.
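The bookkeeping of steps (1.b) to (1.f) and the normalization of step (1.h) can be sketched as below. The score formula of step (1.g) is an image in the source and is deliberately not reproduced; the normalization is assumed to be ordinary min-max scaling, which is consistent with the stated min, max and target interval [0, 1]. Function names and the interpretation of NLC_i and NUC_j as counts of distinct addresses and distinct users are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_checkins(records: Iterable[Tuple[str, str]]):
    """records holds one (user_id, address_id) pair per check-in."""
    n = defaultdict(int)       # n_ij: check-ins of user i at address j
    for user, addr in records:
        n[(user, addr)] += 1
    nc_au = defaultdict(int)   # NC_au_j: visits to address j by all users
    nlc = defaultdict(int)     # NLC_i: distinct addresses visited by user i
    nc_al = defaultdict(int)   # NC_al_i: visits of user i to all addresses
    nuc = defaultdict(int)     # NUC_j: distinct users who visited address j
    for (user, addr), count in n.items():
        nc_au[addr] += count
        nlc[user] += 1
        nc_al[user] += count
        nuc[addr] += 1
    return dict(n), dict(nc_au), dict(nlc), dict(nc_al), dict(nuc)


def min_max_normalize(raw: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], float]:
    """ASSUMPTION: r_ij is mapped to (r_ij - min) / (max - min)."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {key: 0.0 for key in raw}
    return {key: (value - lo) / (hi - lo) for key, value in raw.items()}


records = [("u1", "l1"), ("u1", "l1"), ("u1", "l2"), ("u2", "l1")]
n, nc_au, nlc, nc_al, nuc = aggregate_checkins(records)
scaled = min_max_normalize({("u1", "l1"): 12.61, ("u1", "l2"): 0.0, ("u2", "l1"): 6.305})
```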
Secondly, select an active user u_a in the LBSN as the recommendation service object. Select recommendation sub-algorithms A_1, A_2, …, A_n of any type and in any number, where n is the number of recommendation sub-algorithms. Divide the addresses visited by the active user into a training sub-dataset and an evaluation sub-dataset according to a given proportion. Using the visit information in the active user's training sub-dataset, compute pre-scores for the addresses not visited by the active user and for the addresses in the evaluation sub-dataset. The operation steps are as follows:
(2.a) Obtain the personal information, social relationships and historical visit records of an active user u_a in the Los Angeles area of the example dataset Brightkite.
(2.b) Select four recommendation algorithms: A_1, user-based collaborative filtering (UBCF); A_2, singular value decomposition (SVD); A_3, social collaborative filtering (SCF); and A_4, kernel density estimation (KDE). UBCF is a typical memory-based collaborative filtering algorithm; it can mine users' personal preferences but cannot provide effective recommendations for new items or inactive users, i.e., it suffers from the so-called cold-start problem. SVD is a typical matrix factorization technique among model-based collaborative filtering algorithms; it can cope with the cold-start problem of UBCF, but its computational complexity is high, it runs slowly, and its recommendation accuracy remains to be improved. Considering that social relationships among users are a main feature of an LBSN, the invention selects SCF as a supplement to UBCF: it takes the influence of social relationships on user behavior patterns into account on top of the UBCF algorithm. SCF can obtain more accurate recommendations but, like UBCF, it still suffers from cold start and from recommendation results of a single type. The KDE algorithm takes the geographic attributes of locations in the LBSN into account and models the probability of each user's check-in activity; the invention uses it as a sub-algorithm for mining the LBSN data.
Given their respective strengths and weaknesses, the four sub-algorithms selected by the invention complement one another.
(2.c) Train a model for each algorithm to obtain the recommendation sub-models M_UBCF, M_SVD, M_SCF and M_KDE.
(2.d) Set a uniform address division ratio p = 0.4 for all active users, and divide the addresses visited by the active user u_a, according to this ratio, into a training sub-dataset Sub1_a and an evaluation sub-dataset Sub2_a.
(2.e) Using the recommendation sub-models M_UBCF, M_SVD, M_SCF and M_KDE and the training sub-dataset Sub1_a of the active user u_a, the UBCF, SVD, SCF and KDE algorithms compute pre-scores for each address l_k in the set of unvisited addresses NewL_a and in the evaluation sub-dataset Sub2_a (l_k ∈ NewL_a ∪ Sub2_a), denoted respectively as
Figure BDA0002438068960000151
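The division of step (2.d) can be sketched as follows. The text does not say which subset receives the share p or how addresses are assigned, so this sketch assumes that a random fraction p of the visited addresses goes to the training sub-dataset Sub1_a and the remainder to the evaluation sub-dataset Sub2_a; the function name and the seed are hypothetical.

```python
import random
from typing import Iterable, Set, Tuple


def split_visited(visited: Iterable[str], p: float, seed: int = 0) -> Tuple[Set[str], Set[str]]:
    """ASSUMPTION: a random share p of the visited addresses forms Sub1_a, the rest Sub2_a."""
    rng = random.Random(seed)
    shuffled = sorted(visited)      # sort first so the result depends only on the seed
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * p)
    return set(shuffled[:cut]), set(shuffled[cut:])


visited_a = {f"l{i}" for i in range(10)}
sub1_a, sub2_a = split_visited(visited_a, p=0.4)
```

The two subsets are disjoint and together cover every visited address, so no check-in information leaks from the evaluation side into training.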
Thirdly, select the F1 measure as the recommendation accuracy index. Compare the true scores and the pre-scores of the addresses in the evaluation sub-dataset Sub2_a to evaluate the recommendation accuracy of each sub-model. Based on the F1 value of each recommendation sub-model, compute a set of accuracy weights W_a for the active user u_a. The implementation steps are as follows:
(3.a) Collect the pre-scores computed by each recommendation sub-algorithm A_x (1 ≤ x ≤ 4) for the active user u_a,
Figure BDA0002438068960000152
sort all addresses l_k (l_k ∈ Sub2_a) in the evaluation sub-dataset Sub2_a by pre-score, take the top M = 10 addresses, and generate a training list Top10_ax for each algorithm A_x.
(3.b) Collect the true scores of the active user u_a for the addresses in the evaluation sub-dataset Sub2_a, and put the addresses in Sub2_a whose true score is greater than goodling = 0.05 into the preference sub-dataset Prefer_a.
(3.c) Compute the precision (Precision) of each recommendation sub-algorithm A_x as follows:
$$\mathrm{Precision}(u_a, A_x, M) = \frac{|\mathrm{TopM}_{ax} \cap \mathrm{Prefer}_a|}{M} \qquad (15)$$
where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm; the training list TopM_ax is the set of the 10 addresses in the evaluation sub-dataset Sub2_a with the highest pre-scores; the preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose true score is greater than 0.05; and M is the number of addresses in TopM_ax (M = 10).
(3.d) Compute the recall (Recall) of each recommendation sub-algorithm A_x as follows:
$$\mathrm{Recall}(u_a, A_x, M) = \frac{|\mathrm{TopM}_{ax} \cap \mathrm{Prefer}_a|}{|\mathrm{Prefer}_a|} \qquad (16)$$
where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm; the training list TopM_ax is the set of the 10 addresses in the evaluation sub-dataset Sub2_a with the highest pre-scores; the preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose true score is greater than 0.05; and M is the number of addresses in TopM_ax (M = 10).
(3.e) Compute the combined accuracy index F1 of each recommendation sub-algorithm A_x as follows:
$$F1(u_a, A_x, M) = \frac{2 \cdot \mathrm{Precision}(u_a, A_x, M) \cdot \mathrm{Recall}(u_a, A_x, M)}{\mathrm{Precision}(u_a, A_x, M) + \mathrm{Recall}(u_a, A_x, M)} \qquad (17)$$
where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm; M is the number of addresses in the training list TopM_ax (M = 10); Precision(u_a, A_x, M) is the precision of each recommendation sub-algorithm; and Recall(u_a, A_x, M) is the recall of each recommendation sub-algorithm.
After the four sub-algorithms have been run 100 times (each time with a randomly selected group of target users), the frequency histogram of the recommendation accuracy index F1 on the evaluation dataset is as shown in Fig. 4.
(3.f) Compute the accuracy weight W_ax (1 ≤ x ≤ 4) of each recommendation sub-algorithm A_x when recommending for the active user u_a, as follows:
Figure BDA0002438068960000161
where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm; M is the number of addresses in the training list TopM_ax (M = 10); and F1(u_a, A_x, M) is the recommendation accuracy of recommendation sub-algorithm A_x (1 ≤ x ≤ 4).
Fourthly, select the information gain IG as the system stability index. Compare the true scores and the pre-scores of the addresses in the evaluation sub-dataset to evaluate the system stability of UBCF, SVD, SCF and KDE in a scenario without malicious attacks. Based on the information gain IG of each recommendation sub-model, compute a set of stability weights G_a for the active user u_a. The implementation steps are as follows:
(4.a) Compute the information entropy of the evaluation sub-dataset Sub2_a as follows:
Figure BDA0002438068960000162
where u_a is the active user currently receiving the recommendation service; Sub2_a is the evaluation sub-dataset obtained by dividing the addresses visited by the active user u_a in the given proportion; and the preference sub-dataset Prefer_a is the set of addresses in Sub2_a whose true score is greater than 0.05.
(4.b) Compute the conditional entropy obtained when the addresses in the evaluation sub-dataset Sub2_a are classified (into recommended and not recommended) according to the recommendation result of sub-algorithm A_x (1 ≤ x ≤ 4), as follows:
Figure BDA0002438068960000163
where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm; Sub2_a is the evaluation sub-dataset obtained by dividing the addresses visited by the active user u_a in the given proportion; the training list TopM_ax is the set of the 10 addresses in Sub2_a with the highest pre-scores; M is the number of addresses in TopM_ax (M = 10); TP_ax is the number of addresses in the recommendation list that the user really likes; and FN_ax is the number of addresses that the user really likes but that are not in the recommendation list (not recommended).
(4.c) Compute the information gain obtained when the addresses in the evaluation sub-dataset Sub2_a are classified according to the recommendation result of sub-algorithm A_x, as follows:
$$IG(u_a, A_x, M) = D(u_a) - T(u_a, A_x, M) \qquad (21)$$
where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm; M is the number of addresses in the training list TopM_ax (M = 10); D(u_a) is the information entropy of the evaluation sub-dataset Sub2_a; and T(u_a, A_x, M) is the conditional entropy obtained when the addresses in Sub2_a are classified (into recommended and not recommended) according to the recommendation result of sub-algorithm A_x.
After the four sub-algorithms have been run 100 times (each time with a randomly selected group of target users), the frequency histogram of the information gain IG on the evaluation sub-dataset Sub2_a is as shown in Fig. 5.
(4.d) Compute the stability weight G_ax (1 ≤ x ≤ 4) of each recommendation sub-algorithm A_x when recommending for the active user u_a, as follows:
Figure BDA0002438068960000171
where u_a is the active user currently receiving the recommendation service; A_x (1 ≤ x ≤ 4) is a recommendation sub-algorithm; M is the number of addresses in the training list TopM_ax (M = 10); and IG(u_a, A_x, M) is the information gain of recommendation sub-algorithm A_x (1 ≤ x ≤ 4).
Fifthly, taking the robustness of the ensemble recommendation system into overall consideration and balancing recommendation accuracy against system stability, compute the final total weighting coefficients C_a for the active user u_a on the basis of the two sets of weights. Fuse the pre-scores that the recommendation sub-models assign to the unvisited addresses according to the total weighting coefficients C_a, producing the final prediction scores of the ensemble model for the unvisited addresses. Sort all unvisited addresses by final prediction score and provide the active user with a recommendation list consisting of the top-ranked addresses. The implementation steps are as follows:
(5.a) Compute the final weighting coefficient C_ax of each recommendation sub-algorithm A_x (1 ≤ x ≤ 4) when recommending for the active user u_a, as follows:
Figure BDA0002438068960000172
where C_ax is the final weight of the pre-scores of recommendation sub-algorithm A_x, W_ax is the accuracy weight of the pre-scores of A_x when recommending for the active user u_a, and G_ax is the stability weight of the pre-scores of A_x.
(5.b) For each location l_k not visited by the active user u_a (l_k ∈ NewL_a), the ensemble algorithm computes a final prediction score as follows:
Figure BDA0002438068960000181
where C_ax is the final weighting coefficient of A_x when recommending for the active user u_a, and
Figure BDA0002438068960000182
is the pre-score of recommendation sub-algorithm A_x for the location l_k not visited by the active user u_a.
(5.c) Sort all addresses not visited by the active user by the final prediction score of the ensemble algorithm, form a recommendation list TopNList_a from the N top-ranked locations, and return it to the active user (N may be a multiple of 5, normally with 5 ≤ N ≤ 50).
Sixthly, evaluate the robustness of each recommendation system using the accuracy index and the stability index, mainly by comparing the overall performance of the personalized location recommendation algorithm based on ensemble learning with that of the four sub-algorithms before integration, and thereby assess the applicability and effectiveness of the proposed technique. The implementation steps are as follows:
(6.a) Randomly select 123 users from the target dataset as the active user set AU, and run the ensemble recommendation algorithm and the four sub-algorithms for each active user in the set to generate a recommendation list.
(6.b) Evaluate the robustness of each recommendation system using the accuracy index and the stability index: the Precision, Recall, recommendation accuracy index F1 and information gain IG of each algorithm for one run over the active user set AU are the averages of the respective indexes over all users in AU.
(6.c) Repeat steps (6.a) and (6.b) 100 times, i.e., all algorithms are run independently 100 times.
Box plots of the 100 Precision, Recall and recommendation accuracy index F1 values generated by the ensemble model over the 100 runs are shown in Fig. 6, Fig. 7 and Fig. 8, respectively.
(6.d) Set the Precision, Recall, recommendation accuracy index F1 and information gain IG of the ensemble algorithm proposed by the invention and of the four recommendation sub-algorithms to the averages of the results of the 100 runs. For different values of N, the Precision, Recall, recommendation accuracy index F1 and information gain IG of each recommendation algorithm are shown in Tables 2, 3, 4 and 5, respectively:
TABLE 2 accuracy Precision index values for different recommendation algorithms
Figure BDA0002438068960000183
TABLE 3 Recall ratio Recall index values for different recommendation algorithms
Figure BDA0002438068960000184
Figure BDA0002438068960000191
TABLE 4 recommendation accuracy F1 index values for different recommendation algorithms
Figure BDA0002438068960000192
TABLE 5 information gain IG index values for different recommendation algorithms
Figure BDA0002438068960000193
For this case, a histogram comparing the recommendation accuracy index F1 of the ensemble model with that of each sub-model is shown in Fig. 9, and the corresponding histogram for the information gain IG is shown in Fig. 10.
(6.e) Compare and analyze the results for each index: the Precision, Recall and recommendation accuracy index F1 of the ensemble algorithm are all greater than the corresponding values of all recommendation sub-algorithms, so the recommendation accuracy of the ensemble algorithm is higher than that of every sub-algorithm; the information gain IG of the ensemble algorithm is greater than the maximum information gain IG among the recommendation sub-algorithms, which shows that the ensemble algorithm is more stable than every sub-algorithm; together, these two conclusions demonstrate the robustness of the proposed technique.
Unlike conventional ensemble algorithms, the method aims to build a location recommendation system that is highly extensible, accurate in its recommendations and stable in its results. It takes into account both the accuracy of the recommendation results and the diversity of data usage and user behavior, and it introduces the use of information gain as a system stability index, quantifying the uncertainty caused by data-source limitations (such as sparsity and cold start), by different data preprocessing methods and by model training, thereby improving the stability of the recommendation system's output. In addition, a set of weighting coefficients is customized for each user, so that through personalized weighting the ensemble algorithm can favor different sub-algorithms for different users. The technique proposed by the invention helps to improve the robustness of recommendation systems and to enhance their quality of service; it has broad application prospects and is expected to be widely used in the location-based social network market.
The process flow described above is only a preferred embodiment of the invention and does not represent all of its details. Any modification, equivalent replacement or improvement made by those skilled in the art within the technical scope disclosed herein and within the spirit and principle of the invention shall fall within the protection scope of the invention.

Claims (8)

1. A personalized position recommendation method based on ensemble learning, characterized by comprising the following steps:
step 1, collecting and sorting an original user check-in data set C, and converting it into a user-position scoring matrix R;
step 2, selecting an active user u_a in a location-based social network (LBSN) as the recommendation service object, selecting recommendation sub-algorithms A_1, A_2, …, A_n of any type and in any number, dividing the addresses visited by the active user into a training sub-data set and an evaluation sub-data set, and using the visit information in the active user's training sub-data set to calculate pre-scores for the addresses not visited by the active user and for the addresses in the evaluation sub-data set;
step 3, selecting F1 as the recommendation precision evaluation index, comparing the real scores and pre-scores of the addresses in the evaluation sub-data set, evaluating the recommendation precision of each sub-model, and calculating a set of precision weight values W_a for the active user u_a according to the F1 value of each recommendation sub-model;
step 4, selecting information gain IG as the system stability evaluation index, comparing the real scores and pre-scores of the addresses in the evaluation sub-data set, evaluating the system stability of each sub-model in a non-malicious-attack scenario, and calculating a set of stability weight values G_a for the active user u_a according to the IG value of each recommendation sub-model;
step 5, calculating the final total weighting coefficients C_a for the active user u_a based on the two groups of weighting coefficients, merging the pre-scores given by the recommendation sub-models to the unvisited addresses according to the total weighting coefficients C_a to generate the integrated model's final prediction scores for the unvisited addresses, sorting all unvisited addresses by final prediction score, and providing the active user with a recommendation list consisting of the top-ranked addresses;
step 6, comparing the overall performance of the personalized position recommendation algorithm based on ensemble learning with that of each sub-algorithm before integration, and evaluating the applicability and effectiveness of the proposed technique.
2. The method for personalized position recommendation based on ensemble learning according to claim 1, wherein step 1 of the method comprises:
step 11: selecting a user check-in data set C of a target recommendation system, wherein the data set consists of the historical check-in records of U users at L addresses, and extracting the user ID, address ID, access time, address longitude, and address latitude from each check-in record;
step 12: converting each check-in record into a triplet (u_i, l_j, n_ij), wherein u_i is the i-th user (1 ≤ i ≤ U), l_j is the j-th address (1 ≤ j ≤ L), and n_ij is the number of times user u_i visited address l_j;
step 13: calculating the total number of visits NC_au_j by all users to location l_j;
step 14: calculating the total number of locations NLC_i visited by user u_i;
step 15: calculating the total number of visits NC_al_i by user u_i to all locations;
step 16: calculating the number of users NUC_j who have visited location l_j;
step 17: converting the number of check-ins n_ij of user u_i at address l_j into the score r_ij of user u_i for address l_j, specifically:
Figure FDA0002438068950000021
wherein r isijRepresenting user uiFor address ljScore of n, nijRepresenting user uiAt address ljNumber of check-ins, NC _ aujIndicating all users are at location ljL denotes the total number of addresses, N L CiRepresenting user uiTotal number of accessed positions, NC _ aliRepresenting user uiTotal number of visits to all locations, U representing total number of users, NUCjIndicating visited location ljThe number of all users of (c);
step 18: the user score is normalized, and the specific calculation method comprises the following steps:
Figure FDA0002438068950000022
wherein r isijRepresenting user uiFor address ljMin represents the lowest value of all scores in the user-location scoring matrix R, and max represents the highest value of all scores;
summing all scores to form a user-location score matrix R ═ Rij},i∈[1,U],j∈[1,L]。
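The counting steps 12 to 16 and the normalization of step 18 can be illustrated with a minimal Python sketch. The function names are hypothetical, the check-in records are reduced to (user ID, address ID) pairs, and the normalization is assumed to be ordinary min-max scaling, (r − min) / (max − min), since formula (2) is not reproduced in this text. The scoring formula (1) of step 17 is likewise not reproduced here and is deliberately left out.

```python
from collections import Counter


def build_triplets(checkins):
    """Step 12: map each (user, address) pair to its check-in count n_ij.

    `checkins` is an iterable of (user_id, address_id) pairs, one per record.
    """
    return Counter(checkins)


def aggregate_counts(n):
    """Steps 13-16: derive the four aggregate counts from the n_ij table."""
    nc_au = Counter()  # NC_au_j: total visits by all users to address l_j
    nlc = Counter()    # NLC_i: number of distinct addresses visited by u_i
    nc_al = Counter()  # NC_al_i: total visits by u_i to all addresses
    nuc = Counter()    # NUC_j: number of distinct users who visited l_j
    for (user, address), count in n.items():
        nc_au[address] += count
        nlc[user] += 1
        nc_al[user] += count
        nuc[address] += 1
    return nc_au, nlc, nc_al, nuc


def min_max_normalize(scores):
    """Step 18 (assumed form): rescale all scores into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all scores equal: avoid division by zero
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}
```

In this sketch the scores passed to `min_max_normalize` would be the r_ij values produced by formula (1); any dictionary keyed by (user, address) works.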
3. The ensemble learning-based personalized position recommendation method according to claim 1, wherein step 2 of the method comprises:
step 21: obtaining the information of an active user u_a currently served by the recommendation system;
step 22: selecting a set of recommendation algorithms A_1, A_2, …, A_n as the sub-algorithms of the integrated model according to the application scenario and data characteristics;
step 23: training each algorithm according to its operating mechanism to obtain the recommendation sub-models M_1, M_2, …, M_n;
step 24: setting a uniform address division ratio p for all active users, and dividing the addresses visited by active user u_a into sub-data set Sub1_a and sub-data set Sub2_a according to this ratio;
step 25: using the recommendation sub-models M_1, M_2, …, M_n and the information in sub-data set Sub1_a of active user u_a to calculate a pre-score for each address l_k (l_k ∈ NewL_a ∪ Sub2_a) in the set of unvisited addresses NewL_a and in sub-data set Sub2_a, denoted as
Figure FDA0002438068950000023
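Steps 24 and 25 can be sketched as follows. This is only an illustration under stated assumptions: the claim fixes the ratio p but not how addresses are assigned to each part, so a seeded random shuffle is assumed here, and each sub-model is represented by a hypothetical callable `model(sub1_scores, address)` that returns a pre-score.

```python
import random


def split_visited(visited, p, seed=0):
    """Step 24: divide visited addresses into Sub1_a (share p) and Sub2_a.

    The random assignment is an assumption; the source only fixes the ratio p.
    """
    addresses = sorted(visited)           # sort first so the split is reproducible
    random.Random(seed).shuffle(addresses)
    cut = int(round(p * len(addresses)))
    return set(addresses[:cut]), set(addresses[cut:])


def pre_score(models, sub1_scores, candidates):
    """Step 25: one pre-score dictionary per sub-model M_1..M_n.

    `candidates` stands for NewL_a ∪ Sub2_a; `models` are hypothetical callables.
    """
    return [{l: model(sub1_scores, l) for l in candidates} for model in models]
```

With p = 0.8 and ten visited addresses, `split_visited` returns eight training addresses and two evaluation addresses.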
4. The method for personalized position recommendation based on ensemble learning according to claim 1, wherein the step 3 comprises:
step 31: collecting the pre-score information calculated for active user u_a by each recommendation sub-algorithm A_x (1 ≤ x ≤ n) in step 2, sorting all addresses in Sub2_a by pre-score, and taking the top M addresses to generate the set TopM_ax;
step 32: collecting the real scores of active user u_a for all addresses in Sub2_a, and placing the addresses in Sub2_a whose real score is greater than goodrating into the set Prefer_a;
step 33: calculating the Precision of each recommendation sub-algorithm A_x, specifically:
Figure FDA0002438068950000031
wherein u isaRepresenting active users currently enjoying the recommended service, Ax(x is more than or equal to 1 and less than or equal to n) represents a certain recommendation sub-algorithm (n is the number of the recommendation sub-algorithms), and a training list TopMaxRepresenting the evaluation of the Sub data set Sub2aThe set of the top M addresses with the highest pre-score, the preference sub data set PreferaRepresenting the evaluation of the Sub data set Sub2aThe middle real score is larger than the address set of goodling, M represents the training list TopMaxThe number of addresses in the set;
step 34: calculating each recommendation sub-algorithm AxThe Recall rate Recall comprises the following specific calculation method:
Figure FDA0002438068950000032
wherein u isaRepresenting active users currently enjoying the recommended service, Ax(x is more than or equal to 1 and less than or equal to n) represents a certain recommendation sub-algorithm (n is the number of the recommendation sub-algorithms), and a training list TopMaxRepresenting the evaluation of the Sub data set Sub2aSet of top M addresses with highest medium pre-scorePreference sub data set PreferaRepresenting the evaluation of the Sub data set Sub2aThe middle real score is larger than the address set of goodling, M represents the training list TopMaxThe number of addresses in the set;
step 35: calculating each recommendation sub-algorithm AxThe specific calculation method of the comprehensive accuracy index F1 is as follows:
Figure FDA0002438068950000033
wherein u isaRepresenting active users currently enjoying the recommended service, Ax(x is more than or equal to 1 and less than or equal to n) represents a certain recommendation sub-algorithm (n is the number of recommendation sub-algorithms), and M represents a training list TopMaxNumber of addresses in the set, precision (u)a,AxM) represents each recommendation sub-algorithm Ax(x is not less than 1 and not more than n) accuracy, called (u)a,AxM) represents each recommendation sub-algorithm Ax(1. ltoreq. x. ltoreq.n) recall;
step 36: calculated as active user uaRecommendation sub-algorithms A during recommendationxThe specific calculation method of the precision weight value set is as follows:
Figure FDA0002438068950000034
wherein u isaRepresenting active users currently enjoying the recommended service, Ax(x is more than or equal to 1 and less than or equal to n) represents a certain recommendation sub-algorithm (n is the number of recommendation sub-algorithms), and M represents a training list TopMaxNumber of addresses in set, F1 (u)a,AxM) represents each recommendation sub-algorithm Ax(x is more than or equal to 1 and less than or equal to n) is used.
5. The ensemble learning-based personalized position recommendation method according to claim 1, wherein step 4 of the method comprises:
step 41: calculating the information entropy of Sub2_a, specifically:
Figure FDA0002438068950000041
wherein u isaRepresenting active users currently enjoying the recommendation service, Sub2aIndicates that user u is to be activeaThe accessed addresses are divided according to a certain proportion to obtain an evaluation subdata set, and preference is given to the subdata set PreferaRepresenting the evaluation of the Sub data set Sub2aThe address set with the middle real score larger than goodraring;
step 42: compute Sub2aAddress in (1) according to sub-algorithm Ax(x is more than or equal to 1 and less than or equal to n) and classifying the recommendation results into recommendation and non-recommendation, wherein the specific calculation method comprises the following steps:
Figure FDA0002438068950000042
wherein u isaRepresenting active users currently enjoying the recommended service, Ax(1 ≦ x ≦ n) represents a recommendation Sub-algorithm (n is the number of recommendation Sub-algorithms), Sub2aIndicates that user u is to be activeaThe accessed addresses are divided according to a certain proportion to obtain an evaluation subdata set and a training list TopMaxRepresenting the evaluation of the Sub data set Sub2aThe set of the first M addresses with the highest pre-score, M represents the training list TopMaxNumber of addresses in the set, TPaxIs the number of addresses really liked by the user in the recommendation list, FNaxThe number of addresses really liked by the user who is not in the recommendation list (not recommended);
step 43: compute Sub2aAddress in (1) according to sub-algorithm AxThe specific calculation method of the information gain after classification of the recommendation result is as follows:
IG(ua,Ax,M)=D(ua)-T(ua,Ax,M) (9)
uarepresenting active users currently enjoying the recommended service, Ax(x is more than or equal to 1 and less than or equal to n) represents a certain recommendation sub-algorithm (n is the number of recommendation sub-algorithms), and M represents a training list TopMaxAddresses in a setNumber, D (u)a) Representing the evaluation of the Sub data set Sub2aEntropy of (1), T (u)a,AxM) indicates that the Sub data set Sub2 is to be evaluatedaAddress in (1) according to sub-algorithm AxConditional entropy when the recommendation results of (1) are classified (into recommended and not recommended);
step 44: calculated as active user uaRecommendation sub-algorithms A during recommendationxThe specific calculation method of the stability weight value set weighting coefficient is as follows:
Figure FDA0002438068950000051
wherein G isaxPresentation recommendation sub-algorithm AxPre-scored stability weight value, uaRepresenting active users currently enjoying the recommended service, Ax(x is more than or equal to 1 and less than or equal to n) represents a certain recommendation sub-algorithm (n is the number of recommendation sub-algorithms), and M represents a training list TopMaxNumber of addresses in the set, IG (u)a,AxM) represents each recommendation sub-algorithm Ax(x is more than or equal to 1 and less than or equal to n).
6. The ensemble learning-based personalized location recommendation method according to claim 1, wherein step 5 of the method comprises:
step 51: calculating the final weighting coefficients of the recommendation sub-algorithms A_x (1 ≤ x ≤ n) when recommending for active user u_a, specifically:
Figure FDA0002438068950000052
wherein C_ax is the final weighting coefficient of recommendation sub-algorithm A_x when recommending for active user u_a, W_ax represents the precision weight value of the pre-scores of recommendation sub-algorithm A_x when recommending for active user u_a, and G_ax represents the stability weight value of the pre-scores of recommendation sub-algorithm A_x;
step 52: for each location l_k (l_k ∈ NewL_a) not visited by active user u_a, calculating the final prediction score of the integrated algorithm for that location, specifically:
Figure FDA0002438068950000053
wherein C_ax is the final weighting coefficient when recommending for active user u_a, and
Figure FDA0002438068950000054
is the pre-score given by recommendation sub-algorithm A_x to location l_k not visited by active user u_a;
step 53: sorting all addresses in the set NewL_a by the final prediction score of the integrated algorithm, forming a recommendation list TopNList_a from the top N locations, and returning it to the active user.
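Steps 51 to 53 can be sketched as a weighted combination followed by a ranking. Two assumptions are made and should be read as such: the final coefficient C_ax is taken as the product W_ax · G_ax renormalized to sum to 1, and the final prediction score is taken as the weighted sum of the sub-model pre-scores. Formulas (11) and (12) are not reproduced in this text, so other combinations are equally possible. The function names are hypothetical.

```python
def combine_weights(w, g):
    """Step 51 (assumed form): C_ax proportional to W_ax * G_ax, summing to 1."""
    raw = [wi * gi for wi, gi in zip(w, g)]
    total = sum(raw)
    if total == 0:  # degenerate case: fall back to equal weights
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def ensemble_top_n(pre_scores, c, n):
    """Steps 52-53: weighted sum of pre-scores, then the top-N unvisited addresses.

    pre_scores: one dict per sub-model, each mapping every address in NewL_a
                to that sub-model's pre-score.
    c:          final weighting coefficients C_ax, one per sub-model.
    """
    candidates = list(pre_scores[0])
    final = {
        l: sum(cx * scores[l] for cx, scores in zip(c, pre_scores))
        for l in candidates
    }
    ranked = sorted(final, key=lambda l: (-final[l], l))  # ties broken by address ID
    return ranked[:n], final
```

Because the coefficients are computed per active user, two users with the same candidate addresses can receive differently ordered lists.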
7. The method for personalized position recommendation based on ensemble learning according to claim 1, wherein the step 6 comprises:
step 61: randomly selecting U × 10% of the users from the target data set as the active user set AU, and running each recommendation algorithm for each active user in the set to generate a recommendation list;
step 62: evaluating the robustness of each recommendation system using the precision indexes and the stability index, wherein the values of the precision indexes Precision, Recall, and F1 and of the stability index IG for one run of each algorithm on the active user set AU are the averages of those indexes over all users in AU;
step 63: repeating steps 61 and 62 Ntimes times, that is, running all algorithms independently Ntimes times;
step 64: taking the average over the Ntimes runs as the Precision, Recall, F1, and IG values of the integrated algorithm and of each sub-recommendation algorithm;
step 65: comparing and analyzing the results of all indexes: if the F1 value of the integrated algorithm is larger than the F1 values of all sub-recommendation algorithms, the recommendation precision of the integrated algorithm is higher than that of all sub-algorithms; if the IG index of the integrated algorithm is larger than the maximum IG index among the sub-recommendation algorithms, the integrated algorithm is more stable than all sub-algorithms; if both conclusions hold, the integrated algorithm is more robust.
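The evaluation protocol of steps 61 to 65 can be sketched as a repeated sampling loop. In this hedged illustration each algorithm is represented by a hypothetical callable that returns one index value (for example F1 or IG) for a given user, so the function would be called once per index; the names `evaluate` and `ensemble_is_better` are not from the source.

```python
import random
from statistics import mean


def evaluate(users, algorithms, n_times, seed=0):
    """Steps 61-64: average one index over sampled active users and over runs.

    users:      list of all user IDs (U users)
    algorithms: dict name -> callable(user) returning the index for that user
    n_times:    number of independent runs (Ntimes)
    """
    rng = random.Random(seed)
    size = max(1, len(users) // 10)        # U x 10% active users per run
    runs = {name: [] for name in algorithms}
    for _ in range(n_times):
        active = rng.sample(users, size)   # step 61: active user set AU
        for name, index_fn in algorithms.items():
            # step 62: average of the index over all users in AU
            runs[name].append(mean(index_fn(u) for u in active))
    # step 64: average over the Ntimes runs
    return {name: mean(values) for name, values in runs.items()}


def ensemble_is_better(averages, ensemble="ensemble"):
    """Step 65: True if the ensemble beats every sub-algorithm on this index."""
    return all(
        averages[ensemble] > value
        for name, value in averages.items()
        if name != ensemble
    )
```

Running this once with F1 callables and once with IG callables, and requiring `ensemble_is_better` to hold in both cases, corresponds to the robustness conclusion of step 65.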
8. The personalized position recommendation method based on ensemble learning of claim 1, wherein the method divides the addresses visited by an active user into a training sub-data set and an evaluation sub-data set according to a certain ratio; selects a plurality of recommendation sub-algorithms of any type; uses the historical scoring information in the active user's training sub-data set so that each sub-algorithm calculates pre-scores of the other addresses for the active user; compares the historical scores and the pre-scores of the addresses in the evaluation sub-data set to evaluate the accuracy and stability of each sub-algorithm; generates personalized weighting coefficients for the active user according to the evaluation results; combines the pre-scores given by each sub-algorithm to the unvisited addresses using the weighting coefficients to generate the integrated model's final prediction scores for the active user's unvisited addresses; sorts all unvisited addresses by prediction score; and recommends a plurality of top-ranked addresses to the active user.
CN202010257793.0A 2020-04-03 2020-04-03 Personalized position recommendation method based on ensemble learning Active CN111475744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010257793.0A CN111475744B (en) 2020-04-03 2020-04-03 Personalized position recommendation method based on ensemble learning

Publications (2)

Publication Number Publication Date
CN111475744A true CN111475744A (en) 2020-07-31
CN111475744B CN111475744B (en) 2022-06-14

Family

ID=71750449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010257793.0A Active CN111475744B (en) 2020-04-03 2020-04-03 Personalized position recommendation method based on ensemble learning

Country Status (1)

Country Link
CN (1) CN111475744B (en)

Also Published As

Publication number Publication date
CN111475744B (en) 2022-06-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant