CN113343079A

CN113343079A - Attack detection robust recommendation method based on random forest and target item identification

Info

Publication number: CN113343079A
Application number: CN202110511665.9A
Authority: CN
Inventors: 伊华伟; 徐文倩; 冯晗; 李晓会
Original assignee: Liaoning University of Technology
Current assignee: Liaoning University of Technology
Priority date: 2021-05-11
Filing date: 2021-05-11
Publication date: 2021-09-03

Abstract

The invention discloses an attack detection robust recommendation method based on random forest and target item identification, which comprises the following steps. S1: extracting, from rating data and based on chi-square statistics, effective features capable of distinguishing normal users from attack users; S2: training a random forest classifier on the effective features extracted in step S1, and performing first-stage detection on the user set to be detected with the trained random forest classifier to obtain a first-stage user profile detection result; S3: examining the initial attack profile class obtained in step S2 by target item identification to realize second-stage attack profile detection; S4: constructing a robust recommendation algorithm according to the attack profile detection result to realize robust recommendation with attack detection. Compared with existing robust recommendation algorithms, the proposed algorithm improves robustness while preserving recommendation precision, so that the recommendation results of the collaborative filtering recommendation system are more accurate.

Description

Attack detection robust recommendation method based on random forest and target item identification

Technical Field

The invention relates to the technical field of personalized recommendation by utilizing a computer technology, in particular to an attack detection robust recommendation method based on random forest and target item identification.

Background

The collaborative filtering recommendation system is an important component of electronic commerce and can actively provide personalized recommendation services for users. To obtain users' preference data, the recommendation system is open to its users. However, some merchants exploit this openness to inject fake rating data into the system and achieve their own goals by changing its recommendation results; this malicious behavior is called a "shilling attack", also known as a recommendation attack. Shilling attacks interfere with the recommendation process, bias the recommendation results, and easily cause dissatisfaction among users and merchants. Therefore, how to make the recommendation system resistant to attacks while ensuring the accuracy of its recommendations has become a problem to be solved urgently.

To address the above problem, researchers have proposed a number of robust recommendation algorithms based on machine learning theory, using both supervised and unsupervised methods.

From the perspective of supervised methods, Williams et al. extract 13 features for attack users and, on this basis, detect and identify attack profiles using the SVM, KNN and C4.5 methods. Wu et al. extract effective attack detection metrics and classify users with naive Bayes and k-nearest-neighbor classification algorithms. Li et al. use item popularity to extract features for different users and, based on these features, propose a popularity-based attack profile detection algorithm using an improved ID3 algorithm. Zhou et al. propose an SVM-TIA based attack profile detection algorithm to alleviate the class imbalance problem in classification. Zhou et al. extract features of Aop attacks using the TF-IDF text feature and propose an SVM-based attack profile detection algorithm. Hao et al. propose an automatic feature extraction method and an Adaboost-based detection method to address class imbalance.

From the perspective of unsupervised methods, Zhang et al. propose an unsupervised shilling attack detection method based on hidden Markov models and hierarchical clustering. Li et al. propose the LFAMR model, which analyzes the latent factors of missing ratings. Zhou et al. extract general features of attack users using information entropy, cover the users using a bionic pattern recognition technique, and judge users outside the coverage to be attack users. Mobasher et al. propose two recommendation algorithms, one based on k-means and the other on probabilistic latent semantic analysis, both of which are markedly more robust under attack than the traditional k-nearest-neighbor method. Yu et al. first introduce the concept of risk factors, calculate the risk value of a user's rating behavior, then calculate the classification weight of each risk factor using information entropy, and finally propose a multi-dimensional risk factor attack profile detection method. Zhang et al. propose an unsupervised attack detection algorithm based on user rating behavior.

Existing robust recommendation algorithms still have two shortcomings: first, real profiles are easily misjudged as attack profiles, which harms the accuracy of the algorithm; second, robustness is improved at the cost of accuracy.

Disclosure of Invention

In view of the above problems, the invention aims to provide a robust recommendation method based on random forest and target item identification that gives a collaborative filtering recommendation system stronger robustness while guaranteeing recommendation precision.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

The attack detection robust recommendation method based on random forest and target item identification is characterized by comprising the following steps,

S1: extracting, from rating data and based on chi-square statistics, effective features capable of distinguishing normal users from attack users;

S2: training a random forest classifier on the effective features extracted in step S1, and performing first-stage detection on the user set to be detected with the trained random forest classifier to obtain a first-stage user profile detection result;

S3: examining the initial attack profile class obtained in step S2 by target item identification to realize second-stage attack profile detection;

S4: constructing a robust recommendation algorithm according to the attack profile detection result to realize robust recommendation with attack detection.

Further, the specific operation of step S1 includes the following steps,

S101: let $U=\{u_1,u_2,\dots,u_m\}$ denote the set of all users and $I=\{i_1,i_2,\dots,i_n\}$ the set of all items; for an item $i\in I$ and a user $u\in U$, if user $u$ has rated item $i$, item $i$ is counted as rated once, and the number of times item $i$ is rated by all users in $U$ is counted, which gives the popularity $IPop_i$ of item $i$;

S102: calculating the item popularity $T=\{IPop_1,IPop_2,\dots,IPop_n\}$ of all items, and sorting the $n$ items in descending order of popularity;

S103: according to the sorting result, dividing the $n$ items into two sets by a 10-fold cross-validation method, one being the popular item set $I_{POPU}$ and the other the unpopular item set $I_{UNPOPU}$;

S104: taking the item and the user as two categorical variables whose value ranges are {popular item, unpopular item} and {rated by the user, not rated by the user} respectively, and calculating the degree of association between the unpopular items and user $u$, namely the chi-square value of the unpopular items, from the following quantities:

the number of items that belong to the set $I_{UNPOPU}$ and are rated by user $u$,

the number of items that do not belong to the set $I_{UNPOPU}$ and are rated by user $u$,

the number of items that belong to the set $I_{UNPOPU}$ and are not rated by user $u$,

the number of items that do not belong to the set $I_{UNPOPU}$ and are not rated by user $u$, and $N$, the number of all items;

S105: combining the chi-square value feature of unpopular items obtained in step S104 with the 13 detection features WDMA, RDMA, WDA, Length Variance, DegSim', FMV, FAC, FMD and PV to form the feature matrix V of user feature vectors, which serves as the effective features for distinguishing normal users from attack users.
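Steps S101 to S104 can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the source: ratings are held in a nested dictionary, the popular/unpopular split uses a hypothetical `popular_fraction` parameter in place of the 10-fold cross-validation procedure, and the chi-square value is computed with the standard 2×2 Pearson statistic, which is one plausible reading of the formula that is not legible in this text.

```python
from typing import Dict, List, Set, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def item_popularity(ratings: Ratings, items: List[str]) -> Dict[str, int]:
    """S101-S102: count how many users rated each item."""
    pop = {i: 0 for i in items}
    for user_ratings in ratings.values():
        for i in user_ratings:
            pop[i] = pop.get(i, 0) + 1
    return pop


def split_by_popularity(pop: Dict[str, int],
                        popular_fraction: float) -> Tuple[Set[str], Set[str]]:
    """S103 (simplified): the top fraction of the descending ranking is 'popular'.

    popular_fraction is a hypothetical parameter; the source chooses the
    split by 10-fold cross-validation instead.
    """
    ranked = sorted(pop, key=lambda i: pop[i], reverse=True)
    cut = int(len(ranked) * popular_fraction)
    return set(ranked[:cut]), set(ranked[cut:])


def csui(user_items: Set[str], unpopular: Set[str], n_items: int) -> float:
    """S104: chi-square association between 'unpopular' and 'rated by u'.

    Assumes the standard 2x2 Pearson chi-square statistic.
    """
    a = len(user_items & unpopular)   # unpopular and rated by u
    b = len(user_items - unpopular)   # popular and rated by u
    c = len(unpopular) - a            # unpopular and not rated by u
    d = n_items - a - b - c           # popular and not rated by u
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: no association can be measured
    return n_items * (a * d - c * b) ** 2 / denom
```

A user who rates only unpopular items obtains a large value, whereas a user whose ratings are spread evenly over both sets obtains 0, which matches the intended use of the feature.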

Further, the specific operation of step S2 includes the following steps,

S201: dividing the original user rating data into two parts in a given proportion, one used as the training set for training the random forest classifier and the other as the user set to be detected;

S202: calculating the feature matrices of the training set and of the user set to be detected respectively, according to the features of the feature matrix V extracted in step S1;

S203: constructing and training a random forest classifier using the training set data;

S204: detecting the user set to be detected with the trained random forest classifier and outputting the classification predictions to obtain the first-stage user profile detection result, which preliminarily divides the profiles into an initial real profile class and an initial attack profile class.

Further, the specific operation of step S203 includes the following steps,

S2031: assuming the training data set contains t samples, randomly drawing k subsets from the data set by the Bootstrap resampling technique to train k decision trees respectively, wherein each sample in the training subsets has m attributes;

S2032: whenever a node of a decision tree needs to be split, randomly selecting s attributes (s < m) from the m attributes and choosing one of these s attributes as the split attribute of the node, and repeating this splitting process until the stop condition is met;

S2033: training the k Bootstrap sample sets into k decision tree models in the manner of step S2032, and finally combining all generated decision trees into the random forest classifier $\{T_i,\ i=1,2,\dots,k\}$.
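Steps S2031 to S2033 can be sketched in plain Python as follows. This is an illustrative toy, not the classifier used in the patent: the Gini impurity split criterion, the `max_depth` stop condition, majority voting and all function names are assumptions, since the text only says that one of the s randomly selected attributes is chosen and that splitting stops when a stop condition is met.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label: 0 real, 1 attack)


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(data: List[Sample], s: int, rng: random.Random,
               depth: int = 0, max_depth: int = 5):
    """Grow one tree; a leaf is an int label, an inner node a 4-tuple."""
    labels = [y for _, y in data]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    m = len(data[0][0])
    best = None  # (weighted impurity, attribute, threshold, left, right)
    for a in rng.sample(range(m), s):  # S2032: s random attributes per node
        for thr in sorted({x[a] for x, _ in data}):
            left = [d for d in data if d[0][a] <= thr]
            right = [d for d in data if d[0][a] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right])) / len(data)
            if best is None or score < best[0]:
                best = (score, a, thr, left, right)
    if best is None:  # no attribute separates the samples
        return majority
    _, a, thr, left, right = best
    return (a, thr,
            build_tree(left, s, rng, depth + 1, max_depth),
            build_tree(right, s, rng, depth + 1, max_depth))


def predict_tree(tree, x: Sequence[float]) -> int:
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        a, thr, left, right = tree
        tree = left if x[a] <= thr else right
    return tree


def train_forest(data: List[Sample], k: int, s: int, seed: int = 0) -> list:
    """S2031 and S2033: k bootstrap samples, one tree per sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(k):
        boot = [rng.choice(data) for _ in data]  # sample with replacement
        forest.append(build_tree(boot, s, rng))
    return forest


def predict_forest(forest: list, x: Sequence[float]) -> int:
    """Majority vote over all trees."""
    votes = Counter(predict_tree(t, x) for t in forest)
    return votes.most_common(1)[0][0]
```

In practice an existing library implementation of random forests would normally be preferred; the sketch only makes the bootstrap and attribute-sampling steps explicit.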

Further, the specific operation of step S3 includes the following steps,

S301: calculating the mean rating of each item in the initial attack profile class, and taking the item with the largest mean as the target item;

S302: checking in turn the users in the user set of the initial attack profile class, finding all users who gave the target item the highest rating, identifying them as the final attack profiles, and thereby completing the second-stage detection.
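The two steps S301 and S302 can be sketched as below. The function name `identify_attackers`, the dictionary layout of the suspect profiles, and the choice to average each item over only the suspect users who rated it are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def identify_attackers(suspects: Dict[str, Dict[str, float]],
                       max_rating: float) -> Tuple[str, Set[str]]:
    """Second-stage detection on the initial attack profile class.

    suspects maps each suspect user to {item: rating}.
    Returns the target item and the set of final attack users.
    """
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for ratings in suspects.values():
        for item, r in ratings.items():
            totals[item] = totals.get(item, 0.0) + r
            counts[item] = counts.get(item, 0) + 1
    # S301: the item with the largest mean rating is taken as the target item
    target = max(totals, key=lambda i: totals[i] / counts[i])
    # S302: suspects who gave the target item the highest rating are attackers
    attackers = {u for u, ratings in suspects.items()
                 if ratings.get(target) == max_rating}
    return target, attackers
```

Suspect users who did not give the target item the highest rating are left out of the returned set, which is how normal users misclassified in the first stage can be recovered.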

Further, the specific operation of step S4 includes the following steps,

S401: based on the initial real profile class obtained in step S2 and the final attack profile class obtained in step S3, initializing the feature matrices by a PSO method to obtain an initial user feature matrix and an initial item feature matrix;

S402: constructing an indicator function $I_S(u)$ according to the final attack profile detection result of step S3, where $S$ is the user set corresponding to the final attack profiles and $U$ is the set of all users;

S403: combining the indicator function $I_S(u)$ with the update of the item feature vector $q_i$ to obtain $q_i \leftarrow q_i + I_S(u)\gamma(p_u e_{ui} - \lambda q_i)$;

S404: iteratively updating the initial user feature matrix and item feature matrix with the formulas $p_u \leftarrow p_u + \gamma(q_i e_{ui} - \lambda p_u)$ and $q_i \leftarrow q_i + I_S(u)\gamma(p_u e_{ui} - \lambda q_i)$ until the algorithm converges, to obtain the optimal user feature matrix and the optimal item feature matrix;

S405: generating recommendations for the target user according to the optimal user feature matrix and the optimal item feature matrix.
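The update rules of steps S403 and S404 can be sketched as stochastic gradient descent in plain Python. Two points are assumptions rather than statements of the source: the indicator function is taken to be 0 for detected attack users and 1 for all other users, so that attack ratings do not move the item features, and a random initialization stands in for the PSO initialization of step S401. The function names and default hyperparameters are hypothetical.

```python
import random
from typing import List, Set, Tuple

Rating = Tuple[int, int, float]  # (user index u, item index i, rating r_ui)


def train_robust_mf(ratings: List[Rating], m: int, n: int, attackers: Set[int],
                    f: int = 2, gamma: float = 0.01, lam: float = 0.05,
                    epochs: int = 200, seed: int = 0):
    """Matrix factorization in which attack users do not update item features."""
    rng = random.Random(seed)
    # Random initialization (assumption: replaces the PSO step S401)
    P = [[rng.uniform(0.0, 0.5) for _ in range(f)] for _ in range(m)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(f)] for _ in range(n)]
    for _ in range(epochs):
        for u, i, r in ratings:
            ind = 0.0 if u in attackers else 1.0  # assumed form of I_S(u)
            e = r - sum(pk * qk for pk, qk in zip(P[u], Q[i]))  # e_ui
            for k in range(f):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] = pu + gamma * (qi * e - lam * pu)
                Q[i][k] = qi + ind * gamma * (pu * e - lam * qi)
    return P, Q


def squared_error(ratings: List[Rating], P, Q) -> float:
    """Sum of squared prediction errors over the given ratings."""
    return sum((r - sum(pk * qk for pk, qk in zip(P[u], Q[i]))) ** 2
               for u, i, r in ratings)
```

With every user marked as an attacker the item feature matrix stays at its initial value, which makes the effect of the indicator function easy to check.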

The invention has the beneficial effects that:

The invention provides a robust recommendation method that fuses random forest and target item identification. First, a trained random forest classifier performs first-stage attack profile detection on the user profiles; then target item identification completes the second-stage detection and yields the final attack profile detection result. This result is combined with a matrix factorization model to give the robust recommendation algorithm RRA-RFTII. Compared with the existing M-estimator-based matrix factorization method (MMF), the matrix factorization method based on the least trimmed squares estimator (LTSMF) and the robust recommendation algorithm based on incremental clustering and matrix factorization (KMR-M), the proposed algorithm is superior in both recommendation precision and robustness. In addition, the initial feature matrices are obtained by particle swarm optimization, which improves the ability of model training to reach the optimal solution, thereby guaranteeing the recommendation precision of the algorithm and making the recommendation results of the collaborative filtering recommendation system more accurate.

Drawings

FIG. 1 is a block diagram of a robust recommendation method of the present invention;

FIG. 2 is a block flow diagram of steps S1-S3 of the robust recommendation method of the present invention;

FIG. 3 shows the precision comparison of the four attack detection algorithms in the first embodiment of the present invention;

FIG. 4 shows the recall comparison of the four attack detection algorithms in the first embodiment of the present invention;

FIG. 5 shows the MAE values of the four recommendation algorithms in the first embodiment of the present invention;

FIG. 6 shows the PS values of the four recommendation algorithms in the first embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solution of the present invention, the following further describes the technical solution of the present invention with reference to the drawings and the embodiments.

As shown in fig. 1, the attack detection robust recommendation method based on random forest and target item identification comprises the following steps,

specifically, S101: let $U=\{u_1,u_2,\dots,u_m\}$ denote the set of all users and $I=\{i_1,i_2,\dots,i_n\}$ the set of all items; for an item $i\in I$ and a user $u\in U$, if user $u$ has rated item $i$, item $i$ is counted as rated once, and the number of times item $i$ is rated by all users in $U$ is counted;

S103: according to the sorting result, dividing the $n$ items into two sets by a 10-fold cross-validation method, one being the popular item set $I_{POPU}$ and the other the unpopular item set $I_{UNPOPU}$; normal users rate popular items many times and unpopular items few times, whereas attack users rate items at random, so their numbers of ratings of popular and unpopular items differ little.

The larger the CSUI value, i.e. the chi-square value of unpopular items calculated in step S104, the stronger the association between user u and the unpopular items, indicating that user u has rated unpopular items more often; the smaller the CSUI value, the weaker the association between user u and the unpopular items, indicating that user u has rated unpopular items less often; a CSUI value of 0 means that user u is independent of the unpopular items, indicating that user u has rated no unpopular items; because normal users tend to rate popular items whereas attack users rate randomly selected items, attack users rate unpopular items more often than normal users do, so the CSUI value of a normal user is smaller than that of an attack user;

S105: combining the chi-square value feature of unpopular items obtained in step S104 with the detection features proposed in the prior-art documents "Defending recommender systems: detection of profile injection attacks [J]" (Williams C A, Mobasher B, Burke R; Service Oriented Computing and Applications, 2007, 1(3): 157-170) and "Detecting profile injection attacks in collaborative filtering: a classification-based approach [C]" (Proceedings of the 8th Knowledge Discovery on the Web International Conference on Advances in Web Mining and Web Usage Analysis; Williams C A, Mobasher B, Burke R, et al.; Berlin: Springer, 2007: 167-186), namely WDMA, RDMA, WDA, Length Variance, DegSim', FMV (mean attack model), FAC (random attack model), FAC (popular attack model), FMD (random attack model), FMD (popular attack model), FMD (mean attack model) and PV (mean attack model), to form the feature matrix V of user feature vectors, which serves as the effective features for distinguishing normal users from attack users.

The specific algorithm of step S1 is named Algorithm 1 (fusion feature extraction algorithm, FFEA). Algorithm 1 is specified as follows:

Input: the user-item rating matrix R, the user set U and the item set I;

Output: the feature matrix V.

In the algorithm, lines 1-8 calculate the popularity of each item, sort the items in descending order of popularity, and divide all items into the popular item set $I_{POPU}$ and the unpopular item set $I_{UNPOPU}$; lines 9-14 calculate the features of each user u using the 13 features mentioned in the prior art together with the CSUI calculation formula; line 15 returns the feature matrix V consisting of all user feature vectors.
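The CSUI calculation formula itself is not legible in this text. As a hedged reconstruction, the standard Pearson chi-square statistic for a 2×2 contingency table would read as follows; the symbols $A$, $B$, $C$ and $D$ are names assumed here for the four counts defined in step S104, and $N$ is the number of all items.

```latex
% A: items in I_UNPOPU rated by u        B: items not in I_UNPOPU rated by u
% C: items in I_UNPOPU not rated by u    D: items not in I_UNPOPU not rated by u
\[
  \mathrm{CSUI}_u \;=\; \frac{N\,(AD - BC)^{2}}{(A+B)\,(C+D)\,(A+C)\,(B+D)},
  \qquad N = A + B + C + D .
\]
```

For example, with $N = 4$, $A = 2$, $B = 0$, $C = 0$ and $D = 2$, the statistic evaluates to $4 \cdot 4^{2} / (2 \cdot 2 \cdot 2 \cdot 2) = 4$, while $A = B = C = D = 1$ gives 0.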

Further, step S2: training a random forest classifier based on the effective features extracted in the step S1, and performing first-stage detection on a user set to be detected by using the trained random forest classifier to obtain a first-stage user profile detection result;

specifically, S201: dividing the original user rating data into two parts in a given proportion, one used as the training set for training the random forest classifier and the other as the user set to be detected;

S203: constructing and training a random forest classifier using the training set data;

the specific steps of constructing and training the random forest classifier are as follows,

S2032: whenever a node of a decision tree needs to be split, randomly selecting s attributes ($s \ll m$) from the m attributes and choosing one of these s attributes as the split attribute of the node, and repeating this splitting process until the stop condition is met;

The algorithm of step S2 is named Algorithm 2 (attack profile detection algorithm APDA_RF). Algorithm 2 is specified as follows:

Input: the training set Train and the user set to be detected Test;

Output: the user class label set $T_{result}$.

In Algorithm 2, lines 1-3 calculate the feature matrices of the training set and the test set according to the 14 features provided by Algorithm 1; line 4 trains the random forest classifier; lines 5-6 classify the test set feature matrix with the trained classifier to obtain $T_{result}$ and finally return the prediction result $T_{result}$.

Further, step S3: identifying the initial attack profile class detected in the step S2 through target item identification to realize attack profile detection in the second stage;

User profiles can be preliminarily divided into two classes by the random-forest-based attack profile detection algorithm (Algorithm 2), but some normal users may be mistakenly detected as attack users in the process. To make the detection result more accurate, an Attack Profile Detection Algorithm based on Target Item Identification (APDA_TII) is further proposed.

The algorithm of step S3 is named Algorithm 3 (attack profile detection algorithm APDA_TII). Algorithm 3 is specified as follows:

Input: the initial attack profile class $T_{sus}$, the item set $I_{sus}$ and the highest rating max;

Output: the attack user profile set $C_{attack}$.

In Algorithm 3, lines 1-7 determine the target item: the mean rating of each item is calculated and the item with the largest mean is taken as the target item; lines 8-13 find the attack users and remove them from the initial attack profile class; line 14 returns the attack user profile set $C_{attack}$.

Further, step S4: and constructing a robust recommendation algorithm according to the attack profile detection result to realize robust recommendation of attack detection.

To guarantee the recommendation accuracy of the algorithm, a PSO-based feature matrix initialization algorithm (IFM_PSO) is adopted to improve the ability of model training to reach the optimal solution, and the robust recommendation algorithm RRA-RFTII is constructed by combining the attack profile detection result obtained in step S3.
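The text does not spell out the internals of IFM_PSO. The following is a generic particle swarm optimization sketch of how feature matrices could be initialized, under stated assumptions: each particle is a flattened pair of user and item feature matrices, the fitness is the squared prediction error on the training ratings, and the inertia weight `w` and acceleration coefficients `c1`, `c2` are hypothetical defaults. It does not use the attack detection result, which the actual algorithm may do.

```python
import random
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (user index, item index, rating)


def fitness(x: List[float], ratings: List[Rating], m: int, f: int) -> float:
    """Squared prediction error of a flattened (P, Q) candidate."""
    err = 0.0
    for u, i, r in ratings:
        p = x[u * f:(u + 1) * f]
        q = x[(m + i) * f:(m + i + 1) * f]
        err += (r - sum(a * b for a, b in zip(p, q))) ** 2
    return err


def pso_init(ratings: List[Rating], m: int, n: int, f: int, t: int = 10,
             iters: int = 30, w: float = 0.7, c1: float = 1.5,
             c2: float = 1.5, seed: int = 0):
    """Return initial matrices P (m x f), Q (n x f) and the best fitness found."""
    rng = random.Random(seed)
    dim = (m + n) * f
    xs = [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(t)]
    vs = [[0.0] * dim for _ in range(t)]
    pbest = [x[:] for x in xs]
    pbest_fit = [fitness(x, ratings, m, f) for x in xs]
    g = min(range(t), key=lambda j: pbest_fit[j])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for j in range(t):
            for d in range(dim):
                # Velocity: inertia + pull to personal best + pull to global best
                vs[j][d] = (w * vs[j][d]
                            + c1 * rng.random() * (pbest[j][d] - xs[j][d])
                            + c2 * rng.random() * (gbest[d] - xs[j][d]))
                xs[j][d] += vs[j][d]
            fit = fitness(xs[j], ratings, m, f)
            if fit < pbest_fit[j]:
                pbest[j], pbest_fit[j] = xs[j][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = xs[j][:], fit
    P = [gbest[u * f:(u + 1) * f] for u in range(m)]
    Q = [gbest[(m + i) * f:(m + i + 1) * f] for i in range(n)]
    return P, Q, gbest_fit
```

Because the global best is only replaced by strictly better candidates, the returned fitness can never be worse than that of the best randomly drawn particle, which is the sense in which such an initialization gives gradient descent a better starting point.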

A user's rating of an item is expressed as a linear model: assuming that several implicit classification features exist, the rating of a user for an item is a linear combination of the degree to which the item belongs to each implicit classification feature and the user's degree of preference for each implicit classification feature. The linear model is expressed as $\hat{R} = Q^{T}P$, where $\hat{R}$ is the $n \times m$ matrix of predicted ratings; $P$ is the $f \times m$ user feature matrix, whose vector $p_u$ ($u = 1, 2, \dots, m$) represents the degree of preference of user $u$ for each implicit classification; and $Q$ is the $f \times n$ item feature matrix, whose vector $q_i$ ($i = 1, 2, \dots, n$) represents the degree to which item $i$ belongs to each implicit classification. By solving the least squares problem $\min_{P,Q} \sum_{(u,i)} \left[ (r_{ui} - q_i^{T} p_u)^2 + \lambda(\|q_i\|^2 + \|p_u\|^2) \right]$ by gradient descent, $q_i$ and $p_u$ can be obtained, and thereby the user feature matrix and the item feature matrix.

Specifically, S401: based on the initial real profile class obtained in the step S2 and the final attack profile class obtained in the step S3, initializing a feature matrix by using a PSO method to obtain an initial user feature matrix and an item feature matrix;

S402: constructing the indicator function $I_S(u)$ according to the final attack profile detection result of step S3, where $S$ is the attack user set and $U$ is the set of all users;

S403: combining the indicator function $I_S(u)$ with the item feature vector $q_i$ obtained in step S401 to obtain $q_i \leftarrow q_i + I_S(u)\gamma(p_u e_{ui} - \lambda q_i)$;

S404: iteratively updating the initialized item feature vector $q_i$ and user feature vector $p_u$ by $p_u \leftarrow p_u + \gamma(q_i e_{ui} - \lambda p_u)$ and $q_i \leftarrow q_i + I_S(u)\gamma(p_u e_{ui} - \lambda q_i)$, and obtaining the predicted rating of user $u$ for item $i$ as $\hat{r}_{ui} = q_i^{T} p_u$. Here $\lambda(\|q_i\|^2 + \|p_u\|^2)$ is the regularization term added to the least squares objective to avoid overfitting, $\lambda$ is a constant, $e_{ui} = r_{ui} - \hat{r}_{ui}$ represents the difference between the true rating and the predicted rating, $r_{ui}$ is the true rating of item $i$ by user $u$, and $\gamma$ is the step size of gradient descent; the updates are repeated until the algorithm converges, yielding the optimal user feature matrix and the optimal item feature matrix;
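The update rules of steps S403 and S404 can be read as stochastic gradient descent on the regularized squared error. The derivation below is a sketch under two assumptions that the text does not state explicitly: the constant factor 2 of the gradient is absorbed into the step size $\gamma$, and the indicator function takes the value 0 for detected attack users and 1 for all other users.

```latex
% Per-rating loss, with e_{ui} = r_{ui} - q_i^{T} p_u
\[
  L_{ui} = e_{ui}^{2} + \lambda\left(\lVert q_i\rVert^{2} + \lVert p_u\rVert^{2}\right)
\]
% Gradients with respect to the two feature vectors
\[
  \frac{\partial L_{ui}}{\partial p_u} = -2\,(q_i e_{ui} - \lambda p_u), \qquad
  \frac{\partial L_{ui}}{\partial q_i} = -2\,(p_u e_{ui} - \lambda q_i)
\]
% Gradient step (factor 2 absorbed into \gamma); the item update is gated by I_S(u)
\[
  p_u \leftarrow p_u + \gamma\,(q_i e_{ui} - \lambda p_u), \qquad
  q_i \leftarrow q_i + I_S(u)\,\gamma\,(p_u e_{ui} - \lambda q_i)
\]
% Assumed form of the indicator function
\[
  I_S(u) = \begin{cases} 0, & u \in S \\ 1, & u \in U \setminus S \end{cases}
\]
```

Under this reading, ratings from detected attack users still adapt their own user vectors but leave the item vectors untouched, so injected profiles cannot shift the predictions made for genuine users.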

The algorithm of step S4 is named Algorithm 4 (robust recommendation algorithm RRA-RFTII). Algorithm 4 is specified as follows:

Input: the user rating matrix R to be detected, the training set Train, the test set Test, the attack user profile set $C_{attack}$, the number of users m, the number of items n, the number of particles t and the number of implicit classification features f;

Output: the user feature matrix P and the item feature matrix Q.

In Algorithm 4, lines 1-2 obtain the initial user feature matrix and item feature matrix; lines 3-21 iteratively update the feature vectors $p_u$ and $q_i$ until the algorithm converges, yielding the optimal user feature matrix and the optimal item feature matrix.

The first embodiment is as follows:

In this embodiment, the 1M rating data set of the MovieLens movie recommendation system is used; it contains 1000209 ratings of 3952 movies by 6040 users. Ratings are integers from 1 to 5, and a larger value indicates a stronger preference of the user for the rated movie.

Attack profiles are generated with the average attack (AverageAttack), popular attack (PopularAttack), random attack (RandomAttack) and Aop attack (AopAttack) models under different filler sizes: {1%, 3%, 5%, 10%, 25%, 50%} for the random, average and popular attacks, and {1%, 3%, 5%, 10%} for the Aop attack.
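The text does not define how the attack profiles are built. As an illustration only, the sketch below generates one average attack profile following the common definition of that model in the literature: randomly chosen filler items are rated at their item mean and the target item receives the maximum rating. The function name, the rounding to integer ratings and the push (rather than nuke) intent are assumptions.

```python
import random
from typing import Dict


def average_attack_profile(item_means: Dict[str, float], target: str,
                           filler_size: float, rng: random.Random,
                           r_min: int = 1, r_max: int = 5) -> Dict[str, int]:
    """Build one push-attack profile with the given filler size.

    filler_size is the fraction of all items used as filler items,
    e.g. 0.05 for a 5% filler size.
    """
    candidates = [i for i in item_means if i != target]
    k = min(round(filler_size * len(item_means)), len(candidates))
    fillers = rng.sample(candidates, k)
    # Filler items are rated at the (rounded, clipped) mean rating of the item
    profile = {i: min(r_max, max(r_min, round(item_means[i]))) for i in fillers}
    profile[target] = r_max  # the target item is pushed with the top rating
    return profile
```

A random attack profile would differ only in how the filler ratings are drawn, typically around the global mean rating instead of the per-item mean.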

Table 1 below shows the experimental data setup of this embodiment, which comprises 1 training set and 7 test sets. The training set is used to train the random forest classifier and contains 600 real users, 120 each of random attack, mean attack and popular attack users, and 80 each of 20% Aop, 30% Aop and 40% Aop attack users. The 7 test sets are used to test the performance of the attack detection algorithm and the robustness of the recommendation algorithm. Group 1 contains 500 real users and 60 random attack users. Group 2 contains 500 real users and 60 mean attack users. Group 3 contains 500 real users and 60 popular attack users. Group 4 contains 500 real users and 60 20% Aop attack users. Group 5 contains 500 real users and 60 30% Aop attack users. Group 6 contains 500 real users and 60 40% Aop attack users. Group 7 contains 500 real users and 150 mixed attack users (hybrid attack).

TABLE 1 Experimental data

In the present embodiment, the Mean Absolute Error (MAE) and the Prediction Shift (PS) are used to evaluate the recommendation accuracy and the robustness of the algorithm, respectively.

The lower the MAE value, the better the accuracy of the algorithm. MAE is calculated as $MAE = \frac{1}{N}\sum_{u,i} |r_{ui} - \hat{r}_{ui}|$, where $r_{ui}$ is the true rating of item $i$ by user $u$, $\hat{r}_{ui}$ is the predicted rating of item $i$ for user $u$, and $N$ is the number of predictions.

PS is the average change in the predicted ratings of the target item before and after the attack; the smaller the value, the stronger the algorithm's resistance to attack. PS is calculated as $PS = \frac{1}{N}\sum_{u} (\hat{r}'_{ui} - \hat{r}_{ui})$, where $\hat{r}_{ui}$ and $\hat{r}'_{ui}$ are the predicted ratings of user $u$ for the target item $i$ before and after the attack respectively, and $N$ is the total number of predictions.
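The two evaluation measures can be sketched in a few lines of Python. The function names are hypothetical, and the prediction shift is taken here as the signed mean difference between post-attack and pre-attack predictions, which is the usual definition and an assumption about the formula that is not legible in this text.

```python
from typing import Sequence


def mae(true_ratings: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(true_ratings, predicted)) / len(true_ratings)


def prediction_shift(before: Sequence[float], after: Sequence[float]) -> float:
    """Mean change of the target item's predicted rating caused by the attack."""
    return sum(a - b for b, a in zip(before, after)) / len(before)
```

For instance, true ratings 4, 3, 5 against predictions 3.5, 3, 4 give an MAE of 0.5, and predictions that move from 2.0 and 3.0 before the attack to 4.0 and 3.5 after it give a prediction shift of 1.25.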

Precision and Recall are used to evaluate the shilling attack detection performance of the algorithm, and are calculated as $Precision = \frac{TP}{TP+FP}$ and $Recall = \frac{TP}{TP+FN}$, where TP is the number of attack profiles correctly detected, FP is the number of real profiles misjudged as attack profiles, and FN is the number of attack profiles not detected.
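A minimal sketch of the two detection measures, computed from the set of users flagged as attackers and the set of actual attackers. The function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
from typing import Set, Tuple


def precision_recall(detected: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Precision and recall of attack profile detection."""
    tp = len(detected & actual)   # attack profiles correctly detected
    fp = len(detected - actual)   # real profiles misjudged as attacks
    fn = len(actual - detected)   # attack profiles that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

If four users are flagged, two of them are real attackers and one attacker is missed, the precision is 0.5 and the recall is 2/3.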

To evaluate the performance of the attack detection algorithm proposed in the present invention (TS_APDA), it was compared experimentally with 3 existing attack detection algorithms.

(1) SVM_APDA: selects the detection features WDMA, RDMA, WDA, Length Variance, DegSim', FMV, FAC, FMD and PV proposed in the documents "Defending recommender systems: detection of profile injection attacks [J]" (Williams C A, Mobasher B, Burke R; Service Oriented Computing and Applications, 2007, 1(3): 157-170) and "Detecting profile injection attacks in collaborative filtering: a classification-based approach [C]" (Proceedings of the 8th Knowledge Discovery on the Web International Conference on Advances in Web Mining and Web Usage Analysis; Williams C A, Mobasher B, Burke R, et al.; Berlin: Springer, 2007: 167-186), and trains an SVM classifier to perform attack detection on user profiles.

(2) KNN_APDA: selects the detection features proposed in the same two documents and classifies user profiles using a KNN classifier.

(3) C4.5_APDA: selects the detection features proposed in the same two documents and classifies user profiles using a C4.5 decision tree classifier.

To evaluate the recommendation accuracy and robustness of the recommendation algorithm RRA-RFTII provided by the invention, it was compared experimentally with the following existing recommendation algorithms.

(1) MMF: the M-estimator-based matrix factorization method proposed in the literature (Mehta B, Hofmann T, Nejdl W. Robust collaborative filtering [C]. Proceedings of the 2007 ACM Conference on Recommender Systems, RecSys, Minneapolis, MN, USA, 2007: 49-56.).

(2) LTSMF: the matrix factorization method based on the least trimmed squares estimator proposed in the literature (Cheng Z, Hurley N. Robust collaborative recommendation by least trimmed squares matrix factorization [C]. Proceedings of the 2010 22nd IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Arras, France, 2010: 105-.

(3) KMCQR-M: the robust recommendation algorithm based on incremental clustering and matrix factorization proposed in the literature (Xu Yu-chen, Liu Zhen, Zhang Fu-zhi. Robust recommendation on incremental clustering and matrix factorization [J]. Journal of Chinese Computer Systems, 2015, 36(04): 689-.

The comparative results are as follows:

(1) accuracy versus recall

The attack detection algorithms TS _ APDA, SVM _ APDA, KNN _ APDA and C4.5_ APDA detect attack profiles over 7 test sets with accuracy and recall as shown in fig. 3 and 4.

As can be seen from FIG. 3, the detection accuracy of the TS_APDA algorithm is close to 1 for every type of attack. The accuracy of the SVM_APDA algorithm is between 0.5 and 0.65 for random attacks, mean attacks, popular attacks and mixed attacks, and it fails to detect Aop attacks. The accuracy of the KNN_APDA algorithm is between 0.4 and 0.52 for random attacks, mean attacks, popular attacks and mixed attacks, and it also fails to detect Aop attacks. The accuracy of the C4.5_APDA algorithm is between 0.02 and 0.26 for all attack types. Overall, the detection accuracy of the TS_APDA algorithm on the 7 test sets is clearly higher than that of the other three algorithms. There are two main reasons. First, in the first-stage detection, in addition to the 13 detection features mentioned in the prior art, the chi-square value detection feature of non-popular items, proposed on the basis of chi-square statistics, is adopted, and the extracted features are used to train the random forest classifier, which improves the precision of the classifier to a certain extent. Second, the classes in the first-stage result that contain attack profiles undergo a second-stage detection through target item identification, which makes the detection result more accurate.

As can be seen from FIG. 4, in the detection of random attacks, mean attacks and popular attacks, the recall of the TS_APDA algorithm reaches 1, and the recall of the other three detection algorithms is also close to 1. In the detection of Aop attacks and mixed attacks, the recall of the TS_APDA algorithm is above 0.7 and 0.8 respectively, whereas the recall of the other three detection algorithms drops markedly; in particular, the recall of the SVM_APDA and KNN_APDA algorithms is 0 for Aop attacks. In general, of the four detection algorithms, the TS_APDA algorithm is the most capable of identifying attack profiles.
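The two metrics compared in FIG. 3 and FIG. 4 can be illustrated with a minimal sketch. It assumes the usual definitions, with accuracy taken as the fraction of flagged profiles that are real attack profiles and recall as the fraction of real attack profiles that are flagged. The function name and the user IDs are hypothetical and are not taken from the invention.

```python
def detection_accuracy_recall(detected, actual):
    """Return (accuracy, recall) for a set of detected attack users.

    detected: set of user IDs flagged as attack profiles
    actual:   set of user IDs that really are attack profiles
    """
    true_positives = len(detected & actual)
    # Guard against empty sets so the ratios are always defined.
    accuracy = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return accuracy, recall


# Hypothetical data: 4 users are flagged, 3 of them are real attackers,
# and 5 attackers exist in total.
detected_users = {"u1", "u2", "u3", "u9"}
attack_users = {"u1", "u2", "u3", "u4", "u5"}
accuracy, recall = detection_accuracy_recall(detected_users, attack_users)
```

Under these assumptions, a detector that flags nothing for a given attack type has a recall of 0, which matches the way the SVM_APDA and KNN_APDA results for Aop attacks are reported.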

(2) MAE and PS comparison

The MAE and PS of the four recommendation algorithms RRA-RFTII, MMF, LTSMF and KMCQR-M on the 7 test sets are shown in FIG. 5 and FIG. 6.

As can be seen from FIG. 5, under the different types of attacks, the MAE values of the MMF, LTSMF and KMCQR-M algorithms are all above 0.7, while the MAE value of the RRA-RFTII algorithm approaches 0.7 only under the mixed attack. The smaller the MAE value, the higher the recommendation accuracy of the algorithm, so the recommendation accuracy of the RRA-RFTII algorithm provided by the invention is the best among the four algorithms. The main reason is that, before gradient descent is carried out, the RRA-RFTII algorithm does not generate the initial feature matrices randomly; instead, it obtains the initial user feature matrix and item feature matrix by particle swarm optimization, which improves the ability of model training to reach the optimal solution and safeguards the recommendation precision of the algorithm.
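For reference, the MAE values compared above can be computed as in the following minimal sketch, which assumes the standard definition of mean absolute error over paired actual and predicted ratings. The function name and the sample ratings are hypothetical.

```python
def mean_absolute_error(actual_ratings, predicted_ratings):
    """Average absolute difference between actual and predicted ratings."""
    if len(actual_ratings) != len(predicted_ratings):
        raise ValueError("rating lists must have the same length")
    if not actual_ratings:
        raise ValueError("at least one rating pair is required")
    total_error = sum(
        abs(actual - predicted)
        for actual, predicted in zip(actual_ratings, predicted_ratings)
    )
    return total_error / len(actual_ratings)


# Hypothetical test ratings and the corresponding predictions.
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 3.0]
mae = mean_absolute_error(actual, predicted)
```

Here the absolute errors are 0.5, 0, 1 and 1, so the sketch yields an MAE of 0.625; a lower value indicates more accurate predictions.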

As can be seen from FIG. 6, under random attacks, mean attacks and popular attacks, the PS values of the RRA-RFTII algorithm provided by the invention are small, while the PS values of the other three algorithms are larger. Under Aop attacks and mixed attacks, the attack resistance of all four algorithms decreases, but the PS value of the RRA-RFTII algorithm is still lower than that of the other three algorithms. Since a smaller PS value indicates better robustness, the robustness of the RRA-RFTII algorithm is the best among the four algorithms. The main reason is that the RRA-RFTII algorithm adopts a two-stage attack profile detection algorithm and can therefore effectively detect attack profiles before recommendation.
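The PS comparison can be illustrated with a minimal sketch. It assumes that PS denotes the prediction shift, taken here as the average change in the predicted rating of a target item across users before and after an attack is injected. The function name, the dictionary layout and the sample predictions are hypothetical.

```python
def prediction_shift(before, after):
    """Average change in the target item's predicted rating.

    before, after: dicts mapping user ID -> predicted rating of the
    target item, before and after the attack respectively.
    """
    common_users = before.keys() & after.keys()
    if not common_users:
        raise ValueError("no users with predictions in both dicts")
    total_shift = sum(after[user] - before[user] for user in common_users)
    return total_shift / len(common_users)


# Hypothetical predictions for one target item.
before_attack = {"u1": 3.0, "u2": 2.5, "u3": 4.0}
after_attack = {"u1": 3.5, "u2": 3.5, "u3": 4.0}
ps = prediction_shift(before_attack, after_attack)
```

In this example the per-user shifts are 0.5, 1.0 and 0, giving a PS of 0.5. A value closer to 0 means the attack moved the predictions less, which is the sense in which a smaller PS indicates better robustness.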

The foregoing shows and describes the general principles, essential features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description only illustrate the principle of the invention. Various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims

1. The attack detection robust recommendation method based on random forest and target item identification is characterized by comprising the following steps,

2. The attack detection robust recommendation method based on random forest and target item identification as claimed in claim 1, wherein the specific operation of step S1 comprises the following steps,

S102: calculating item popularity for all items

Sorting the n items in descending order according to popularity;

S105: combining the chi-square value feature of non-popular items obtained in step S104 with the 13 detection features, including WDMA, RDMA, WDA, LengthVariance, DegSim', FMV, FAC, FMD and PV, to form the feature matrix V of user feature vectors, which serves as the effective features for distinguishing normal users from attack users.

3. The attack detection robust recommendation method based on random forest and target item identification as claimed in claim 2, wherein the specific operation of step S2 comprises the following steps,

4. The attack detection robust recommendation method based on random forest and target item identification as claimed in claim 3, wherein the specific operation of step S203 comprises the following steps,

5. The attack detection robust recommendation method based on random forest and target item identification as claimed in claim 3, wherein the specific operation of step S3 comprises the following steps,

In the formula, S is a user set corresponding to the final attack profile, and U is a whole user set;