CN106874427B

CN106874427B - Item association-based trust attack detection method

Info

Publication number: CN106874427B
Application number: CN201710057846.2A
Authority: CN
Inventors: 李巧巧; 陈百基
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2017-01-23
Filing date: 2017-01-23
Publication date: 2020-01-14
Anticipated expiration: 2037-01-23
Also published as: CN106874427A

Abstract

The invention discloses an item association-based trust attack detection method, comprising the following steps: performing item correlation calculation on the mixed user scoring matrix R to obtain its item correlation matrix A; finding the neighbor users of a target user in R and removing the target user and its neighbor users to obtain a new mixed user scoring matrix r, repeating this with every user as the target; performing item correlation calculation on each new mixed user scoring matrix r to obtain its item correlation matrix a; calculating the Euclidean distance between the item correlation matrix A and each item correlation matrix a; and accumulating each Euclidean distance onto the corresponding user and its neighbor users in R, finally filtering out the attacking users. By calculating correlation matrices to obtain difference values and sorting those difference values, the invention filters out attacking users, improves detection accuracy and overcomes the shortcomings of existing detection methods.

Description

Item association-based trust attack detection method

Technical Field

The invention relates to the field of machine learning, and in particular to an item association-based trust attack detection method.

Background

With the development of the Internet, the amount of information online has grown dramatically. It has become difficult for people to quickly locate the content they want among massive amounts of information, and the utilization rate of that information has fallen. How to provide high-quality recommendations to users under such "information overload" has therefore become a research focus. Collaborative filtering is one of the most widely applied recommendation algorithms because it offers efficient and convenient personalized recommendation. It analyzes users' existing information and, based on each user's preferences and needs, finds users similar to the target user, so that items closer to the target user's taste can be recommended.

Because collaborative filtering recommendation systems are convenient and open, their security is under challenge. To increase their own profits, dishonest merchants can inject fake users who give high ratings to certain items or low ratings to others, so that the recommendation results contain abnormal items that serve the merchants' interests. Such an attack, which achieves an illegitimate goal by injecting fake user profiles, is called a shilling (trust) attack; common types include the random attack and the average attack.

In practical applications, collaborative filtering recommendation systems are highly vulnerable to such attacks, so their security has become a popular research topic. At present, trust attack detection mainly analyzes characteristics such as the rating values of real and fake users in order to remove attacking users before recommendations are generated. From a machine learning perspective, detection methods fall into three main categories according to how they learn: supervised, unsupervised and semi-supervised. Supervised methods extract features from each user profile and, after labeling, perform detection with a classifier such as a support vector machine. Such feature-based methods perform well only when the attack's filler size is large; their results are unsatisfactory when attacking profiles are sparsely filled, and they need a large amount of prior knowledge for training. Besides user profile features such as rating length and rating variation, researchers have also combined signal recognition techniques with detection, but this approach is effective only for noise data and its detection accuracy is unsatisfactory.

Researchers then proposed unsupervised learning methods, such as detection based on principal component analysis (PCA). Because attacking users follow similar rating patterns, the similarity among attacking users is high while the similarity among real users is low; after conversion into covariance, the covariance among fake users is low, the covariance between fake and real users is low, and the covariance among real users is high, and the fake users are filtered out by extracting the principal components. However, this method requires the number of attacking users to be known in advance in order to set how many users are filtered, this setting strongly influences the detection result, and the detection performance degrades as the filler size of the attacking users increases.

When only a few labeled users are available, supervised learning performs poorly, so semi-supervised methods appeared. They consist of two parts: a classifier is trained on a small amount of labeled data, and unlabeled data are then added to the classifier iteratively. These methods need repeated iterations for adjustment, which is time-consuming, and having to choose parameters such as the number of iterations makes detection harder. Moreover, existing detection methods all work from perspectives such as the similarity between users and the users' rating values; none directly considers the relationships among the rated items, so if an attacker exploits that perspective, existing detection methods lose some of their effectiveness.

Disclosure of Invention

To overcome the defects and shortcomings of the prior art, the invention provides an item association-based trust attack detection method that obtains difference values by calculating item correlation matrices and sorts those difference values, thereby filtering out attacking users, improving detection accuracy and overcoming the shortcomings of existing detection methods.

To solve the above technical problem, the invention provides the following technical solution: an item association-based trust attack detection method, comprising the following steps:

S1, performing item correlation calculation on the mixed user scoring matrix R to obtain the item correlation matrix A of R;

S2, finding the neighbor users of a target user in the mixed user scoring matrix R, and removing the target user and its neighbor users to obtain a new mixed user scoring matrix r; repeating this with each user as the target until all users in R have been processed, thereby obtaining a plurality of new mixed user scoring matrices r;

S3, performing item correlation calculation on each new mixed user scoring matrix r to obtain its item correlation matrix a;

S4, calculating the Euclidean distance between the item correlation matrix A and each of the different item correlation matrices a;

S5, accumulating each Euclidean distance onto the corresponding target user and its neighbor users in the mixed user scoring matrix R, and filtering out the attacking users.
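Steps S4 and S5 can be written compactly as follows. This is a sketch under two assumptions not stated in the text: the Euclidean distance between two n × n item correlation matrices (n being the number of items) is taken element-wise, which equals the Frobenius norm of their difference; and $a^{(t)}$ (the matrix a obtained in the round where user t is the target) and $N_k(t)$ (that user's k neighbors) are notation introduced here for illustration.

```latex
D_t = \lVert A - a^{(t)} \rVert_F
    = \sqrt{\sum_{p=1}^{n}\sum_{q=1}^{n}\left(A_{pq} - a^{(t)}_{pq}\right)^2},
\qquad
s_u = \sum_{t \,:\, u \in \{t\} \cup N_k(t)} D_t
```

Users are then sorted by $s_u$ in ascending order, and the first N are treated as attacking users.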

Further, step S1 specifically comprises:

S11, converting the mixed user scoring matrix R into a new matrix R', in which rated entries are 1 and unrated entries are 0;

S12, expanding each item (column) of R' and adding the expanded matrix to R' to obtain the co-rated entries of that item with every other item;

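S13, calculating the similarity over the co-rated entries with the Pearson similarity:

$$\mathrm{sim}(X,Y)=\frac{\sum_{u\in U_{XY}}\left(X_u-\bar{X}\right)\left(Y_u-\bar{Y}\right)}{\sqrt{\sum_{u\in U_{XY}}\left(X_u-\bar{X}\right)^2}\sqrt{\sum_{u\in U_{XY}}\left(Y_u-\bar{Y}\right)^2}}$$

where X is item X and $\bar{X}$ is the mean of item X; Y is item Y and $\bar{Y}$ is the mean of item Y; $U_{XY}$ is the set of users who rated both items, and $X_u$, $Y_u$ are user u's ratings of X and Y;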

S14, repeating steps S12-S13 until the item correlation matrix A is generated.
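Steps S11-S14 can be sketched in plain Python as below. This is an illustrative sketch, not the patented implementation, and it rests on several assumptions: a rating of 0 means "not rated"; the item means are taken over the co-rated users only (the text says only "mean of item X"); and pairs with fewer than two co-rated users are left at 0.0. The function name `item_correlation_matrix` is hypothetical.

```python
import math


def item_correlation_matrix(R):
    """Hypothetical sketch of S11-S14: item-item Pearson over co-rated users.

    R is a list of user rows (users x items); 0 is assumed to mean "unrated".
    """
    n_users = len(R)
    n_items = len(R[0]) if R else 0
    # S11: binary indicator matrix R' (1 = rated, 0 = unrated)
    B = [[1 if v != 0 else 0 for v in row] for row in R]
    A = [[0.0] * n_items for _ in range(n_items)]
    for x in range(n_items):
        for y in range(n_items):
            # S12: users who rated both item x and item y (indicator sum == 2)
            common = [u for u in range(n_users) if B[u][x] + B[u][y] == 2]
            if len(common) < 2:
                continue  # correlation undefined; left at 0.0 (assumption)
            xs = [R[u][x] for u in common]
            ys = [R[u][y] for u in common]
            # Means over the co-rated users (assumption of this sketch)
            mx = sum(xs) / len(xs)
            my = sum(ys) / len(ys)
            # S13: Pearson similarity on the co-rated ratings
            num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            den = (math.sqrt(sum((a - mx) ** 2 for a in xs))
                   * math.sqrt(sum((b - my) ** 2 for b in ys)))
            A[x][y] = num / den if den > 0 else 0.0
    # S14: all item pairs processed, giving the n x n matrix A
    return A
```

Called on a small users × items list such as `[[5, 3, 0], [4, 2, 1], [1, 1, 5], [2, 0, 4]]`, it returns a symmetric 3 × 3 matrix with 1.0 on the diagonal.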

Further, step S2 specifically comprises:

S21, setting the number k of neighbor users in the mixed user scoring matrix R;

S22, finding k neighbor users for each user with the k-nearest-neighbor (KNN) method, as follows:

S221, calculating the distance between each other user and the target user:

$$d(x,y)=\sqrt{\sum_{i=1}^{m}\left(x_i-y_i\right)^2}$$

where $x_i$ is the target user's rating, $y_i$ is the neighbor user's rating, and m is the number of columns of the original mixed scoring matrix, i.e., the number of ratings compared;

S222, sorting the calculated distances and finding the k users nearest to the target user;

S23, removing the target user and its neighbor users to obtain a new mixed user scoring matrix r.
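Steps S21-S23 can be sketched as follows. This is a minimal sketch assuming that raw rating rows are compared directly (unrated entries counted as 0) with the Euclidean distance of S221, and that ties are broken by user index because Python's `sort` is stable. The helper names `euclidean`, `k_nearest` and `remove_target_and_neighbors` are hypothetical.

```python
import math


def euclidean(x, y):
    # S221: d(x, y) = sqrt(sum_i (x_i - y_i)^2) over all rating columns
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def k_nearest(R, target, k):
    # S222: rank every other user by distance to the target, keep the k closest
    others = [u for u in range(len(R)) if u != target]
    others.sort(key=lambda u: euclidean(R[target], R[u]))
    return others[:k]


def remove_target_and_neighbors(R, target, k):
    # S23: drop the target and its k neighbors to obtain the reduced matrix r
    removed = {target, *k_nearest(R, target, k)}
    r = [row for u, row in enumerate(R) if u not in removed]
    return r, sorted(removed)
```

Running `remove_target_and_neighbors` once per user, as S2 requires, yields one reduced matrix r and one removed group per target user.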

Further, step S3 specifically comprises:

S31, converting the new mixed user scoring matrix r into a new matrix r', in which rated entries are 1 and unrated entries are 0;

S32, expanding each item (column) of r' and adding the expanded matrix to r' to obtain the co-rated entries of that item with every other item;

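S33, calculating the similarity over the co-rated entries with the Pearson similarity:

$$\mathrm{sim}(X,Y)=\frac{\sum_{u\in U_{XY}}\left(X_u-\bar{X}\right)\left(Y_u-\bar{Y}\right)}{\sqrt{\sum_{u\in U_{XY}}\left(X_u-\bar{X}\right)^2}\sqrt{\sum_{u\in U_{XY}}\left(Y_u-\bar{Y}\right)^2}}$$

where X is item X and $\bar{X}$ is the mean of item X; Y is item Y and $\bar{Y}$ is the mean of item Y; $U_{XY}$ is the set of users who rated both items, and $X_u$, $Y_u$ are user u's ratings of X and Y;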

S34, repeating steps S32-S33 until the item correlation matrix a is generated.

Further, filtering the attacking users in step S5 specifically comprises: sorting the users' accumulated distances in ascending order and filtering out the first N users, i.e., the N users with the shortest distances, as attacking users.
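Steps S4 and S5 can be sketched as below, assuming the Euclidean distance between two correlation matrices is computed element-wise (the Frobenius norm of their difference) and that each round's removed group (target plus neighbors) was recorded in step S2. The names `matrix_distance` and `filter_attackers` are hypothetical, and the sketch is not the patented implementation.

```python
import math


def matrix_distance(A, a):
    # S4: Euclidean distance between two equally sized correlation matrices,
    # treating each as one long vector (i.e. the Frobenius norm of A - a)
    return math.sqrt(sum((p - q) ** 2
                         for row_a, row_b in zip(A, a)
                         for p, q in zip(row_a, row_b)))


def filter_attackers(n_users, groups, distances, N):
    """groups[t]: users removed in round t (target + neighbors);
    distances[t]: distance between A and that round's matrix a."""
    score = [0.0] * n_users
    # S5: add each round's distance to every user removed in that round
    for removed, d in zip(groups, distances):
        for u in removed:
            score[u] += d
    # Ascending sort; the N users with the smallest totals are flagged
    order = sorted(range(n_users), key=lambda u: score[u])
    return order[:N], score
```

As written, a user's total also grows with how often that user is picked as someone's neighbor, since each appearance adds that round's distance.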

With the above technical solution, the invention has at least the following beneficial effects:

(1) the invention provides a new detection method from the perspective of item correlation, making up for the shortcomings of existing detection methods;

(2) because most existing attack methods select the items to rate at random, the method is widely applicable;

(3) the method detects common attacks well and also improves the detection of the AOP (average-over-popular) attack, which existing detection methods find hard to defend against;

(4) compared with supervised and semi-supervised learning methods, the method does not need a large amount of prior knowledge and learns directly from the mixed matrix to filter out attacking users.

Drawings

FIG. 1 is a flowchart of the steps of the item association-based trust attack detection method of the present invention;

FIG. 2 is a schematic diagram of the detection results of the present invention and two other detection methods for the random attack;

FIG. 3 is a schematic diagram of the detection results of the same three methods for the average attack, showing the accuracy of each detection algorithm as the number of ratings of the attacking users increases;

FIG. 4 shows the detection results of the present invention for the AOP attack.

Detailed Description

It should be noted that the embodiments of the present application and the features therein may be combined with each other where no conflict arises. The present application is described in further detail below with reference to the drawings and specific embodiments.

As shown in FIG. 1, the item association-based trust attack detection method of the invention comprises the following steps:

1. Calculating the correlation between items in the mixed matrix from their co-rated entries. The collected original matrix is a data set containing only real users; the mixed scoring matrix is obtained by adding fake users, generated with an existing attack method, to this data set. For example, fake users generated by the random attack method are added to the original matrix, and the resulting data set is called the mixed matrix;

1) converting the mixed user scoring matrix R (containing both real and attacking users) into a new matrix R', in which rated entries are 1 and unrated entries are 0;

2) expanding each item of R' and adding the expanded item to R' to obtain the co-rated entries of that item. In MATLAB, each item is a single column; expansion repeats this column to produce a matrix the same size as R', so that when the two matrices are added, an entry equal to 2 marks a user who rated both this item and the other item;

3) calculating the Pearson similarity over the co-rated entries;

4) repeating steps 2)-3) until the item correlation matrix is generated; if the original mixed scoring matrix is of size m × n, the item correlation matrix is of size n × n;
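The "expand and add" trick of step 2) can be mirrored in plain Python as below. It is a sketch assuming R' is given as a list of 0/1 user rows; the column replication plays the role of MATLAB's `repmat`, and the function name `co_rated_mask` is hypothetical.

```python
def co_rated_mask(B, item):
    """Hypothetical sketch: which users rated both `item` and each item j.

    B is the binary matrix R' (users x items, 1 = rated, 0 = unrated).
    """
    n_users, n_items = len(B), len(B[0])
    # Replicate the chosen item's column across all item columns (like repmat)
    expanded = [[B[u][item]] * n_items for u in range(n_users)]
    # Element-wise sum of the expanded matrix and R'
    summed = [[expanded[u][j] + B[u][j] for j in range(n_items)]
              for u in range(n_users)]
    # A sum of 2 marks a user who rated both `item` and item j
    return [[s == 2 for s in row] for row in summed]
```

Column j of the returned mask selects the co-rated users over which the Pearson similarity of step 3) is computed for the pair (`item`, j).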

2. Finding the neighbor users of each target user and removing the target user and its neighbor users:

1) setting the number k of neighbor users;

2) in the mixed matrix, finding k neighbor users for each user with KNN:

A. calculating the distances between the other users and the target user with the distance formula;

B. sorting the distances obtained in step A and finding the k users nearest to the target user;

3) removing the target user and its neighbor users to obtain a new mixed matrix r;

3. Calculating the differences between correlation matrices:

1) calculating the item correlation matrix of the matrix r obtained in step 2-3), using the same method as in step 1;

2) calculating the Euclidean distance between the item correlation matrix of R and the item correlation matrix of each different r;

4. Filtering out the first N attacking users:

1) accumulating the distances obtained in step 3-2) onto the corresponding target user and its neighbor users;

2) sorting the users' accumulated distances in ascending order and filtering out the first N users, i.e., those with the shortest distances.

The detection results of the proposed method for the random attack are shown in FIG. 2. Compared with the PCA and SVM detection methods, the new method is not affected by the attack size or the filler size and achieves the best performance on the same data set;

the detection results of the new method for the popular attack are shown in FIG. 3, where the new method again performs best;

the detection results of the new method for the average-over-popular (AOP) attack are shown in FIG. 4. As the attack size increases, the accuracy of PCA detection drops markedly; because this attack also takes the correlation between items into account, the new method remains applicable to it and performs better.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various equivalent changes, modifications, substitutions and alterations can be made herein without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. An item association-based trust attack detection method, characterized by comprising the following steps:

S1, performing item correlation calculation on the mixed user scoring matrix R to obtain the item correlation matrix A of R, specifically comprising:

where X is item X and $\bar{X}$ is the mean of item X; Y is item Y and $\bar{Y}$ is the mean of item Y;

S14, repeating steps S12-S13 until the item correlation matrix A is generated;

S3, performing item correlation calculation on the new mixed user scoring matrix r to obtain the item correlation matrix a of r, specifically comprising:

where X is item X and $\bar{X}$ is the mean of item X; Y is item Y and $\bar{Y}$ is the mean of item Y;

S34, repeating steps S32-S33 until the item correlation matrix a is generated;

2. The item association-based trust attack detection method according to claim 1, wherein step S2 specifically comprises:

S21, setting the number k of neighbor users in the mixed user scoring matrix R;

3. The item association-based trust attack detection method according to claim 1, wherein filtering the attacking users in step S5 specifically comprises: sorting the users' accumulated distances in ascending order and filtering out the first N users, i.e., the N users with the shortest distances, as attacking users.