CN107689960B

CN107689960B - Attack detection method for unorganized malicious attack

Info

Publication number: CN107689960B
Application number: CN201710811240.3A
Authority: CN
Inventors: 周志华; 庞明; 高尉; 陶敏
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2017-09-11
Filing date: 2017-09-11
Publication date: 2021-01-01
Anticipated expiration: 2037-09-11
Also published as: CN107689960A

Abstract

The invention discloses a learning algorithm that detects unorganized malicious attacks in a recommendation system, so as to better safeguard the quality of the recommendation system. The invention is the first to pose and solve the attack detection task in the unorganized, small-scale attack scenario. It formalizes the task as a matrix completion learning problem and uses the matrix completion algorithm provided by the invention to obtain the real score matrix X, the system noise matrix Z and the malicious attack deviation matrix Y corresponding to the score matrix M. Malicious attackers among the users are then detected from the information in the malicious attack deviation matrix Y.

Description

Attack detection method for unorganized malicious attack

Technical Field

The invention relates to machine learning and its applications, in particular to collaborative filtering, recommendation systems and attack detection, and discloses a learning method capable of detecting unorganized malicious attacks in a recommendation system so as to better safeguard the quality of the recommendation system.

Background

Recommendation systems are widely used in daily life, and as more and more of people's activities move online, their influence keeps growing. For example, more and more people buy goods on online retail platforms such as Taobao and Amazon, and watch videos on video websites such as Youku and iQiyi. With the number of users and the number of items both increasing, recommending suitable items to each user is a great challenge. Many recommendation methods based on collaborative filtering have been proposed to address it.

The basic assumption of collaborative filtering is that users who showed similar preferences in the past will have similar preferences in the future. There are two main categories: memory-based collaborative filtering methods and model-based collaborative filtering methods. Memory-based methods directly use the scores that users have given to items to predict the items a user is interested in. They are further divided into user-based and item-based collaborative filtering methods. A user-based method first finds users similar to a given user and then recommends the items those similar users like; an item-based method recommends to a user items similar to the ones that user likes. A model-based method first trains a prediction model on the scores users have given to items, and then uses the model to generate recommendations for each user.

Both categories of collaborative filtering methods generally assume that a user's score for an item faithfully reflects the user's preference. In practice, however, the owner of an item can manipulate the recommendation system by forging fake users who give false scores, in order to increase the owner's own profit. For example, an attacker may forge a fake user who resembles a normal user in scoring behavior but gives a high score to the attacker's own item or a low score to a competitor's item. Existing research shows that recommendation methods based on collaborative filtering are vulnerable to such malicious attacks.

Many malicious attack detection methods have been proposed in response to this problem. Existing methods fall mainly into statistical methods, clustering methods and classification methods. Statistical methods find malicious attackers by detecting suspicious scores. Clustering methods group users into several clusters with similar behavior according to their scores, and regard the users in the smallest cluster as malicious attackers. Classification methods first extract features for each user from that user's scores on items, and then train a classification model on these features and the users' labels (that is, whether a user is a malicious attacker) to detect malicious attackers.

Existing malicious attack detection methods all target organized large-scale attacks rather than unorganized small-scale attacks. In an organized large-scale attack, for example, the owner of an item forges hundreds of users according to the same strategy, and each forged user rates multiple items the way a normal user would while giving the owner's item a high score. Current recommendation systems, however, raise the cost of an attack through several mechanisms in order to reduce malicious attacks: CAPTCHAs are widely used, more and more online platforms require SMS or email verification to complete registration, and a user may be allowed to leave a rating only after purchasing the item. The resulting cost makes organized large-scale attacks difficult to launch. Unorganized small-scale attacks nevertheless remain widespread. For example, competing merchants on an online retail platform may attack each other, each forging several users to give high scores to its own items and low scores to competitors' items, and different merchants adopt different attack strategies. In this situation, existing detection methods, which work by detecting a shared attack strategy, cannot detect the malicious attackers effectively, so a large number of attack scores remain in the recommendation system and recommendation quality suffers. A learning algorithm that can detect unorganized malicious attacks is therefore needed for recommendation systems.

Disclosure of Invention

Purpose of the invention: existing attack detection methods can only detect organized large-scale attacks; in the unorganized small-scale attack scenario they cannot detect attacks effectively, because they rely on detecting a shared malicious attack strategy. To address this problem, the invention is the first to pose and solve the attack detection task in the unorganized small-scale attack scenario. It formalizes the task as a matrix completion learning problem and provides a corresponding attack detection learning algorithm. Specifically, the score a user gives to an item consists of three parts. The first part is the user's real score for the item, the second part is the system noise that may arise when the user scores the item, and the third part is the malicious attack the user directs at the item. For example, in a scoring system with a score interval of 1 to 5 points, suppose the user's real score for an item is 4.8 points. If the user is a normal user, the final score may be 5 points, that is, there is 0.2 points of system noise; if the user is a malicious user, the final score may be 1 point, that is, there is a malicious attack with a deviation of -4 points. Based on the assumption that each user's preference is governed by a small number of factors, we assume that the real score matrix formed by the first part is a low-rank matrix. The second part is ubiquitous in scoring systems, but its values are small. The third part is the malicious attack part: malicious scores are few in proportion to normal scores, so the third part is sparse, and it runs contrary to the real scores. The invention aims to recover these three parts as accurately as possible from the scores users give to items and from the properties of the three parts. Users whose scores contain a malicious attack component are regarded as malicious attackers.
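The 1-to-5 example above can be written out entry by entry. The equations below are an illustrative reading rather than part of the original text. In particular, they assume that the malicious score carries the same 0.2 points of system noise as the normal one, which is what makes a deviation of -4 points consistent with a final score of 1 point.

```latex
\begin{aligned}
M_{ij} &= X_{ij} + Z_{ij} + Y_{ij} \\
\text{normal user:}\quad 5 &= 4.8 + 0.2 + 0 \\
\text{malicious user:}\quad 1 &= 4.8 + 0.2 + (-4)
\end{aligned}
```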

Technical solution: an attack detection method for unorganized malicious attacks, comprising the following steps:

Step 1.1: convert all the scores users have given to items into an incomplete score matrix M, and determine the parameters of the algorithm according to the number of users, the number of items, the number of scores and the score interval of the score matrix M.

Step 1.2: from the score matrix M, obtain the corresponding real score matrix X, system noise matrix Z and malicious attack deviation matrix Y by using the matrix completion algorithm provided by the invention.

Step 1.3: detect malicious attackers among the users according to the information in the malicious attack deviation matrix Y.
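The first half of Step 1.1 can be sketched as follows. This is a minimal illustration, assuming the scores arrive as (user index, item index, score) triples; the function name `build_score_matrix` and the toy data are hypothetical and do not come from the original text.

```python
import numpy as np

def build_score_matrix(ratings, num_users, num_items):
    """Return (M, mask). M holds the observed scores; mask marks the observed set Omega."""
    M = np.zeros((num_users, num_items), dtype=float)
    mask = np.zeros((num_users, num_items), dtype=bool)
    for user, item, score in ratings:
        M[user, item] = score
        mask[user, item] = True
    return M, mask

# Hypothetical toy data: (user, item, score) triples on a 1-to-5 scale.
ratings = [(0, 0, 5.0), (0, 2, 4.0), (1, 1, 1.0), (2, 0, 3.0)]
M, mask = build_score_matrix(ratings, num_users=3, num_items=3)

# Quantities that Step 1.1 uses to set the algorithm's parameters.
num_scores = int(mask.sum())
score_range = (float(M[mask].min()), float(M[mask].max()))
```

Unobserved entries are stored as 0 in `M`, and the boolean `mask` is what distinguishes a missing score from an observed one.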

Determining the parameters of the algorithm from the information in the score matrix M specifically comprises: determining the upper bound on the system noise matrix Z allowed by the algorithm according to the number of users, the number of items and the number of scores of the score matrix M, where this upper bound increases as the number of users, the number of items and the number of scores increase; and determining a lower bound on the deviation of a malicious score from the real score according to the score interval of the score matrix M, where the larger the score interval, the larger the lower bound.

Obtaining the real score matrix X, the system noise matrix Z and the malicious attack deviation matrix Y corresponding to the score matrix M by using the matrix completion algorithm specifically comprises the following. The score matrix M is composed of the real score matrix X, the system noise matrix Z and the malicious attack deviation matrix Y, so that M = X + Z + Y. Following a common assumption in recommendation systems, the preference of each user is determined by a small number of factors, so the real score matrix X is a low-rank matrix. The system noise matrix Z is pervasive in the scoring system: it has many non-zero elements, but their values are generally small. For example, in a scoring system with a score interval of 1 to 5 points, if the user's real score for an item is 4.8 points and the user is a normal user, the final score may be 5 points, that is, there is 0.2 points of system noise. Because malicious scores are few in proportion to normal scores, the malicious attack deviation matrix Y is a sparse matrix, and its non-zero entries are generally large. Another feature of Y is that its non-zero entries run contrary to the real scores: items with high real scores are maliciously demoted and items with low real scores are maliciously promoted, so the product of corresponding entries of X and Y is not greater than 0. For example, in a scoring system with a score interval of 1 to 5 points, if the user's real score for an item is 5 points and the user is a malicious user, the final score may be 1 point, a deviation of -4 points from the real score; that is, the high-scoring item is deliberately scored low.
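The three structural properties described above can be illustrated on synthetic data. The sketch below is only an illustration: the sizes, noise level and attack rate are hypothetical, and the scores are centered on zero so that the sign of X encodes a high or low real score. The centering is an assumption made here so that the sign condition holds; the text does not say how scores are shifted.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 30, 20, 2  # users, items, latent factors (all hypothetical)

# Low-rank real scores, centered so that sign encodes high versus low.
X = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))

# Dense system noise with small values.
Z = 0.05 * rng.normal(size=(m, n))

# Sparse, large deviations that push against the sign of the real score.
attacked = rng.random((m, n)) < 0.03
Y = np.where(attacked, -3.0 * np.sign(X), 0.0)

# Fully observed here for simplicity; in practice only some entries are seen.
M = X + Z + Y

rank_X = np.linalg.matrix_rank(X)
sparsity_Y = np.count_nonzero(Y) / Y.size
```

Here `rank_X` is small relative to the matrix size, `sparsity_Y` is a small fraction, and every entry of `X * Y` is at most zero.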

An optimization objective is obtained from M = X + Z + Y and the properties of X, Z and Y,

where $\|\cdot\|_*$ denotes the nuclear norm of a matrix, $\|\cdot\|_1$ the $\ell_1$ norm of a matrix, $\|\cdot\|_F$ the Frobenius norm of a matrix, and $\langle X, Y \rangle$ the sum of the products of corresponding elements of the matrices X and Y. The hyper-parameters, including τ and α, trade off the degree of emphasis placed on each term in the optimization objective. Ω is the set of index pairs of all scored entries; for example, (i, j) ∈ Ω indicates that user i has a score record for item j. $P_\Omega$ is the orthogonal projection onto Ω, defined as

$$[P_\Omega(M)]_{ij} = \begin{cases} M_{ij}, & (i, j) \in \Omega, \\ 0, & \text{otherwise}, \end{cases}$$

where $M_{ij}$ denotes the score user i gives to item j.
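The notation above maps directly onto a few NumPy expressions. This is a small sketch with hypothetical helper names. It assumes the L1 norm is taken entrywise (the sum of absolute values), as is usual for sparsity terms, rather than as the induced matrix 1-norm.

```python
import numpy as np

def nuclear_norm(A):
    """Sum of the singular values of A."""
    return np.linalg.norm(A, ord="nuc")

def l1_norm(A):
    """Entrywise L1 norm: sum of absolute values (assumed reading)."""
    return np.abs(A).sum()

def frobenius_norm(A):
    """Square root of the sum of squared entries."""
    return np.linalg.norm(A, ord="fro")

def inner_product(A, B):
    """Sum of the products of corresponding elements of A and B."""
    return np.sum(A * B)

def project(A, mask):
    """Orthogonal projection onto Omega: keep scored entries, zero the rest."""
    return np.where(mask, A, 0.0)

A = np.array([[3.0, 0.0], [0.0, -4.0]])
B = np.array([[1.0, 2.0], [3.0, 1.0]])
mask = np.array([[True, False], [False, True]])
```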

The optimization objective is solved by an alternating optimization method. In theory, under certain conditions, the method recovers the real score matrix X, the system noise matrix Z and the malicious attack deviation matrix Y well.
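The text does not spell out the individual update steps of the alternating scheme. As a hedged illustration only, the sketch below shows two standard building blocks that alternating solvers commonly use for nuclear-norm and L1 terms: singular value thresholding and entrywise soft-thresholding. The function names and thresholds are hypothetical, and this is not presented as the patented procedure.

```python
import numpy as np

def soft_threshold(A, t):
    """Proximal operator of t * ||.||_1: shrink every entry toward zero by t."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def singular_value_threshold(A, t):
    """Proximal operator of t * ||.||_*: shrink the singular values by t."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale each column of U by its shrunken singular value, then recombine.
    return (U * soft_threshold(s, t)) @ Vt

# Soft-thresholding zeroes small entries, which promotes sparsity (as in Y).
shrunk = soft_threshold(np.array([-3.0, 0.5, 2.0]), 1.0)

# Thresholding singular values removes weak directions, which lowers rank (as in X).
low_rank = singular_value_threshold(np.diag([3.0, 1.0]), 2.0)
```

In an alternating scheme, each block of variables would be updated in turn with the others held fixed, and operators of this kind typically appear as the closed-form updates for the low-rank and sparse blocks.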

Detecting malicious attackers among the users according to the information in the malicious attack deviation matrix Y specifically comprises: each row of the malicious attack deviation matrix Y corresponds to the score deviation information of one user, and if a row of Y contains non-zero elements, the user corresponding to that row is judged to be a malicious attacker.
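The row-wise rule above is short enough to sketch directly. The function name `detect_attackers` and the numerical tolerance `tol` are assumptions added for illustration; the text itself only says "non-zero". A small tolerance is a practical guard against floating-point residue in a recovered Y.

```python
import numpy as np

def detect_attackers(Y, tol=1e-8):
    """Return indices of users whose row of Y has at least one non-zero entry."""
    return np.flatnonzero(np.any(np.abs(Y) > tol, axis=1))

# Hypothetical recovered deviation matrix: users 1 and 3 have attack entries.
Y = np.array([
    [0.0,  0.0, 0.0],
    [0.0, -4.0, 0.0],
    [0.0,  0.0, 0.0],
    [3.0,  0.0, 0.0],
])
attackers = detect_attackers(Y)
```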

Beneficial effects: compared with existing attack detection techniques, the method obtains the real score matrix X, the system noise matrix Z and the malicious attack deviation matrix Y corresponding to the score matrix M by using a variant of the matrix completion algorithm. In doing so it takes full account of the fact that malicious scores run contrary to the real scores, is suited to unorganized small-scale attack scenarios, and can effectively detect unorganized small-scale attacks generated by different strategies.

Drawings

Fig. 1 is a flowchart of an attack detection method for an unorganized attack according to an embodiment of the present invention.

Detailed Description

The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.

As shown in fig. 1, the attack detection method for the unorganized attack specifically includes the steps of:

Determining the parameters of the algorithm from the information in the score matrix M specifically comprises: determining the upper bound on the system noise matrix Z allowed by the algorithm according to the number m of users, the number n of items and the number d of scores of the score matrix M, the value being expressed as

As the number of users, the number of items and the number of scores increase, the upper bound on the system noise matrix Z allowed by the algorithm also increases. A lower bound on the deviation of a malicious score from the real score is determined according to the score interval of the score matrix M; the larger the score interval, the larger the lower bound.

Obtaining the real score matrix X, the system noise matrix Z and the malicious attack deviation matrix Y corresponding to the score matrix M by using the matrix completion algorithm specifically comprises the following. The score matrix M is composed of the real score matrix X, the system noise matrix Z and the malicious attack deviation matrix Y, so that M = X + Z + Y. We assume that the preference of each user is influenced by a small number of factors, so the real score matrix X is a low-rank matrix. The system noise matrix Z is pervasive in the scoring system: it has many non-zero elements, but their values are generally small. For example, in a scoring system with a score interval of 1 to 5 points, if the user's real score for an item is 4.8 points and the user is a normal user, the final score may be 5 points, that is, there is 0.2 points of system noise. Because malicious scores are few in proportion to normal scores, the malicious attack deviation matrix Y is a sparse matrix, and its non-zero entries are generally large. Another feature of Y is that its non-zero entries run contrary to the real scores. For example, in a scoring system with a score interval of 1 to 5 points, if the user's real score for an item is 5 points and the user is a malicious user, the final score may be 1 point, a deviation of -4 points from the real score; that is, the high-scoring item is deliberately scored low.

where $\|\cdot\|_*$ denotes the nuclear norm of a matrix, $\|\cdot\|_1$ the $\ell_1$ norm of a matrix, $\|\cdot\|_F$ the Frobenius norm of a matrix, and $\langle X, Y \rangle$ the sum of the products of corresponding elements of the matrices X and Y. Ω is the set of index pairs of all scored entries, and (i, j) ∈ Ω indicates that user i has a score record for item j. $P_\Omega$ is the orthogonal projection onto Ω, defined as

$$[P_\Omega(M)]_{ij} = \begin{cases} M_{ij}, & (i, j) \in \Omega, \\ 0, & \text{otherwise}, \end{cases}$$

where $M_{ij}$ denotes the score user i gives to item j.

The optimization objective is solved by an alternating optimization method. In theory, under certain conditions, the method recovers the real score matrix X, the system noise matrix Z and the malicious attack deviation matrix Y well.

Claims

1. An attack detection method for unorganized malicious attacks is characterized by comprising the following steps:

Step 1.1: converting all the scores users have given to items into an incomplete score matrix M, and determining parameters of an algorithm according to the number of users, the number of items, the number of scores and the score interval of the score matrix M, which specifically comprises: determining the upper bound on the system noise matrix Z allowed by the algorithm according to the number of users, the number of items and the number of scores of the score matrix M; and determining the lower bound on the deviation of a malicious score from the real score according to the score interval of the score matrix M;

Step 1.2: obtaining a corresponding real score matrix X, system noise matrix Z and malicious attack deviation matrix Y from the score matrix M by using a matrix completion algorithm;

2. The attack detection method for unorganized malicious attacks according to claim 1, characterized in that obtaining the real score matrix X, the system noise matrix Z and the malicious attack deviation matrix Y corresponding to the score matrix M by using the matrix completion algorithm specifically comprises: the score matrix M is composed of the real score matrix X, the system noise matrix Z and the malicious attack deviation matrix Y, so that M = X + Z + Y.

3. The attack detection method for unorganized malicious attacks according to claim 2, characterized in that an optimization objective is obtained from M = X + Z + Y and the properties of X, Z and Y,

s.t. $P_\Omega(X + Z + Y) = P_\Omega(M)$,

||P_Ω(Z)||_F≤

where $\|\cdot\|_*$ denotes the nuclear norm of a matrix, $\|\cdot\|_1$ the $\ell_1$ norm of a matrix, $\|\cdot\|_F$ the Frobenius norm of a matrix, and $\langle X, Y \rangle$ the sum of the products of corresponding elements of the matrices X and Y; the hyper-parameters τ, α and 6 are used to balance the degree of emphasis placed on each term in the optimization objective; Ω is the set of index pairs of all scored entries, $M_{ij}$ denotes the score user i gives to item j, and $P_\Omega$ is an orthogonal projection.

4. The attack detection method for unorganized malicious attacks according to claim 1, characterized in that detecting malicious attackers among the users according to the information in the malicious attack deviation matrix Y specifically comprises: each row of the malicious attack deviation matrix Y corresponds to the score deviation information of one user, and if a row of Y contains non-zero elements, the user corresponding to that row is judged to be a malicious attacker.