CN116702136A

CN116702136A - Manipulation attack method and device for personalized recommendation system

Info

Publication number: CN116702136A
Application number: CN202310973637.8A
Authority: CN
Inventors: 周潘; 罗志; 孙裕华; 徐子川; 袁增辉
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2023-08-04
Filing date: 2023-08-04
Publication date: 2023-09-05

Abstract

The invention relates to a manipulation attack method and device for an X-armed bandits-based personalized recommendation system in a big data environment. The method comprises the following steps: discretizing the arm space; intercepting a recommendation result of the system, wherein the recommendation result is an arm selected from the arm space corresponding to a node, after the learner has used the HCT algorithm to determine the node of the HCT coverage tree selected in the current round; and judging whether that arm space contains a target arm: if so, no attack is made in the current round; otherwise, another arm is selected to replace the arm chosen by the learner and is submitted to the environment, and the environment generates feedback that is received by both the learner and the attacker. The invention makes it possible to carry out a manipulation attack on a personalized recommendation system from the attacker's perspective and, from the attack results, to evaluate the vulnerability of the system to such attacks.

Description

Manipulation attack method and device for personalized recommendation system

Technical Field

The invention relates to the technical field of data security, and in particular to a manipulation attack method and device for an X-armed bandits-based personalized recommendation system in a big data environment.

Background

The core of the X-armed bandits problem is how to make personalized recommendations for specific users in a continuous data space. It plays a vital role in personalized recommendation applications such as video, Internet of Things services, and advertising in a big data environment.

X-armed bandits differ from the traditional multi-armed bandit (MAB) problem. An MAB algorithm addresses how to choose arms so as to maximize the reward when the number of arms is finite, where the feedback obtained after pulling each arm follows an unknown probability distribution. In each round the algorithm pulls one arm and receives feedback (a reward), gradually learning that arm's distribution; to maximize the reward, it must therefore trade off, in every round, between collecting reward and gaining more information about the feedback distributions of the arms. X-armed bandits instead address reward maximization under the assumption that the number of arms is infinite. Because of this feature, X-armed bandits are applied to personalized recommendation systems (each candidate object can be regarded as an arm, and the feedback from pulling the arm as the feedback obtained after recommending the object to the user) in big data settings where the number of candidate objects is extremely large or even nearly infinite, such as multimedia big data recommendation systems, job and job-seeker recommendation systems, and service recommendation systems for the rapidly growing Internet of Things. The main idea of such work is to partition the continuous, infinite data space step by step with the tree-based HCT algorithm (High-Confidence Tree algorithm) and then apply a Monte Carlo method, which greatly improves the efficiency of big data analysis.

Attacks on bandit algorithms mainly take two forms: the poisoning attack (data-poisoning attack) and the manipulation attack (action-manipulation attack). An attack scenario has three interacting parties: the learner (e.g., a recommender system), the attacker (e.g., a user group), and the environment (e.g., the users served by the system). The attacker sits between the learner and the environment: it receives the arm selected by the learner and passes back the feedback generated by the environment. In a poisoning attack, the attacker tampers with the environment's feedback, and the learner, receiving feedback that differs from the original, is misled toward the attacker's goal. A manipulation attack is more practical to carry out but harder to design than a poisoning attack, because the attacker cannot act directly on the environment's feedback and set it to an arbitrary value; it can only replace the arm chosen by the learner with another arm, which is then submitted to the environment. A further challenge is that the attacker does not know the mean reward of each arm. In addition, objective conditions limit how often the attacker can attack: for the same attack objective, the lower the attack cost, the better for the attacker (e.g., the less likely the attack is to be detected by the system).
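The difference between the two attack modes can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the arm space is assumed to be the interval [0, 1], and `environment`, `poisoning_round` and `manipulation_round` are names invented here, not part of the invention.

```python
import random


def environment(arm: float, rng: random.Random) -> float:
    """Hypothetical environment: noisy reward that peaks at arm = 0.5."""
    mean = 1.0 - abs(arm - 0.5)
    return mean + rng.gauss(0.0, 0.1)


def poisoning_round(arm, rng, tamper_reward):
    """Poisoning attack: the learner's arm is pulled unchanged,
    but the feedback is altered before the learner sees it."""
    true_reward = environment(arm, rng)
    return arm, tamper_reward(true_reward)


def manipulation_round(arm, rng, tamper_arm):
    """Manipulation attack: the learner's arm is replaced by another arm,
    and the environment's feedback is passed through untouched."""
    pulled = tamper_arm(arm)
    return pulled, environment(pulled, rng)
```

In the manipulation case the attacker controls only which arm is pulled, so the feedback seen by the learner is always a genuine sample from the environment.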

In current research in this field, whether on manipulation attacks or poisoning attacks, most attacks target MAB algorithms, and attacks on X-armed bandits algorithms have not been studied. Those skilled in the art are therefore unable to investigate effectively how vulnerable an X-armed bandits-based personalized recommendation system is to manipulation attacks in a big data environment.

Disclosure of Invention

In view of the technical problems in the prior art, the invention provides a manipulation attack method and device for an X-armed bandits-based personalized recommendation system in a big data environment.

The technical scheme for solving the technical problems is as follows:

In a first aspect, the present invention provides a manipulation attack method for an X-armed bandits-based personalized recommendation system, comprising:

discretizing the arm space;

intercepting a recommendation result of the system, wherein the recommendation result is an arm selected from the arm space corresponding to a node, after the learner has used the HCT algorithm to determine the node of the HCT coverage tree selected in the current round;

judging whether the arm space contains a target arm; if so, making no attack in the current round; otherwise, selecting another arm to replace the arm selected by the learner and submitting it to the environment, the environment generating feedback that is received by both the learner and the attacker; the learner refers to the personalized recommendation system, and the environment refers to the users served by the personalized recommendation system.

Further, the discretizing the arm space includes:

dividing the arm space into M subspaces, where M = 2^X and the value of X is as follows:

where T is the total number of rounds for which the HCT algorithm runs.

Further, the judging whether the arm space contains the target arm, and if so making no attack in the current round, otherwise selecting another arm to replace the arm selected by the learner and submitting it to the environment, includes:

S10, intercepting the arm selected by the learner in round t, denoted x_t; if the arm space contains the target arm, proceeding to the next round, namely t=t+1; otherwise, executing step S20;

S20, for each arm x(i), i ∈ [1, M], calculating an L value, namely:

wherein the L value is that of the i-th arm in round t; it is computed from the average environmental feedback over the set of rounds, up to round t, in which arm x(i) was selected, from T_i(t), the number of times arm x(i) has been selected up to round t, and from a parameter whose value lies in the range (0, 1);

S30, selecting the arm x(s) with the smallest L value, tampering x_t into x(s), and submitting x(s) to the environment to obtain the environment's feedback r_t;

S40, updating T_i(t)=T_i(t)+1 and the corresponding average environmental feedback;

S50, letting t=t+1 and jumping to step S10.
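Steps S20 to S40 can be sketched in Python as follows. The formula for the L value is not reproduced in this text, so the sketch substitutes a generic Hoeffding-style lower confidence bound; this is an assumption made only for illustration and is not the formula of the invention. The class name `LowerBoundSelector` and the convention that a never-pulled arm has an L value of minus infinity are likewise hypothetical.

```python
import math


class LowerBoundSelector:
    """Tracks pull counts and mean feedback for the M representative arms."""

    def __init__(self, arms, delta=0.1):
        self.arms = list(arms)      # representative arms x(1), ..., x(M)
        self.delta = delta          # parameter in (0, 1)
        self.counts = [0] * len(self.arms)   # T_i(t)
        self.means = [0.0] * len(self.arms)  # average feedback per arm

    def l_value(self, i):
        # ASSUMPTION: generic Hoeffding-style lower bound, not the patent's formula.
        if self.counts[i] == 0:
            return -math.inf        # never pulled: treated as maximally uncertain
        radius = math.sqrt(math.log(1.0 / self.delta) / (2 * self.counts[i]))
        return self.means[i] - radius

    def choose(self):
        """S30: index s of the arm with the smallest L value."""
        return min(range(len(self.arms)), key=self.l_value)

    def update(self, s, reward):
        """S40: increment the count and update the running mean of arm s."""
        self.counts[s] += 1
        self.means[s] += (reward - self.means[s]) / self.counts[s]
```

A round in which the attacker intervenes would call `choose()`, submit `arms[s]` to the environment in place of x_t, and then call `update(s, r_t)` with the returned feedback.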

In a second aspect, the present invention provides a manipulation attack apparatus for an X-armed bandits-based personalized recommendation system, comprising:

the arm space discretizing module, used for discretizing the arm space;

the result interception module, used for intercepting a recommendation result of the system, wherein the recommendation result is an arm selected from the arm space corresponding to a node, after the learner has used the HCT algorithm to determine the node of the HCT coverage tree selected in the current round;

the replacing module, used for judging whether the arm space contains a target arm; if so, no attack is made in the current round; otherwise, another arm is selected to replace the arm selected by the learner and is submitted to the environment, and the environment generates feedback that is received by both the learner and the attacker; the learner refers to the personalized recommendation system, and the environment refers to the users served by the personalized recommendation system.

Further, the arm space discretizing module is specifically configured such that:

T is the total number of rounds for which the HCT algorithm runs.

Further, the replacing module is specifically configured to execute the following steps:

S20, for each arm x(i), i ∈ [1, M], calculating an L value;

S40, updating T_i(t)=T_i(t)+1 and the corresponding average environmental feedback;

S50, letting t=t+1 and jumping to step S10.

In a third aspect, the present invention provides an electronic device comprising:

a memory for storing a computer software program;

and a processor for reading and executing the computer software program so as to implement the manipulation attack method for the X-armed bandits-based personalized recommendation system.

In a fourth aspect, the present invention provides a non-transitory computer readable storage medium storing a computer software program for implementing the manipulation attack method for an X-armed bandits-based personalized recommendation system according to the first aspect of the present invention.

The beneficial effects of the invention are as follows: in current research in this field, whether on manipulation attacks or poisoning attacks, most attacks target MAB algorithms, and attacks on X-armed bandits algorithms have not been studied. Moreover, the HCT algorithm, a typical X-armed bandits algorithm, differs substantially from the typical MAB algorithm UCB (Upper Confidence Bound algorithm): the former must maintain a binary tree at run time to discretize the huge arm space, while the latter does not. The attacker's goal is to force the HCT algorithm to select the node of the binary tree that contains the target arm, rather than to force the selection of one particular arm as when attacking UCB; consequently, existing poisoning and manipulation attacks against the UCB algorithm cannot be applied directly to the HCT algorithm.

Drawings

FIG. 1 is a schematic diagram of a workflow of an Internet of things service recommendation system and an attack mode of an attacker;

FIG. 2 is a schematic diagram of a manipulation attack model;

FIG. 3 is a flowchart of a manipulation attack method for an X-armed bandits-based personalized recommendation system in a big data environment according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a manipulation attack apparatus for an X-armed bandits-based personalized recommendation system in a big data environment according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an embodiment of an electronic device according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of an embodiment of a computer readable storage medium according to an embodiment of the present invention.

Detailed Description

The principles and features of the present invention are described below with reference to the drawings. The examples are given only to illustrate the invention and are not to be construed as limiting its scope.

Parameter definitions:

The arm space, i.e., the set of all arms.

T, the total number of rounds for which the HCT algorithm runs.

x_t, the arm selected by the learner in round t.

The arm submitted to the environment by the attacker in round t.

r_t, the feedback generated by the environment in round t.

K, the target arm specified by the attacker.

M, the number of subspaces into which the attacker divides the arm space.

The i-th subspace (i ∈ [1, M]); together the M subspaces cover the arm space and are pairwise disjoint.

x(i), the arm that the attacker randomly selects from the i-th subspace as its sample; once selected, it does not change.

The set of rounds, up to round t, in which the attacker selected arm x(i).

T_i(t), the number of times the attacker has selected arm x(i) up to round t, i.e., the size of that set.

The average environmental feedback over the rounds in that set.

This embodiment studies the vulnerability of a typical X-armed bandits algorithm, HCT (High-Confidence Tree algorithm), under manipulation attacks.

Take an Internet of Things recommendation system as an example. As shown in fig. 1, all Internet of Things services in the system are mapped to a measurement space X (each Internet of Things service can be regarded as an arm), which is discretized through a hierarchical structure, namely a binary coverage tree. The X-armed bandits algorithm uses the coverage tree to estimate the optimal Internet of Things service and outputs the recommendation result to the Internet of Things service provider; the provider pushes the specific service to the user according to the algorithm's output, and the user then submits feedback to the algorithm. Based on this feedback, the algorithm improves its recommendations round by round, aiming to recommend services that better meet the needs of the particular user. Meanwhile, an attacker can force the system to recommend certain specific content by hijacking the recommendation results or the feedback.

To cope with the huge arm space and thereby optimize the reward, the HCT algorithm uses a hierarchical structure to discretize the arm space. The hierarchical structure is a binary coverage tree. The i-th node at depth h is denoted (h, i), so the root node is (0, 1), and the two child nodes of node (h, i) are (h+1, 2i-1) and (h+1, 2i). Each node (h, i) covers a subset of the arm space, and these subsets satisfy the following three conditions:

（1）

（2）

（3）。

At the same time, each node (h, i) randomly selects a representative arm x_{h,i} from the subset it covers. Whenever node (h, i) is selected, arm x_{h,i} is submitted to the environment.
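The node indexing described above can be illustrated with a short Python sketch. It assumes, purely for illustration, that the arm space is the interval [0, 1] and that node (h, i) covers the i-th dyadic interval of width 2^-h; the embodiment does not prescribe this particular cover, and the class `Node` and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    h: int  # depth in the tree; the root has h = 0
    i: int  # index within depth h, from 1 to 2**h

    def children(self):
        """The two child nodes (h+1, 2i-1) and (h+1, 2i)."""
        return Node(self.h + 1, 2 * self.i - 1), Node(self.h + 1, 2 * self.i)

    def interval(self):
        """ASSUMPTION: dyadic interval covered by this node on [0, 1]."""
        width = 1.0 / (2 ** self.h)
        return (self.i - 1) * width, self.i * width

    def contains(self, x):
        """Half-open membership test; the right endpoint 1.0 is kept in the last cell."""
        lo, hi = self.interval()
        return lo <= x < hi or (x == 1.0 and hi == 1.0)
```

With this cover, the two children of a node split its interval in half, so every arm in [0, 1] belongs to exactly one node at each depth.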

In each round t, to decide which node to select, the HCT algorithm calculates an upper confidence bound (the U value) for each node (h, i), namely:

(1)

where the quantities appearing in the formula include two hyper-parameters used by the learner. The set of rounds, up to round t, in which the HCT algorithm selected node (h, i) is also defined, with (h_s, i_s) denoting the node selected by the HCT algorithm in round s. In addition, a dissimilarity function is defined on the arm space; the diameter of a subset of the arm space is defined with respect to this function, and the subset covered by each node (h, i) must satisfy a bound on its diameter.

To obtain a tighter upper confidence bound, in addition to the U value the HCT algorithm also calculates a B value for each node, namely:

(2)

The larger the B value of a node, the more likely the node is to contain the optimal arm; this is the basis on which HCT selects nodes. In each round t, the HCT algorithm starts from the root node (0, 1) and repeatedly moves to whichever of the two child nodes has the larger B value, until it reaches a leaf node or a node for which the following formula holds (the node selected by the HCT algorithm in this round is denoted (h_t, i_t)):

(3)
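The descent from the root can be sketched in Python as below. The B values are taken as given in a dictionary keyed by (h, i), and the stopping test of formula (3) is represented by a caller-supplied predicate `stop_here`, because that formula is not reproduced in this text. The function name, the tie-break toward the left child, and checking the predicate at every visited node including the root are all assumptions of this sketch.

```python
from typing import Callable, Dict, Tuple

Node = Tuple[int, int]  # (h, i): depth and index within that depth


def select_node(b_values: Dict[Node, float],
                stop_here: Callable[[Node], bool]) -> Node:
    """Walk down from the root, always moving to the child with the larger B value."""
    node = (0, 1)
    while True:
        h, i = node
        left, right = (h + 1, 2 * i - 1), (h + 1, 2 * i)
        # A node is a leaf when its children are not yet in the tree.
        is_leaf = left not in b_values or right not in b_values
        if is_leaf or stop_here(node):
            return node
        # ASSUMPTION: ties are broken toward the left child.
        node = left if b_values[left] >= b_values[right] else right
```

For example, with B values {(0,1): 1.0, (1,1): 0.3, (1,2): 0.8, (2,3): 0.9, (2,4): 0.2} and a predicate that never stops, the walk goes (0, 1), then (1, 2), then (2, 3), which is a leaf.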

The representative arm of node (h_t, i_t) is then submitted to the environment, feedback is obtained, and the statistics of the node are updated. At the same time, the B values of all nodes on the path from the root node to (h_t, i_t) are updated according to equation (2). Finally, the algorithm checks whether the current node (h_t, i_t) satisfies the following formula:

(4)

If the formula holds, node (h_t, i_t) is expanded, i.e., its two child nodes (h_t+1, 2i_t-1) and (h_t+1, 2i_t) are added to the tree, and the U values of the two child nodes are set to their initial value.

The manipulation attack model is shown in fig. 2. In round t: first, the learner selects arm x_t, which is intercepted by the attacker; second, the attacker uses the attack algorithm to tamper x_t into another arm and submits that arm to the environment; third, the attacker and the learner both receive the environment's feedback on the submitted arm.

Based on the above, an embodiment of the present invention provides a manipulation attack method for a personalized recommendation system based on X-armed bandits, as shown in fig. 3, including:

The attacker designates a target arm, constructs the attack algorithm LBT based on a binary coverage tree, and performs the following steps in each round of the HCT algorithm:

Step 1, the learner uses the HCT algorithm to determine the node of the HCT coverage tree selected in the current round, then selects an arm from the arm space corresponding to that node and recommends it to the environment; the attacker intercepts the recommendation result;

Step 2, the attacker judges whether the arm space corresponding to the node selected by the HCT algorithm in the current round contains the target arm; if so, no attack is made in the current round; otherwise, the LBT algorithm selects another arm to replace the arm selected by the learner, the replacement is submitted to the environment, and the environment generates feedback that is received by both the learner and the attacker.

Through the above steps, the attacker manipulates the HCT algorithm, that is, makes it select the node containing the target arm as often as possible.
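The per-round decision in steps 1 and 2 can be sketched in Python as follows. The function `attack_round` and its parameters are hypothetical: whether the selected node contains the target arm is passed in as a boolean, the replacement rule is a caller-supplied function standing in for the LBT selection, and the environment is any callable mapping an arm to feedback.

```python
from typing import Callable, Tuple


def attack_round(node_has_target: bool,
                 learner_arm: float,
                 pick_replacement: Callable[[], float],
                 environment: Callable[[float], float]) -> Tuple[float, float, bool]:
    """One round of interception.

    Returns the arm actually submitted, the feedback seen by both the
    learner and the attacker, and whether the attacker intervened.
    """
    if node_has_target:
        # The learner already chose the node the attacker wants: no attack.
        submitted, attacked = learner_arm, False
    else:
        # Replace the learner's arm before it reaches the environment.
        submitted, attacked = pick_replacement(), True
    reward = environment(submitted)
    return submitted, reward, attacked
```

Counting the rounds in which the third return value is true gives the number of attacked rounds, which is the quantity the attacker wants to keep small.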

Before the attack, the attacker discretizes the arm space, namely: the arm space is divided into M subspaces, where M = 2^X and the value of X is as follows:

where T is the total number of rounds for which the HCT algorithm runs.
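A minimal Python sketch of this discretization is given below, assuming for illustration that the arm space is the interval [0, 1] and that the subspaces are equal-width cells. The exponent X is passed in as a plain argument because its formula is not reproduced in this text; the function name `discretize` is hypothetical.

```python
import random


def discretize(x_exponent: int, rng: random.Random):
    """Split [0, 1] into M = 2**x_exponent cells and draw one arm per cell.

    Each cell is stored as (lo, hi) and is meant as the half-open
    interval [lo, hi), so the cells are disjoint and cover the space.
    """
    m = 2 ** x_exponent
    width = 1.0 / m
    cells = [(k * width, (k + 1) * width) for k in range(m)]
    # One representative arm x(i) per cell, chosen once and then kept fixed.
    representatives = [rng.uniform(lo, hi) for lo, hi in cells]
    return cells, representatives
```

Calling `discretize(3, random.Random(0))` yields 8 cells of width 0.125 and one fixed representative arm inside each of them.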

Specifically, the procedure performed by the attacker is as follows:

S20, for each arm x(i), i ∈ [1, M], calculating an L value;

S40, updating T_i(t)=T_i(t)+1 and the corresponding average environmental feedback;

S50, letting t=t+1 and jumping to step S10.

In this embodiment, an attack algorithm named LBT (Lower Bound Tree) is proposed for HCT, the X-armed bandits algorithm implemented on a coverage tree. When the learner does not select an arm in the way the attacker desires, the attacker uses the LBT algorithm to tamper the arm selected by the learner into an arm with lower average feedback, misleading the learner and forcing it to pull arms in the way the attacker desires, and thereby controlling the learner's selections. For example, in a multimedia recommendation system, an attacker who wants to raise the click-through rate of videos it has created can use this attack to force the recommendation system to keep recommending the videos it designates, or videos similar to them.

In this embodiment, assuming that the HCT algorithm runs for T rounds in total, the LBT algorithm lets the attacker carry out the attack at an attack cost given by a bound in T; that is, the number of rounds in which the attacker attacks is limited by this bound, while in at least a corresponding number of rounds the HCT algorithm runs as the attacker desires.

As shown in fig. 4, the embodiment of the present invention further provides a manipulation attack apparatus for a personalized recommendation system based on X-armed bandits, including:

the arm space discretizing module is used for discretizing the arm space;

and the attack execution module is used for judging whether the arm space contains a target arm, if so, the round does not attack, otherwise, other arms are selected to replace the arm selected by the learner and are submitted to the environment, and the environment generates feedback and is received by the learner and the attacker.

Referring to fig. 5, fig. 5 is a schematic diagram of an embodiment of an electronic device according to an embodiment of the invention. As shown in fig. 5, an embodiment of the present invention provides an electronic device 500, including a memory 510, a processor 520, and a computer program 511 stored on the memory 510 and executable on the processor 520, wherein the processor 520 executes the computer program 511 to implement the following steps:

step 1, discretizing an arm space;

step 2, the learner determines the node in the HCT coverage tree selected in the round by utilizing the HCT algorithm, then selects one arm in the arm space corresponding to the node, recommends the arm to the environment, and an attacker intercepts the recommended result;

and 3, the attacker judges whether the arm space corresponding to the node selected by the round of HCT algorithm contains a target arm, if so, the round of attack is not performed, otherwise, other arms are selected to replace the arm selected by the learner through the LBT algorithm and submitted to the environment, and the environment generates feedback and is received by the learner and the attacker.

Referring to fig. 6, fig. 6 is a schematic diagram of an embodiment of a computer readable storage medium according to an embodiment of the invention. As shown in fig. 6, the present embodiment provides a computer-readable storage medium 600 having stored thereon a computer program 611, which computer program 611 when executed by a processor implements the steps of:

step 1, discretizing an arm space;

In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A manipulation attack method of a personalized recommendation system based on X-armed bandits, comprising:

discretizing the arm space;

2. The method of claim 1, wherein discretizing the arm space comprises:

T is the total number of rounds for which the HCT algorithm runs.

3. The method of claim 2, wherein judging whether the arm space contains the target arm, and if so making no attack in the current round, otherwise selecting another arm to replace the arm selected by the learner and submitting it to the environment, comprises:

S20, for each arm x(i), i ∈ [1, M], calculating an L value;

S40, updating T_i(t)=T_i(t)+1 and the corresponding average environmental feedback;

S50, letting t=t+1 and jumping to step S10.

4. A manipulation attack apparatus of a personalized recommendation system based on X-armed bandits, comprising:

the arm space discretizing module is used for discretizing the arm space;

5. The apparatus of claim 4, wherein the arm space discretizing module is specifically configured such that:

T is the total number of rounds for which the HCT algorithm runs.

6. The apparatus of claim 5, wherein the replacing module is specifically configured to perform the steps of:

S20, for each arm x(i), i ∈ [1, M], calculating an L value;

S40, updating T_i(t)=T_i(t)+1 and the corresponding average environmental feedback;

S50, letting t=t+1 and jumping to step S10.

7. An electronic device, comprising:

a memory for storing a computer software program;

a processor for reading and executing the computer software program to implement the manipulation attack method of a personalized recommendation system based on X-armed bandits according to any one of claims 1-3.

8. A non-transitory computer readable storage medium, characterized in that the storage medium stores a computer software program for implementing the manipulation attack method of a personalized recommendation system based on X-armed bandits according to any one of claims 1-3.