CN116595528A

CN116595528A - Method and device for poisoning attack on personalized recommendation system

Info

Publication number: CN116595528A
Application number: CN202310880108.3A
Authority: CN
Inventors: 周潘; 罗志; 孙裕华; 徐子川; 袁增辉
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2023-07-18
Filing date: 2023-07-18
Publication date: 2023-08-15

Abstract

The invention relates to a poisoning attack method and device for an X-armed bandits-based personalized recommendation system in a big data environment. An attacker acquires the recommendation result that the learner recommends to the environment and reproduces the learner's HCT tree from that result; the attacker then acquires the feedback result of the environment, tampers with it based on the reproduced HCT tree, and returns it to the learner. The invention makes it possible to carry out a poisoning attack on a personalized recommendation system from the attacker's perspective, and to evaluate, from the attack result, how vulnerable the system is to such an attack.

Description

Method and device for poisoning attack on personalized recommendation system

Technical Field

The invention relates to the technical field of data security, in particular to a method and a device for a poisoning attack on a personalized recommendation system based on X-armed bandits in a big data environment.

Background

The core of the X-armed bandits problem is how to make personalized recommendations for specific users in a continuous data space. It plays a vital role in personalized recommendation applications for videos, Internet of Things services, advertisements and similar fields in a big data environment.

The X-armed bandits problem differs from the traditional multi-armed bandit (MAB) problem. An MAB algorithm decides how to choose among a limited number of arms so as to maximize the benefit, where the feedback after each arm pull follows an unknown probability distribution. In each round the algorithm pulls one arm and receives feedback (a reward), gradually learning that arm's distribution; to maximize the benefit, it must therefore balance, in every round, exploiting the arms that currently look best against gathering more information about the feedback distributions of the arms. The X-armed bandits problem instead maximizes the benefit under the assumption that the number of arms is infinite. Because of this feature, X-armed bandits are applied to personalized recommendation systems (each selectable object is regarded as an arm, and the feedback from pulling the arm is the feedback obtained after recommending the object to the user) in settings such as big data where the number of selectable objects is extremely large or nearly infinite, for example multimedia big data recommendation systems, job and job-seeker recommendation systems, and service recommendation systems for the rapidly growing Internet of Things. The main idea of this line of work is to repeatedly partition the continuous, infinite data space with the tree-structured HCT algorithm (High-Confidence Tree algorithm) and then apply a Monte Carlo method, which greatly improves the efficiency of big data analysis.
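The finite-arm trade-off described above can be illustrated with a minimal sketch of UCB1, a standard MAB algorithm chosen here only for illustration. The function names `ucb1` and `bernoulli_env` and the exploration constant are assumptions of this sketch and do not come from the invention.

```python
import math
import random


def ucb1(pull, n_arms, rounds):
    """Run UCB1 for `rounds` rounds; return (pull counts, empirical means)."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once before using the index
        else:
            # empirical mean (exploitation) plus confidence bonus (exploration)
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means


def bernoulli_env(probabilities, seed=0):
    """Hypothetical environment: arm a pays 1 with probability probabilities[a]."""
    rng = random.Random(seed)
    return lambda arm: 1.0 if rng.random() < probabilities[arm] else 0.0
```

For example, `ucb1(bernoulli_env([0.2, 0.8]), n_arms=2, rounds=500)` returns the number of pulls and the estimated mean reward of each arm. With infinitely many arms the "pull every arm once" step is impossible, which is why the X-armed setting needs a tree-based partition of the arm space instead.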

Attacks on bandit algorithms fall mainly into two modes: the poisoning attack (data-poisoning attack) and the manipulation attack (action-manipulation attack). An attack scenario has three interacting parties: the learner (e.g., the recommender system), the attacker, and the environment (e.g., the user group). The attacker sits between the learner and the environment: it receives the arm selected by the learner and passes back the feedback generated by the environment. In a manipulation attack, the attacker tampers with the recommendation result and submits the tampered arm to the environment; the learner is misled because the feedback it receives was generated by an arm other than the one it recommended. In a poisoning attack, the attacker tampers with the feedback of the environment; the learner is misled because the feedback it receives differs from the original. Because the poisoning attack acts directly on the feedback received by the learner, it is more direct and efficient than the manipulation attack, which tampers with the recommendation result. Furthermore, objective conditions limit the attacks an attacker can launch; in other words, for the same attack objective, a lower attack cost is more advantageous to the attacker (e.g., the attack is less likely to be detected by the system).

In current research, whether on manipulation attacks or poisoning attacks, most attack targets are MAB algorithms, and attacks on X-armed bandits algorithms remain unstudied. Those skilled in the art are therefore unable to effectively study the vulnerability of X-armed bandits-based personalized recommendation systems to poisoning attacks in a big data environment.

Disclosure of Invention

In view of the technical problems in the prior art, the invention provides a method and a device for a poisoning attack on an X-armed bandits-based personalized recommendation system in a big data environment.

The technical scheme for solving the technical problems is as follows:

In a first aspect, the present invention provides a method for a poisoning attack on an X-armed bandits-based personalized recommendation system, comprising:

S100, acquiring the recommendation result recommended to the environment by a learner, and reproducing the HCT tree of the learner according to the recommendation result; the learner refers to the personalized recommendation system, and the environment refers to the users that the personalized recommendation system faces;

S200, obtaining the feedback result of the environment, tampering with the feedback result based on the reproduced HCT tree, and returning the tampered feedback result to the learner.

Further, the reproducing the HCT tree of the learner according to the recommendation result includes:

s110, defining a flag variable、/>And node variable +.>The node variable->For pointing to a node in the recurring HCT tree; and remembers the node selected by the t-th round learner from the HCT tree as +.>The arm selected is->T is a positive integer;

s120, at the t-th wheel, putJudging the current->Whether the value of (2) is 1:

if it isConsider->Nodes and +.>Corresponding, and->Juxtaposing->Is->After that, step S200 is performed;

if it isStep S130 is performed;

s130, judgingWhether the value of (2) is 0:

if it isTraversing all nodes in the recurrent HCT tree to find whether the node +.>Corresponding arm->If present, consider node->And->Correspondingly, juxtapose->Step S200 is executed, if the HCT leaf node does not exist, the repeated HCT leaf node is expanded;

if it isStep S200 is performed.

Further, expanding the leaf node of the reproduced HCT tree includes:

s140, searching for the covered arm space containing the armLeaf node->Expanding two sub-nodes, and judging which of the two sub-nodes covers an arm space containing an arm +.>If the arm space covered by one of the child nodes contains arm +.>The child node is considered to be +.>Arms corresponding to the child node and +.>Same and let node variable->Pointing to another child node and then setting +>And performs step S200.

Further, obtaining the feedback result of the environment, tampering with the feedback result based on the reproduced HCT tree, and returning it to the learner includes:

s210, obtaining feedback results of the environment, and lettingAnd update->；

S220, ifCalculate +.>；

wherein ,

m represents the round of ensuring that the attack can be effective, T represents the total round of HCT algorithm operation, and +.>；

；

The method comprises the steps of carrying out a first treatment on the surface of the N is->Or->；

If it isPut->；

S230, the probability of successful tampering isUnder the condition of->And returns it to the learner; if the tampering fails, then->；

S240, orderAnd jumps to step S120;

representing the learner select node +.>Is>Representation->Average value of the original feedback of the environment in the round, +.>Representing the learner select node +.>K represents the target arm specified by the attacker, ++>Representing node->Covered arm space, +.>Represents the HCT tree reproduced up to round t,>is a confidence parameter in the range (0, 1).

In a second aspect, the present invention provides a poisoning attack apparatus for an X-armed bandits-based personalized recommendation system, comprising:

an HCT tree reproduction module, used for obtaining the recommendation result recommended to the environment by the learner and reproducing the HCT tree of the learner according to the recommendation result; the learner refers to the personalized recommendation system, and the environment refers to the users that the personalized recommendation system faces;

and a feedback tampering module, used for acquiring the feedback result of the environment, tampering with the feedback result based on the reproduced HCT tree, and returning the tampered feedback result to the learner.

Further, the HCT tree reproduction module is specifically configured to:

defining a logo variable、/>And node variable +.>The node variablesFor pointing to a node in the recurring HCT tree; and remembers the node selected by the t-th round learner from the HCT tree asThe arm selected is->T is a positive integer;

at the t-th wheel, putJudging the current->Whether the value of (2) is 1:

if it isConsider->Nodes and +.>Corresponding, and->Juxtaposing->Is->；

If it isJudging->Whether the value of (2) is 0:

if it isTraversing all nodes in the recurrent HCT tree to find whether the node +.>Corresponding arm->If present, consider node->And->Correspondingly, juxtapose->If the tree leaf node does not exist, expanding the repeated HCT tree leaf node.

Further, the expanding recurrent HCT leaf child node includes:

finding an included arm in the covered arm spaceLeaf node->Expanding two sub-nodes, and judging which of the two sub-nodes covers an arm space containing an arm +.>If the arm space covered by one of the child nodes contains arm +.>The child node is considered to be +.>Arms corresponding to the child node and +.>Same and let node variable->Pointing to another child node and then setting +>。

Further, the feedback tampering module is specifically configured to:

obtaining feedback results of the environment, and makingAnd update->；

If it isCalculate +.>；

wherein ,

；

If it isPut->；

The probability of successful tampering isUnder the condition of->And returns it to the learner; if the tampering fails, then->The method comprises the steps of carrying out a first treatment on the surface of the Then let->；

In a third aspect, the present invention provides an electronic device comprising:

a memory for storing a computer software program;

and a processor for reading and executing the computer software program so as to implement the poisoning attack method for the X-armed bandits-based personalized recommendation system.

In a fourth aspect, the present invention provides a non-transitory computer readable storage medium storing a computer software program for implementing the method for a poisoning attack on an X-armed bandits-based personalized recommendation system according to the first aspect of the present invention.

The beneficial effects of the invention are as follows: in current research, whether on manipulation attacks or poisoning attacks, most attack targets are MAB algorithms, and attacks on X-armed bandits algorithms remain unstudied. Meanwhile, the HCT algorithm, a typical X-armed bandits algorithm, differs considerably from the typical MAB algorithm UCB (Upper Confidence Bound algorithm): the former must maintain a binary tree at run time to discretize the huge arm space, while the latter need not. An attacker's goal against HCT is to force the algorithm to choose the nodes of the binary tree that contain the target arm, rather than to force the choice of one particular arm as when attacking UCB; consequently, existing poisoning and manipulation attacks against the UCB algorithm cannot be applied directly to the HCT algorithm.

Drawings

FIG. 1 is a schematic diagram of a workflow of an Internet of things service recommendation system and an attack mode of an attacker;

FIG. 2 is a schematic diagram of a poisoning attack model;

FIG. 3 is a flowchart of a method for a poisoning attack on an X-armed bandits-based personalized recommendation system according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a poisoning attack device for an X-armed bandits-based personalized recommendation system in a big data environment according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an embodiment of an electronic device according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of an embodiment of a computer readable storage medium according to an embodiment of the present invention.

Detailed Description

The principles and features of the present invention are described below with reference to the drawings. The examples are given only to illustrate the invention and are not to be construed as limiting its scope.

This embodiment studies the vulnerability of HCT (High-Confidence Tree algorithm), a typical X-armed bandits algorithm, under a poisoning attack.

Take a video big data recommendation system as an example, as shown in FIG. 1. In this system, all videos are mapped to a metric space X (each video can be regarded as an arm), which is discretized by a hierarchical structure, namely a binary coverage tree. After the system receives a user request, the X-armed bandits algorithm uses the coverage tree to estimate the optimal video and pushes the recommendation to the user, and the user then submits feedback to the algorithm. Based on this feedback, the algorithm improves its recommendations round by round, aiming to recommend video content that better matches the particular user's interests. Meanwhile, an attacker can force the system to recommend specific video content by hijacking the recommendation results or the feedback.

To deal with the huge arm spaceThus optimizing the yield, the HCT algorithm adopts a hierarchical structure to discretize the arm space. The hierarchical structure is a binary coverage tree +.>Wherein->The layering is +.>Is represented as (1)Then the root node is (0, 1). Node->Is +.> and />. For a pair ofIn every node->Which covers the arm space +.>Is a subset->, and />The following three conditions are satisfied:

①

②

③。

at the same time, each nodeRandomly select a representative arm->As long as the node->Is selected to be arm->Submitting to the environment.
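The binary coverage tree described above can be sketched as follows. This is an illustrative sketch only: it assumes a one-dimensional arm space [0, 1) split into dyadic intervals, and it assumes the usual indexing in which node (h, i) has children (h + 1, 2i − 1) and (h + 1, 2i). The class name `Node` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    h: int  # depth; the root has h = 0
    i: int  # index within the depth, 1 <= i <= 2**h

    def interval(self):
        """Half-open interval [lo, hi) of the arm space covered by this node."""
        width = 1.0 / (2 ** self.h)
        return ((self.i - 1) * width, self.i * width)

    def children(self):
        """The two child nodes, which split this node's interval in half."""
        return Node(self.h + 1, 2 * self.i - 1), Node(self.h + 1, 2 * self.i)

    def covers(self, arm):
        lo, hi = self.interval()
        return lo <= arm < hi
```

In this sketch the root `Node(0, 1)` covers the whole arm space, the two children of a node cover disjoint halves of it, and every arm therefore lies in exactly one node of each depth.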

At each round t, for each node, in order to decide which node to chooseThe HCT algorithm calculates a confidence upper bound, namely:

(1)

wherein ，/>，/> and and />Two hyper-parameters for use by the learner. Definitions->Node representing the choice of HCT algorithm at round t, -/->Indicating by round t, HCT algorithm selects node +.>Is set of turns, +.> and />. In addition, the arm space is->Defining a difference function->It satisfies->Is->The method comprises the steps of carrying out a first treatment on the surface of the For a subset of arm space->The diameter is defined as->Then for each node ∈ ->Needs to meet->。

To get a more stringent upper confidence bound, in addition to calculating the U value, the HCT algorithm also calculates a B value for each node, namely:

(2)

it can be seen that the larger the corresponding B value of the node, the greater the likelihood that it will contain the optimal arm, which is also the basis for HCT to select the node. At each round t, the HCT algorithm starts from the root node (0, 1), and selects one of the two child nodes with larger B value from each layer down until the leaf node or the node for which the following formula is established (note that the node selected by the round HCT algorithm is）：

(3)

And then the arm is connectedSubmit to the environment and get feedback, and update +.>，/> and />At the same time, the root node is updated to +.>All nodes on the path of (a). Finally, check the current node->Whether or not to hold the following formula:

(4)

if the above equation can be satisfied, the node is extendedIs a two child node of (a), i.eAnd sets the U value corresponding to the two child nodes to +.>。
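The node selection described above, a descent from the root that follows the child with the larger B value, can be sketched as below. The sketch does not reproduce formulas (1) to (4): B values, pull counts and the per-depth threshold are passed in as callables, and the stopping test "pulled fewer times than a threshold" is an assumption about formula (3). The name `select_node` and its parameters are hypothetical.

```python
def select_node(b_value, is_leaf, pulls, threshold, root=(0, 1)):
    """Descend from the root towards the child with the larger B value.

    Stops at a leaf, or at a node pulled fewer than threshold(depth) times
    (assumed form of the stopping condition). Returns (node, path).
    """
    h, i = root
    path = [(h, i)]
    while not is_leaf((h, i)) and pulls((h, i)) >= threshold(h):
        left, right = (h + 1, 2 * i - 1), (h + 1, 2 * i)
        h, i = left if b_value(left) >= b_value(right) else right
        path.append((h, i))
    return (h, i), path
```

The returned path is the set of nodes from the root to the selected node, i.e. the nodes whose statistics the learner updates after receiving the feedback.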

The model of the poisoning attack is shown in fig. 2. In the first placeWheel, first step, learner selects arm +.>Meanwhile, the device is intercepted by an attacker; second step, the environment receives->After that, give feedback->And intercepted by an attacker; third step, the attacker by means of the pair +.>Add->Tamper it as +.>And submitted to the learner. It should be noted that, because of environmental restrictions (such as condition restrictions of the attacker or a learner adopting a certain defense means), the attacker has only a chance +.>Can be successfully tampered with->。
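One round of this attack model can be sketched as follows. The sketch only shows the message flow and the probabilistic success of tampering; how the attack value is computed is left to the hypothetical callable `attack_value`. The function name `poisoned_round`, its parameters, and the use of an additive perturbation with a fixed success probability are assumptions of this sketch.

```python
import random


def poisoned_round(select_arm, environment, attack_value, success_prob, rng):
    """Return (arm, original feedback, feedback seen by the learner)."""
    arm = select_arm()                      # step 1: the attacker sees the learner's arm
    original = environment(arm)             # step 2: the attacker intercepts the feedback
    epsilon = attack_value(arm, original)   # perturbation chosen by the attacker
    if rng.random() < success_prob:         # step 3: tampering only sometimes succeeds
        return arm, original, original + epsilon
    return arm, original, original          # failed tampering: feedback is unchanged
```

For instance, `poisoned_round(lambda: 0.25, lambda a: 0.8, lambda a, r: -0.5, 0.6, random.Random(0))` models a round in which the attacker tries to lower the feedback by 0.5 and succeeds with probability 0.6.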

Based on the above, an embodiment of the present invention provides a method for a poisoning attack on a personalized recommendation system based on X-armed bandits, as shown in FIG. 3. Before this embodiment is described, the parameters it uses and their meanings are explained.

Representing arm space, i.e. the set of all arms;

representing a target arm specified by an attacker;

representing the cut to->HCT trees replicated by round-robin aggressors;

representing->A set of middle leaf nodes;

covering tree->The number of intermediate nodes;

representing the coverage tree->Middle level is->The order is->Wherein->，/>， and />For node->Each node covering arm spaceThe root node (0, 1) covers the whole arm space;

representing the node selected by the learner at round t;

representing in round t,/>Comprises a target arm->Is a leaf node of (a);

representing node->The subspace covered, i.e. +.>；

When node->When the HCT algorithm is selected, a pull arm is randomly selected from the HCT algorithm>After the selection, only select the node +.>Will be->Submitting to the environment. In addition, because the data volume is extremely huge in the big data environment and in order to give consideration to the diversity of recommended contents, the +.>Are generally different from each other;

to the endRound t, learner select node +.>Of turns of (i.e.)；

By the t-th round, the learner selects node +.>The number of times, i.e.)>；

，/>Average of the environmental raw feedback in the round, namely: />, wherein />Raw feedback representing the environment in the s-th round;

，/>the average value of environmental feedback after tampering in the round is: />, wherein />Feedback representing the environment after being tampered by an attacker in the s-th round;

additionally defined is a function with N, delta and t as arguments:

Here δ is a parameter in the range (0, 1). The function B(N, δ, t) gives the confidence interval of the mean of a group of independent, identically distributed random variables, with confidence level 1 − δ.

Variable(s)(initial value is 0), variable->(initial value is 0) and node variable +.>。

The poisoning attack method for the personalized recommendation system based on X-armed bandits provided by the embodiment of the invention comprises the following:

s100, acquiring a recommendation result recommended to the environment by a learner, and reproducing an HCT tree of the learner according to the recommendation result;

for an attacker, it is known that there is only a chance of each launch of an attackTamper success, in order to guarantee the validity of the attack, there is a variable +.>The method comprises the following steps:

where M represents the round of ensuring that the attack is effective, T represents the total round of HCT algorithm operation, and the attacker can select according to the need to determineBut need to be guaranteed +.>。

Specifically, step S100 includes the following sub-steps:

s110, at the t-th wheel, putJudging the current->Whether the value of (2) is 1:

if it isStep S120 is performed;

s120, judgingWhether the value of (2) is 0:

if it isThen go through the recurrenceAll nodes in the HCT tree find out if there is a node +.>Corresponding arm->If present, consider node->And->Correspondingly, juxtapose->Step S200 is executed, if not, step S130 is executed;

if it isStep S200 is performed.

S130, searching for the covered arm space containing the armLeaf node->The method comprises the following steps:and expand two child nodes of the node +.>Is->Determining which of the two child nodes covers an arm space containing an arm +.>If the arm space covered by one of the child nodes contains arm +.>The child node is considered to be +.>Arms corresponding to the child node and +.>Same and let node variable->Pointing to another child node. For example: if->Then consider child node->And->Correspondingly (I)>And ∈node variable>Point to->On the contrary if->Then consider child node->And->Correspondingly (I)>And ∈node variable>Point to->。

Then put inAnd performs step S200.
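The tree reproduction of step S100 can be sketched as below. This is a simplified illustration rather than the exact procedure: it assumes a one-dimensional arm space [0, 1) with dyadic nodes, it merges the flag variables into a single `pending` field, and it attributes the first observed arm to the root, which is an assumption of the sketch. The class name `TreeReplica` and its members are hypothetical.

```python
class TreeReplica:
    """Attacker-side copy of the learner's tree over the arm space [0, 1)."""

    def __init__(self):
        self.leaves = {(0, 1)}   # the replica starts as the root only
        self.arm_of = {}         # node -> arm attributed to that node
        self.pending = None      # sibling waiting for its arm (the node variable)

    @staticmethod
    def covers(node, arm):
        h, i = node
        return (i - 1) / 2 ** h <= arm < i / 2 ** h

    def observe(self, arm):
        """Return the node that the attacker associates with the observed arm."""
        if (0, 1) not in self.arm_of:            # assumption: first arm is the root's
            self.arm_of[(0, 1)] = arm
            return (0, 1)
        if self.pending is not None:             # flag set in the previous round
            node, self.pending = self.pending, None
            self.arm_of[node] = arm
            return node
        for node, known in self.arm_of.items():  # arm already attributed to a node
            if known == arm:
                return node
        # Unknown arm: expand the leaf covering it and attribute the arm
        # to the child whose region contains it.
        leaf = next(n for n in self.leaves if self.covers(n, arm))
        h, i = leaf
        left, right = (h + 1, 2 * i - 1), (h + 1, 2 * i)
        self.leaves.remove(leaf)
        self.leaves.update((left, right))
        chosen, other = (left, right) if self.covers(left, arm) else (right, left)
        self.arm_of[chosen] = arm
        self.pending = other                     # the next new arm goes to the sibling
        return chosen
```

Calling `observe` once per round with the arm recommended by the learner keeps the replica in step with the learner's tree under these assumptions; the returned node is what step S200 would then use to decide how to tamper with the feedback.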

Specifically, step S200 includes the following sub-steps:

s210, obtaining feedback results of the environment, and lettingAnd update->；

S220, ifCalculate +.>；

wherein ,

；

if it isPut->；

S240, orderAnd inherits the reproduced HCT tree, and then jumps to step S110.

The attacker has no direct access to the specific structure of the HCT tree maintained by the learner, but can reproduce it from the arm selected by the learner and the feedback of the environment in each round.

Through the above steps, the attacker manipulates the HCT algorithm, i.e., makes it select nodes containing the target arm as often as possible. More specifically, assume the HCT algorithm runs for T rounds in total. The attacker can then carry out the attack with an attack cost whose upper bound is sub-linear in T, i.e. o(T), while in at least T − o(T) rounds the node selected by the HCT algorithm contains the target arm specified by the attacker.

As shown in FIG. 4, an embodiment of the present invention further provides a poisoning attack apparatus for a personalized recommendation system based on X-armed bandits, including:

the HCT tree reproduction module is used for obtaining a recommended result recommended to the environment by the learner and reproducing the HCT tree of the learner according to the recommended result;

Further, the HCT tree reproduction module is specifically configured to:

at the t-th wheel, putJudging the current->Whether the value of (2) is 1:

if it isConsider->Nodes and +.>Corresponding, and->Juxtaposing->Is->；

If it isJudging->Whether the value of (2) is 0:

Further, the expanding recurrent HCT leaf child node includes:

Further, the feedback tampering module is specifically configured to:

s210, obtaining feedback results of the environment, and lettingAnd update->；

S220, ifCalculate +.>；

wherein ,

；

If it isPut->；

Referring to fig. 5, fig. 5 is a schematic diagram of an embodiment of an electronic device according to an embodiment of the invention. As shown in fig. 5, an embodiment of the present invention provides an electronic device 500, including a memory 510, a processor 520, and a computer program 511 stored on the memory 510 and executable on the processor 520, wherein the processor 520 executes the computer program 511 to implement the following steps:

Referring to fig. 6, fig. 6 is a schematic diagram of an embodiment of a computer readable storage medium according to an embodiment of the invention. As shown in fig. 6, the present embodiment provides a computer-readable storage medium 600 having stored thereon a computer program 611, which computer program 611 when executed by a processor implements the steps of:

In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of an embodiment that are not described in detail, reference may be made to the related descriptions of the other embodiments.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A method for a poisoning attack on a personalized recommendation system based on X-armed bandits, comprising:

2. The method of claim 1, wherein the reproducing the HCT tree of the learner based on the recommendation result comprises:

s110, defining a flag variable、/>And node variable +.>The node variablesFor pointing to a node in the recurring HCT tree; and remembers the node selected by the t-th round learner from the HCT tree asThe arm selected is->T is a positive integer;

s120, at the t-th wheel, putJudging the current->Whether the value of (2) is 1:

if it isConsider->Nodes and +.>Corresponding andjuxtaposing->Is->After that, step S200 is performed;

if it isStep S130 is performed;

s130, judgingWhether the value of (2) is 0:

if it isStep S200 is performed.

3. The method of claim 2, wherein expanding the recurring HCT leaf child nodes comprises:

4. The method of claim 3, wherein the obtaining the feedback result of the environment, tampering with the feedback result based on the recurring HCT tree, and returning to the learner comprises:

s210, obtaining feedback results of the environment, and lettingAnd update->；

S220, ifCalculate +.>；

；

wherein ,

；

If it isPut->；

S240, orderAnd jumps to step S120;

5. A poisoning attack device for an X-armed bandits-based personalized recommendation system, comprising:

6. The apparatus of claim 5, wherein the HCT tree replication module is specifically configured to:

defining a logo variable、/>And node variable +.>The node variable->For pointing to a node in the recurring HCT tree; and remembers the node selected by the t-th round learner from the HCT tree as +.>The arms being selected as/>T is a positive integer;

at the t-th wheel, putJudging the current->Whether the value of (2) is 1:

if it isConsider->Nodes and +.>Corresponding andjuxtaposing->Is->；

If it isJudging->Whether the value of (2) is 0:

7. The apparatus of claim 6, wherein the expanding recurring HCT leaf child node comprises:

finding an included arm in the covered arm spaceLeaf node->Expanding two sub-nodes, and judging which of the two sub-nodes covers an arm space containing an arm +.>If the arm space covered by one of the child nodes contains an armThe child node is considered to be +.>Arms corresponding to the child node and +.>Identical and let node variablesPointing to another child node and then setting +>。

8. The apparatus of claim 7, wherein the feedback tampering module is specifically configured to:

obtaining feedback results of the environment, and makingAnd update->；

If it isCalculate +.>；

；

wherein ,

；

If it isPut->；

9. An electronic device, comprising:

a memory for storing a computer software program;

a processor for reading and executing the computer software program to implement the method for a poisoning attack on an X-armed bandits-based personalized recommendation system according to any one of claims 1-4.

10. A non-transitory computer readable storage medium, characterized in that the storage medium stores a computer software program for implementing the method for a poisoning attack on an X-armed bandits-based personalized recommendation system according to any one of claims 1-4.