CN111698265A

CN111698265A - Intelligent pure bribery selfish mining attack algorithm

Info

Publication number: CN111698265A
Application number: CN202010599741.1A
Authority: CN
Inventors: 王伊蕾; 苏万力; 杨国玉; 王兆杰; 刘中兴; 李凤银
Original assignee: Qufu Normal University
Current assignee: Qufu Normal University
Priority date: 2020-06-29
Filing date: 2020-06-29
Publication date: 2020-09-22

Abstract

To address strategic attacks on existing blockchains, the invention discloses a new selfish mining algorithm, Intelligent Bribery Selfish Mining (IPBSM, also abbreviated IBSM in this document), which applies machine learning and considers how strategic attacks are affected when pure miners and an intelligent attacker coexist. The method constructs a bribery selfish mining model in which pure miners participate, and lets the attacker use machine learning to lower the computing power threshold for attacking a blockchain system, which increases the attacker's incentive to undermine the system. The key technical points are as follows: the intelligent attacker selects its optimal strategy by interacting with the external environment through reinforcement learning; this interaction is formalized as a Markov process; and reinforcement learning is used to find the strategy that maximizes the attacker's own revenue. Experimental results show that the IBSM algorithm has a lower computing power threshold and a higher relative revenue than SM1. The algorithm effectively improves the success rate of selfish mining attacks and undermines the security of a blockchain system.

Description

Intelligent pure bribery selfish mining attack algorithm

Technical Field

The invention belongs to the field of privacy protection and relates to technologies such as blockchain, selfish mining and machine learning. While making the attacker more intelligent, it provides an attack algorithm with a lower threshold in a pure mining environment, reveals a vulnerability of the consensus mechanism in blockchain systems, and offers a new idea for further improving the security of blockchain systems.

Background

In the Bitcoin system, miners pack generated transactions into a block, elect the bookkeeping node through the consensus protocol, and append the newly generated block to the distributed ledger (i.e., the blockchain) to obtain transaction fees and block rewards. Because Bitcoin has high economic value, it attracts the attention of many people, attackers in particular. Note that the higher a miner's computing power, the greater its ability to generate new blocks and the higher its probability of winning the bookkeeping right in the consensus protocol. In the extreme case, an attacker holding the majority of the computing power (> 51%) can launch a 51% attack and arbitrarily rewrite ledger information through forking, thereby obtaining illegal profits (e.g., by double spending).

In a blockchain, forks fall into two cases: normal forks and malicious forks. A normal fork is caused by a protocol modification or by several honest miners finding new blocks at the same time. A malicious fork is created intentionally by an attacker, using some attack algorithm, in order to gain more profit.

Miners with little computing power can combine their computing power into a mining pool, which mines with the pool's overall computing power; if the pool finds a new block, the reward is distributed proportionally. Once a mining pool grows to a certain scale, a 51% attack becomes very easy to carry out, so miners with little computing power still have the opportunity to obtain greater profits. Mining pools may also enable other attacks, such as selfish mining attacks and stubborn mining attacks. These strategic attacks seriously damage the economic ecosystem of the digital currency system and hinder its healthy development.

Therefore, the security problems caused by such attacks have long been a focus of attention. One way to counter this type of attack is to increase the share of honest miners and to construct incentive-compatible consensus protocols.

Disclosure of Invention

Suppose the profit of mining on the public chain is R_pu and the profit of mining on the private chain is R_pr. When an attacker adopts IBSM, the profit of a rational miner mining on the public chain remains R_pu, but the profit of mining on the private chain becomes R_pr + negl(n), which is the higher of the two.

Therefore, rational miners deviate from the standard Bitcoin protocol and choose to work on the more profitable private chain, so that the blocks on the public chain become invalid blocks and the attacker's profit increases;

a Markov model of the algorithm is established through reinforcement learning, the model being defined as a quadruple M = <S, A, P, R>,

wherein S represents the state set, A represents the action set, P represents the state transition probability matrix, and R is the reward matrix; the components of M are described in detail below:

(1) action set A: the action set A is the set of strategies that an attacker can choose in a given state:

(a) adopt: the attacker accepts the honest chain, discards the private chain, and mines on the last block of the honest chain without causing a fork,

(b) override: the attacker publishes the blocks of the private chain; this action is appropriate when l_a > l_h,

(c) match: an honest miner finds a new block that makes the honest chain as long as the private chain, i.e. l_a = l_h; the attacker then publishes all blocks on the private chain and launches a bribery attack in favor of the current private chain to increase the probability that it becomes the longest valid chain,

(d) wait: the attacker does not publish any new block and continues mining on the private chain;

(2) state set S: each state in the state set has the form s = <l_a, l_h, optional>, where l_a is the length of the attacker's private chain, l_h is the length of the honest (public) chain, and optional takes one of the values in the set {irrelevant, relevant, active}:

(a) if the current state is <l_a, l_h, relevant>, the previous state is <l_a, l_h-1, optional>, i.e. the latest block was added to the honest chain,

(b) if the current state is <l_a, l_h, irrelevant>, the previous state is <l_a-1, l_h, optional>, i.e. the latest block was added to the private chain,

(c) if the current state is <l_a, l_h, active>, the blockchain network is currently forked as the result of a match action;

(3) state transition matrix P: in the current Markov model, each state is a triple <l_a, l_h, optional>. Assuming that the attacker's computing power is alpha and that of the honest miners is 1-alpha, the initial state is <1,0,irrelevant> with probability alpha or <0,1,relevant> with probability 1-alpha; in each state the attacker selects the optimal action and thereby moves to the next state;

(4) reward matrix R: as described above, the state at each moment is a triple <l_a, l_h, optional>, and in each state the attacker chooses an action that moves the system to the next state. On each state transition a reward is paid out, represented as a pair <r_la, r_lh>, where r_la is the reward obtained by the attacker and r_lh is the reward obtained by the honest miners.
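The four components above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Action`, `State`, `feasible_actions`, `previous_lengths`, `initial_distribution`, `wait_transitions` and `relative_revenue` are hypothetical; the feasibility rules treat the conditions given for override and match as hard requirements; the wait transitions ignore the active label and any bribery effect; and the attacker's share of total reward is used as a stand-in for relative revenue, which the text does not define explicitly.

```python
from enum import Enum
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Action(Enum):
    ADOPT = "adopt"
    OVERRIDE = "override"
    MATCH = "match"
    WAIT = "wait"


class State(NamedTuple):
    l_a: int   # first chain length of the triple
    l_h: int   # second chain length of the triple
    fork: str  # "irrelevant", "relevant" or "active"


def feasible_actions(state: State) -> List[Action]:
    """Assumption: override needs l_a > l_h, match needs equal non-empty chains."""
    actions = [Action.ADOPT, Action.WAIT]
    if state.l_a > state.l_h:
        actions.append(Action.OVERRIDE)
    if state.l_a == state.l_h and state.l_h > 0:
        actions.append(Action.MATCH)
    return actions


def previous_lengths(state: State) -> Optional[Tuple[int, int]]:
    """Chain lengths of the preceding state, following rules (a) and (b)."""
    if state.fork == "relevant":
        return (state.l_a, state.l_h - 1)
    if state.fork == "irrelevant":
        return (state.l_a - 1, state.l_h)
    return None  # "active": the label alone does not fix the predecessor


def initial_distribution(alpha: float) -> Dict[State, float]:
    """Initial states and their probabilities for attacker power alpha."""
    return {
        State(1, 0, "irrelevant"): alpha,
        State(0, 1, "relevant"): 1.0 - alpha,
    }


def wait_transitions(state: State, alpha: float) -> Dict[State, float]:
    """Assumed transitions under wait: the next block extends l_a with
    probability alpha and l_h with probability 1 - alpha."""
    return {
        State(state.l_a + 1, state.l_h, "irrelevant"): alpha,
        State(state.l_a, state.l_h + 1, "relevant"): 1.0 - alpha,
    }


def relative_revenue(rewards: Iterable[Tuple[float, float]]) -> float:
    """Attacker's share of all rewards in a sequence of <r_la, r_lh> pairs."""
    total_a = total_h = 0.0
    for r_la, r_lh in rewards:
        total_a += r_la
        total_h += r_lh
    if total_a + total_h == 0:
        raise ValueError("no reward has been paid out yet")
    return total_a / (total_a + total_h)
```

For example, `feasible_actions(State(2, 1, "irrelevant"))` contains `Action.OVERRIDE` but not `Action.MATCH`, and the two probabilities returned by `initial_distribution` always sum to 1.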

By taking rational miners and an intelligent attacker into account, the method models the bribery selfish mining attack as a Markov process and optimizes the attack algorithm using machine learning. The invention achieves the following effect: in a blockchain where all participants are rational and intelligent, a participant builds a Markov process and uses machine learning to select suitable parameters, thereby achieving the lowest threshold and the highest revenue. The invention is applicable to the security of blockchain systems.
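The incentive argument for rational miners can be illustrated with a very small Python sketch. The function name `choose_chain` and the numbers in the usage note are hypothetical, and the `bribe` argument stands in for the negl(n) term; the sketch only shows that any positive bribe tips a profit-maximizing miner toward the private chain when the two base profits are equal.

```python
def choose_chain(r_pu: float, r_pr: float, bribe: float) -> str:
    """Return the chain a purely profit-driven miner would work on.

    r_pu:  profit of mining on the public chain
    r_pr:  profit of mining on the private chain, before the bribe
    bribe: extra payment offered for working on the private chain
    """
    # A rational miner compares the two payoffs and ignores the protocol rules.
    return "private" if r_pr + bribe > r_pu else "public"
```

With illustrative values, `choose_chain(1.0, 1.0, 0.01)` returns `"private"`, while `choose_chain(1.0, 1.0, 0.0)` returns `"public"`.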

Drawings

FIG. 1 IBSM Markov process;

FIG. 2 IBSM revenue;

FIG. 3 Comparison of IBSM revenue with other algorithms.

Detailed Description

The state transition diagram of IBSM is shown in FIG. 1. The binary search interval is initialized to [0,1], where low is the left boundary of the interval, high is the right boundary, and rho is the midpoint of the current search interval, i.e. the candidate expected cumulative revenue. Gamma denotes the communication capability of the attacker, with an initial value of 0.5. eps is the lower bound on the width of the search interval: once high - low is less than or equal to eps, the current rho is taken as the maximum expected revenue.
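The binary search described above can be sketched in Python as follows. This is a minimal illustration, assuming that the value of the initial state decreases as rho increases; the name `search_rho` and the callable `value_at` (which would wrap the MDP solver and return the value of the initial state for a given rho) are hypothetical, and the loop is written iteratively instead of recursively.

```python
from typing import Callable


def search_rho(value_at: Callable[[float], float], eps: float,
               low: float = 0.0, high: float = 1.0) -> float:
    """Bisect [low, high] until its width is at most eps and return rho.

    value_at(rho) is assumed to be positive while rho is still below the
    best achievable expected revenue, and non-positive otherwise.
    """
    while True:
        rho = (low + high) / 2      # midpoint of the current interval
        if value_at(rho) > 0:
            low = rho               # rho is still achievable: search above it
        else:
            high = rho              # rho is too optimistic: search below it
        if high - low <= eps:
            return rho


# Stand-in for the MDP solver: pretend the best achievable revenue is 0.3.
estimate = search_rho(lambda rho: 0.3 - rho, eps=1e-6)
```

Because the interval is halved on every pass, the number of MDP solves grows only logarithmically in 1/eps.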

The OptimalStrategyMdp function first initializes all states of the current blockchain network. For each state, it computes the transition probabilities and the reward obtained by taking each action. If the selected action is Match, the attacker launches a bribery attack on the current network to induce rational miners to work on the private chain. Finally, the Markov model is solved by calling a function from mdptoolbox.
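To make the final solving step concrete, the sketch below solves a toy two-state MDP in pure Python. It swaps in plain value iteration for the toolbox's policy iteration so that it runs without dependencies; the array layout (transitions indexed as action, state, next state; rewards indexed as state, action) mirrors the layout that the Python mdptoolbox accepts. The function name `value_iteration` and the toy numbers are hypothetical and unrelated to the IBSM state space.

```python
def value_iteration(P, R, discount, tol=1e-9, max_iter=10_000):
    """Solve a small MDP.

    P[a][s][t]: probability of moving from state s to state t under action a.
    R[s][a]:    expected reward for taking action a in state s.
    Returns (V, policy), where V[s] is the optimal discounted value of state s
    and policy[s] is the index of a best action in state s.
    """
    n_actions, n_states = len(P), len(R)
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iter):
        # Bellman backup: value of each (state, action) pair under current V.
        Q = [[R[s][a] + discount * sum(P[a][s][t] * V[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        new_V = [max(row) for row in Q]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    policy = [row.index(max(row)) for row in Q]
    return V, policy


# Toy example: action 0 stays in place, action 1 switches to the other state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [[0.0, 1.0],   # state 0: staying pays 0, switching pays 1
     [2.0, 0.0]]   # state 1: staying pays 2, switching pays 0
V, policy = value_iteration(P, R, discount=0.9)
```

In this toy model the best policy is to switch out of state 0 and then stay in state 1 forever, which gives values of about 19 and 20 for the two states.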

The IPBSM algorithm comprises the following specific steps:

Input: initialize two parameters, low ← 0 and high ← 1;

Output: rho;

Function OptimalStrategy(α, r, maxlen, eps, low, high)
    rho = (low + high) / 2
    mdp = OptimalStrategyMdp(α, r, maxlen, rho)
    If mdp.V[0] > 0 then
        low = rho
    Else
        high = rho
    EndIf
    If high - low <= eps then
        Return rho
    EndIf
    Return OptimalStrategy(α, r, maxlen, eps, low, high)
EndFunction

Function OptimalStrategyMdp(α, r, maxlen, rho)
    states ← {}
    Init(states)
    For (indx, state) ← states.items() do
        If Action == Adopt then
            p_adopt, r_adopt ← ComputationAdopt(state)
        EndIf
        If Action == Override then
            p_override, r_override ← ComputationOverride(state)
        EndIf
        If Action == Match then
            p_match, r_match ← BriberyComputationMatch(state)
        EndIf
        If Action == Wait then
            p_wait, r_wait ← ComputationWait(state)
        EndIf
    EndFor
    P ← [p_adopt, p_override, p_match, p_wait]
    R ← [r_adopt, r_override, r_match, r_wait]
    mdp ← mdptoolbox.mdp.PolicyIteration(P, R, Discount)
    mdp.run()
    Return mdp
EndFunction

Validation of the Invention

FIG. 2 illustrates the effectiveness of the algorithm: the attacker's relative revenue has an upper bound when the attacker's computing power is 0.25, 1/3 or 0.4. If the attacker holds the majority of the computing power, the relative revenue gradually increases with the chain length until it reaches 1. To show the advantage of the IBSM algorithm, FIG. 3 compares it with ε-optimal selfish mining, SM1 and honest mining. When the attacker's computing power is below 25%, honest mining not only achieves higher relative revenue than SM1 but also comes close to that obtained from ε-optimal selfish mining. In this range, selfish mining with either ε-optimal or SM1 yields no higher relative revenue than honest mining. With the IBSM algorithm, by contrast, an attacker obtains higher relative revenue than honest mining as soon as its computing power exceeds 0.075. Therefore, the IBSM algorithm significantly lowers the computing power threshold for selfish mining.
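The 25% figure quoted for SM1 can be reproduced with a short Python sketch. It uses the closed-form relative revenue of SM1 from the original selfish mining analysis by Eyal and Sirer, which is not part of this patent; gamma is interpreted there as the fraction of honest miners that mine on the attacker's block during a tie, and honest mining earns a relative revenue equal to the miner's computing power alpha. The function names are hypothetical. The sketch does not model IBSM itself, so the 0.075 threshold is not derived here.

```python
def sm1_relative_revenue(alpha: float, gamma: float) -> float:
    """Closed-form relative revenue of SM1 for 0 <= alpha < 0.5."""
    numerator = (alpha * (1 - alpha) ** 2 * (4 * alpha + gamma * (1 - 2 * alpha))
                 - alpha ** 3)
    denominator = 1 - alpha * (1 + (2 - alpha) * alpha)
    return numerator / denominator


def sm1_threshold(gamma: float) -> float:
    """Computing power above which SM1 beats honest mining."""
    return (1 - gamma) / (3 - 2 * gamma)


def sm1_beats_honest(alpha: float, gamma: float) -> bool:
    """Honest mining earns relative revenue alpha, so compare against it."""
    return sm1_relative_revenue(alpha, gamma) > alpha
```

With gamma = 0.5, `sm1_threshold(0.5)` evaluates to 0.25, and `sm1_beats_honest` is false at alpha = 0.2 but true at alpha = 0.3, in line with the statement that SM1 pays off only above 25% of the computing power.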

Claims

1. Suppose the profit of mining on the public chain is R_pu and the profit of mining on the private chain is R_pr. When an attacker adopts IBSM, the profit of a rational miner mining on the public chain remains R_pu, but the profit of mining on the private chain becomes R_pr + negl(n), which is the higher of the two.

Therefore, rational miners deviate from the standard consensus protocol and choose to work on the more profitable private chain, so that the blocks on the public chain become invalid blocks and the attacker's profit increases;

(3) state transition matrix P: in the current Markov model, each state is a triple <l_a, l_h, optional>. Assuming that the attacker's computing power is alpha and that of the honest miners is 1-alpha, the initial state is <1,0,irrelevant> with probability alpha or <0,1,relevant> with probability 1-alpha; the attacker selects the best action according to the current state and thereby moves to the next state;