CN112232844A

CN112232844A - Bitcoin mining pool multi-miner cooperative evolution method based on a temporal-difference algorithm

Info

Publication number: CN112232844A
Application number: CN201910632888.3A
Authority: CN
Inventors: 欧嵬; 罗恩韬; 邓铭巍
Original assignee: Hunan University of Science and Engineering
Current assignee: Hunan University of Science and Engineering
Priority date: 2019-07-14
Filing date: 2019-07-14
Publication date: 2021-01-15

Abstract

The invention discloses a multi-miner cooperative evolution method for Bitcoin mining pools based on a temporal-difference algorithm. The method analyzes the game among miners, treats it as an iterated prisoner's dilemma, and establishes game models for two miners and for multiple miners. A zero-determinant strategy is used in the game; a temporal-difference algorithm then predicts the payoff of the next round, and a greedy strategy selects the next-round action and updates the overall cooperation probability of the mining pool. By executing the temporal-difference algorithm iteratively, the overall cooperation probability of the pool converges to 1, i.e. the pool reaches a fully cooperative state and the block withholding attack problem is solved.

Description

Bitcoin mining pool multi-miner cooperative evolution method based on a temporal-difference algorithm

Technical Field

The invention relates to a multi-miner cooperative evolution method for Bitcoin mining pools based on a temporal-difference algorithm.

Background

As a new mechanism for storing, transmitting and managing information, the blockchain lets users jointly take part in computing and storing data and verify each other's data, thereby achieving reliable transfer of data and value in a decentralized and trustless manner. In recent years blockchain technology has attracted wide attention from many fields, its search index has kept rising, and it has become one of the most popular emerging internet technologies. Since Satoshi Nakamoto first proposed the blockchain concept in 2008, the blockchain technology architecture has matured over more than ten years. According to Gartner's forecast, by 2025 blockchain technology will generate up to 176 billion US dollars of business value across many industries, including manufacturing. At present, applications of blockchain technology are typified by the financial field and are gradually extending to many other areas of the economy and society, such as healthcare, logistics and the industrial internet, where they are receiving broad attention and being explored worldwide.

Bitcoin is currently the blockchain application with the most users, the largest system scale and the most stable transactions worldwide. Its proof-of-work (PoW) mechanism directly led to the birth of blockchain technology. PoW describes a secure bookkeeping system that addresses the Byzantine generals problem by introducing a computing-power competition among distributed nodes to guarantee data consistency and consensus. As the Bitcoin system has grown and the competition has intensified, the chance that a single miner mines a block has become very small; this gave rise to the mining pool, in which many miners pool their computing power to mine jointly.

Studies have shown that miners in an open mining pool can increase their own revenue by launching block withholding attacks. From a game-theoretic perspective with rational economic agents, all miners will eventually choose to attack each other, yet the revenue they then receive is lower than when none of them attacks. This is the miner's dilemma under PoW. It is analogous to the classical prisoner's dilemma in game theory, in which the strategy that is optimal for the individual is not optimal for the group, so the miner's dilemma can be analyzed and optimized from a game-theoretic perspective.

The zero-determinant (ZD) strategy is one of the active directions of current game theory research. It originates from a paper by Press and Dyson, who showed that the following kind of strategy exists in the iterated prisoner's dilemma: a single player can unilaterally control the opponent's payoff, and can enforce a linear relationship between the opponent's payoff and his own, whatever strategy the opponent adopts. On this basis, Pan et al. applied the ZD strategy to multi-player games and showed that a ZD player can enforce a linear relationship on the sum of the payoffs of all opposing players.

Disclosure of Invention

To overcome the technical problem that miners in conventional mining pools increase their own revenue through block withholding attacks and thereby ultimately reduce the revenue of the whole pool, the invention provides a multi-miner cooperative evolution method for Bitcoin mining pools based on a temporal-difference algorithm, which brings all miners of the pool into cooperation as quickly as possible and ultimately increases the revenue of the whole pool.

To achieve this technical purpose, the technical solution of the invention is as follows.

A Bitcoin mining pool multi-miner cooperative evolution method based on a temporal-difference algorithm comprises the following steps:

step one, for the N miners in a mining pool, based on the game among the miners, determining each miner's strategy vector for the next round from the strategy (cooperation or attack) adopted in each round, and at the same time obtaining each miner's payoff vector;

step two, obtaining, from each miner's strategy vector, the probabilities with which each miner selects a strategy, and from these probabilities obtaining the Markov state transition matrix of the multi-miner game, i.e. the strategy-selection transition matrix;

step three, performing row and column transformations on the Markov state transition matrix to obtain a determinant that is unilaterally controlled by the strategy of a single miner, and combining it with the miners' payoff vectors to obtain the miners' expected payoffs and hence a linear combination of the expected payoffs;

step four, based on the linear combination of expected payoffs and the probabilities of cooperation and attack, adjusting the strategy adopted by the miner so as to control the range of the other miners' expected payoffs; introducing, as the extortion factor, a factor that is related to the weights of the payoff vectors and expresses the ratio between the miner's own payoff and the sum of the other miners' payoffs; and setting the extortion factor as a dynamically changing value: when the overall cooperation probability of the pool is small, the extortion factor is increased so that the miner obtains a high payoff; when the overall cooperation probability of the pool is large, the extortion factor is decreased to drive the pool toward the fully cooperative state; and once the pool is fully cooperative, the extortion factor settles to a constant so that the fully cooperative state is maintained, thereby maximizing the revenue of the whole pool.

In the above method, in step one, "attack" means launching a block withholding attack and "cooperation" means not launching one.

In the above method, in step one, the strategy vector p^1 of miner 1 is:

where p_{C,n} denotes the probability that miner 1 cooperates in the current round given that it cooperated in the previous round and n other miners cooperated, p_{A,n} denotes the probability that miner 1 cooperates in the current round given that it attacked in the previous round and n other miners cooperated, and N is the total number of miners in the pool;

the payoff vector of the miner is:

where x ∈ [1, N],

where n(i) denotes the number of other miners cooperating in game state s_i;

is an indicator of the miner's action in game state s_i: if the miner chooses cooperation,

otherwise,

in step two, the Markov state transition matrix is

where M_{i,j} denotes the probability that the pool transitions from state i to state j, and N is the total number of miners in the pool.

In step three, the linear combination of the miners' expected payoffs is

The technical effect of the invention is that models of two-party and multi-party games are built for the mining-pool game under the proof-of-work (PoW) mechanism and a zero-determinant strategy is introduced, which offers a new approach to the block withholding attack problem in Bitcoin mining pools. The method analyzes the game among miners, treats it as an iterated prisoner's dilemma, and establishes game models for two miners and for multiple miners. A zero-determinant strategy is used in the game; a temporal-difference algorithm then predicts the payoff of the next round, and a greedy strategy selects the next-round action and updates the overall cooperation probability of the mining pool. By executing the temporal-difference algorithm iteratively, the overall cooperation probability of the pool converges to 1, i.e. the pool reaches a fully cooperative state and the block withholding attack problem is solved.

Drawings

FIG.1 shows the effective range of N;

FIG. 2 is a process of cooperative probability evolution;

FIG. 3 compares the number of rounds needed for the cooperation probability to converge to 1 under the TD algorithm with the ZD strategy and under the custom strategy;

FIG. 4 is a revenue evolution process with an initial probability of 0.1;

FIG. 5 is a revenue evolution process with an initial probability of 0.3;

FIG. 6 is a revenue evolution process with an initial probability of 0.5;

FIG. 7 is a revenue evolution process with an initial probability of 0.7;

FIG. 8 is a revenue evolution process with a start probability of 0.9.

Detailed Description

The method starts from a two-miner game model and proves the feasibility of the ZD strategy in the two-miner game. It then proposes a more complete N-miner game model, uses a multi-party ZD strategy to optimize the Bitcoin system in a multi-miner environment, and through iteration brings the mining pools in the system to full cooperation, which increases the block throughput of the Bitcoin system as well as the overall revenue of each pool and the revenue of its miners. To find the best strategy in the game, so that the overall cooperation probability of the pools converges to 1 in as few iterations as possible, each mining pool is regarded as an agent: the payoff of the next round is predicted with the temporal-difference (TD) learning method from reinforcement learning, the next-round action is selected with a greedy strategy, and the cooperation probability of the pool is updated accordingly.

In the Bitcoin system, all nodes (i.e. miners) compete with one another on the basis of their computing power to solve a SHA256 puzzle that is hard to solve but easy to verify (i.e. mining); the node that solves it first obtains the right to record the block and the Bitcoin reward generated automatically by the system. The puzzle can be stated as follows: given the current difficulty, find a suitable random number (nonce) such that the SHA256 hash of the block header, which contains the nonce, is less than or equal to the target hash value. By adjusting the difficulty of the nonce search, the Bitcoin system keeps the average block generation time at about 10 minutes.
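The nonce search described above can be sketched in a few lines of Python. This is a toy illustration only: the header bytes, the 8-byte nonce encoding, the search limit and the easy target `TOY_TARGET` are assumptions made for the example, whereas real Bitcoin applies SHA-256 twice to an 80-byte block header and uses a far smaller target.

```python
import hashlib
from typing import Optional

# Assumed toy target: roughly one hash in 4096 succeeds (illustrative only).
TOY_TARGET = 2 ** 244

def header_hash(header: bytes, nonce: int) -> int:
    """SHA-256 of header plus nonce, read as a big-endian integer."""
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def find_nonce(header: bytes, target: int, max_nonce: int = 2 ** 20) -> Optional[int]:
    """Try nonces in order and return the first one that meets the target."""
    for nonce in range(max_nonce):
        if header_hash(header, nonce) <= target:
            return nonce
    return None  # search space exhausted without success

def verify(header: bytes, nonce: int, target: int) -> bool:
    """Verification needs a single hash, which is why it is cheap."""
    return header_hash(header, nonce) <= target

nonce = find_nonce(b"toy block header", TOY_TARGET)
print(nonce, verify(b"toy block header", nonce, TOY_TARGET))
```

Lowering `TOY_TARGET` makes the search longer on average while leaving the cost of `verify` unchanged, which mirrors the "hard to solve, easy to verify" property.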

The mining pool consists of a pool manager and miners. The pool manager joins the Bitcoin system as a single miner, but instead of spending effort searching for the nonce itself as other miners do, it outsources the nonce search to the miners in the pool. Each miner in the pool is assigned a certain amount of nonce-search work, for which it submits partial proofs of work. The pool manager evaluates each miner's work from the partial proofs of work the miner submits. Once the required nonce, called a full proof of work, is found, the miner who found it submits it to the manager, who broadcasts it to the whole Bitcoin network. Finally, the manager receives the block reward and distributes it to the miners according to the computing power each contributed.

Because most mining pools are open, any miner is allowed to join them. Any pool can therefore dispatch its own miners to infiltrate other pools and carry out a block withholding attack. In a block withholding attack [8], the attackers join a pool but submit only partial proofs of work, withholding any full proof of work they find. Because they do submit partial proofs of work, the manager still regards them as honest miners and pays them rewards according to their computing power. The attackers thus receive rewards from the pool without contributing any useful computation, which directly reduces the pool's revenue and also the throughput of the whole blockchain system.
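The dilution effect of a block withholding attack can be illustrated with a small sketch. The function name `pool_payouts` and the proportional model are assumptions for illustration: pool income is taken to be proportional to the power that actually submits full proofs of work, while the income is shared in proportion to the partial proofs of work, which honest miners and attackers both submit.

```python
def pool_payouts(honest_power: float, withholding_power: float,
                 reward_rate: float = 1.0):
    """Return (honest payout, attacker payout) for one pool in a toy model."""
    # Only honest power ever produces a full proof of work.
    income = reward_rate * honest_power
    # Rewards are split over everyone who submits partial proofs of work.
    total_shares = honest_power + withholding_power
    if total_shares == 0:
        return 0.0, 0.0
    per_share = income / total_shares
    return per_share * honest_power, per_share * withholding_power

print(pool_payouts(1.0, 0.0))    # no attacker in the pool
print(pool_payouts(0.75, 0.25))  # a quarter of the power withholds
```

In the second call the honest miners hold 75% of the power but receive only about 56% of the attack-free income, while the attackers are paid without having contributed to it.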

Game theory studies how individuals choose when their choices interact. All participants are assumed to be rational, where a rational person is one who maximizes his or her preferences under the given constraints. Rational is not the same as selfish: a rational person may be self-interested or altruistic. The rational-person assumption is the analytical premise of game theory. Under this premise every participant tries to maximize his or her preferences (payoffs); if some action would make a participant better off, the participant will actively pursue it. Rational people need to cooperate in order to maximize their preferences, yet conflicts arise within that cooperation.

The prisoner's dilemma game was first proposed by Kuh et al. In it, two agents simultaneously decide whether to cooperate (C) or defect (D) without knowing the other's current choice; each agent then selects its strategy for the next round according to some learning mechanism. Both obtain the payoff R when both cooperate and the payoff P when both defect; if one cooperates and the other defects, the defector obtains the highest payoff T and the cooperator the lowest payoff S, so the parameters satisfy T > R > P > S. The prisoner's dilemma has been applied widely in biological, political and economic research, with rich results.

For the prisoner's dilemma, scholars have proposed strategies such as WSLS (win-stay, lose-shift), TFT (tit-for-tat), GTFT (generous tit-for-tat), the memoryless all-cooperate strategy and the all-defect strategy, and have discussed their evolutionary stability in repeated games and how they encourage the evolution of cooperation. None of these strategies, however, can unilaterally determine the opponent's payoff. The zero-determinant strategy, first proposed by Press and Dyson in 2012, can unilaterally set the opponent's payoff, and with a suitable choice can also ensure that the player's own payoff is a multiple of the opponent's, thereby extorting the opponent; it is therefore also called the extortion strategy. These advantages have attracted wide attention. Researchers have studied in depth the robustness of the zero-determinant strategy and its evolutionary stability against the WSLS, all-cooperate, all-defect and TFT strategies in population games under the repeated prisoner's dilemma model, have assigned different characteristics to nodes and discussed their influence on evolution, and have examined how to change parameters to promote the emergence of cooperation. To improve evolutionary stability, sub-strategies of the zero-determinant strategy such as the generous strategy have been proposed in the literature. The zero-determinant strategy has also been extended to multi-player and continuous-strategy settings, and applied to public goods games, noisy repeated games and other areas.

Many scholars have proposed models of the games played by miners and mining pools in the Bitcoin system. Eyal et al. showed that the Bitcoin system is inherently fragile: in practice there exists a strategy called selfish mining, in which a miner keeps mining a private chain without publishing it and publishes it only once it is longer than the public chain, so that the public chain is invalidated and the computing resources of honest miners are wasted; this is the familiar blockchain fork problem. Building on this, Kiayias A et al. exploited the vulnerability to model Bitcoin mining as a stochastic game with complete information, in which the length of the main chain is controlled through the timing of releasing mined blocks. Liu X et al. proposed an evolutionary game model that computes the expected payoff of a miner joining different mining pools in order to decide which pool to select. Lewenberg Y et al. cast the miners' choice of pool as a cooperative game in which the members of one pool are regarded as a coalition and miners increase their profit by switching pools, but that work does not address attacks between pools. Tang et al. treated the mining game as a prisoner's dilemma, analyzed the equilibrium of the mining dilemma, and optimized miners' strategy selection with the zero-determinant strategy. Fan et al. regarded the game between pools as an iterated prisoner's dilemma and used an AZD (adaptive zero-determinant) strategy to optimize it, so that the pools in the Bitcoin system eventually reach full cooperation; however, because the real Bitcoin system contains more than two pools, it cannot simply be treated as a two-party game.

Reinforcement learning grew out of related disciplines such as control theory, statistics and psychology, and can be traced back to Pavlov's conditioned-reflex experiments. It was not until the publication of the research paper of Watkins et al. [30] in 1992, however, that reinforcement learning came to be widely studied and applied in artificial intelligence, machine learning and related fields, and to be regarded as one of the core techniques for designing intelligent systems. Its main idea is that an agent optimizes its strategy by interacting with the environment through trial and error, taking the environment's feedback signal as input. Strategy optimization requires sound policy evaluation and policy iteration techniques, and correctly estimating the value function is the central problem of policy evaluation. Reinforcement learning is usually described by Markov decision processes (MDPs) with discrete state and action spaces and, as in the policy evaluation methods of dynamic programming, the value function of each state can be stored in a table. Reinforcement learning has produced many theoretical and algorithmic results, has become an effective method for sequential decision optimization problems, and has been applied successfully in intelligent robots, automatic control systems and other fields.

The basic idea of temporal difference stems from the learning mechanism and experimental studies of secondary reinforcement signals in psychology. Temporal-difference learning algorithms and their convergence theory occupy a fundamental position in reinforcement learning; like the policy evaluation methods of dynamic programming, they provide an effective method and theoretical framework for solving the value function of a stationary Markov decision process whose model is unknown. Reinforcement learning based on linear value-function estimation can be traced back to 1988, when Sutton first proposed linear temporal-difference learning (LTD) and the TD(λ) algorithm. In Sutton's paper, temporal-difference learning is treated as a multi-step prediction method based on a Markov chain that can be used for policy evaluation or value-function prediction in a stationary Markov decision process, and an algorithmic description is given. In 1997, Tsitsiklis and Van Roy proved the convergence of the linear TD(λ) algorithm (with λ ∈ [0, 1]), although it can be unstable in some cases [38]. In 2002, Boyan extended least-squares temporal-difference learning (LSTD) to LSTD(λ). In the same year, building on LSTD, Lagoudakis and Parr proposed the least-squares policy iteration algorithm (LSPI), which has better stability. In 2006, Geramifard et al. proposed incremental least-squares temporal-difference learning (iLSTD) and iLSTD with eligibility traces, and proved the convergence of the algorithm. In 2008, Sutton et al. proposed the GTD algorithm and proved that it converges to the least-squares solution in the off-policy setting, though much more slowly than conventional algorithms. In 2009, Sutton et al. derived from a new minimization objective, the mean-square projected Bellman error (MSPBE), two milestone algorithms, second-generation GTD (GTD2) and TDC, which greatly improved the convergence rate of GTD. In 2010, Scherrer showed that the temporal-difference (TD) fixed-point solution and the Bellman residual solution are both oblique projections of the true value function, and that neither is optimal. In 2015, Liu et al. analyzed the convergence of GTD algorithms in the off-policy setting using finite-sample analysis, obtained true stochastic-gradient TD algorithms from a primal-dual saddle-point objective, proposed the projected GTD2 and GTD-MP algorithms, which improve convergence and speed respectively, and gave performance bounds [45] for both the on-policy and off-policy cases. In 2018, Liu et al. proposed the GTD2-MP algorithm, which improved the convergence rate.

In the present method, the TD(λ) algorithm is used to predict each miner's payoff in the next round of the miners' game, and a greedy strategy then selects and executes the action with the higher predicted payoff.
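The predict-then-act loop can be sketched with a tabular one-step TD update followed by a greedy choice. This is a simplified stand-in, not the TD(λ) procedure of the method: it uses a single-step update with a greedy (max) bootstrap, in the style of Q-learning, and the state labels, reward values, step size `alpha` and discount `gamma` are illustrative assumptions.

```python
from collections import defaultdict

ACTIONS = ("C", "A")  # cooperate, attack

def td0_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One-step TD update of the predicted payoff q[(state, action)].

    The target is the observed reward plus the discounted best prediction
    for the next state; the function returns the TD error.
    """
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error

def greedy_action(q, state):
    """Pick the action with the highest predicted payoff in `state`."""
    return max(ACTIONS, key=lambda a: q[(state, a)])

q = defaultdict(float)  # all predictions start at 0
td0_update(q, "CC", "C", reward=1.0, next_state="CC", alpha=0.5)
print(greedy_action(q, "CC"))
```

Replacing the single-step target with an eligibility-trace average over several steps would turn this into a TD(λ)-style update.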

In a Bitcoin mining pool, computing the nonce values assigned by the pool manager (commonly called mining) consumes a certain amount of a miner's computing power; let the amount consumed be c (c > 0). When several miners mine cooperatively, the probability of finding the final nonce increases greatly, i.e. each miner's expected payoff is larger than when mining alone. Assume that when miners mine together the expected revenue of the system is multiplied by a factor r, with r > 1. Within the pool, mining revenue is distributed according to the miners' computing power. Miners who launch block withholding attacks are likewise paid according to their computing power.

Suppose pool P contains two miners, X and Y. The game between them can be viewed as a Markov decision process. Each miner independently decides whether to launch a block withholding attack; launching one is defined as Attack (A) and not launching one as Cooperate (C).

Assume the total computing power of the pool is 1 and the computing power of miner X is t (c < t < 1), so the computing power of miner Y is 1 - t. If the total revenue of the pool is taken to be 1 when both miners cooperate, computing power can be used directly to represent revenue. This gives the payoff table of the miners under the different choices:

TABLE 1 Miner's income Table

When both miners choose to cooperate, i.e. the strategy combination is (C, C), each miner spends computing power c but the revenue of the whole system is multiplied by r, so the payoffs of miners X and Y are r(t - c) and r(1 - t - c) respectively. When the strategy combination is (C, A), i.e. miner X cooperates and miner Y attacks, Y receives its share of the distributed revenue without spending the computing power c, so the payoffs of the two parties are t² - c and t(1 - c). When the strategy combination is (A, C), the payoffs of the miners are t(1 - t) and (1 - t)² - c. When both choose to attack, the effective computing power of the pool is 0, so both payoffs are 0.

Clearly, the payoff table shows that the values of t, c and r influence the miners' strategy selection. These values are analyzed briefly as follows:

1) when r(t - c) > t(1 - t) and t² - c < 0, the miner's best strategy is to cooperate when his opponent cooperates and to attack when his opponent attacks;

2) when r(t - c) < t(1 - t) and t² - c > 0, the miner's best strategy is to attack when his opponent cooperates and to cooperate when his opponent attacks;

3) when r(t - c) > t(1 - t) and t² - c > 0, the miner's best strategy is to cooperate whether his opponent attacks or cooperates;

4) when r(t - c) < t(1 - t) and t² - c < 0, the miner's best strategy is to attack whether his opponent attacks or cooperates.

The analysis shows that the fourth case is exactly the prisoner's dilemma: miners acting on rational analysis will end up attacking each other, reducing the revenue of the pool to 0.
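The four cases can be checked numerically from miner X's payoffs in the states (C, C), (C, A), (A, C) and (A, A). The sketch below is a minimal illustration; the function names and the sample values of t, c and r are assumptions, and ties between payoffs are not treated specially.

```python
def payoffs_x(t: float, c: float, r: float) -> dict:
    """Miner X's payoff in each joint state (X's action first)."""
    return {"CC": r * (t - c), "CA": t * t - c, "AC": t * (1 - t), "AA": 0.0}

def best_responses(t: float, c: float, r: float) -> tuple:
    """Return X's best reply to a cooperating and to an attacking opponent."""
    s = payoffs_x(t, c, r)
    reply_to_c = "C" if s["CC"] > s["AC"] else "A"
    reply_to_a = "C" if s["CA"] > s["AA"] else "A"
    return reply_to_c, reply_to_a

# Case numbers follow the list above.
CASES = {("C", "A"): 1, ("A", "C"): 2, ("C", "C"): 3, ("A", "A"): 4}

def classify(t: float, c: float, r: float) -> int:
    return CASES[best_responses(t, c, r)]

print(classify(0.5, 0.3, 1.2))  # attack in both situations
print(classify(0.5, 0.1, 1.5))  # cooperate in both situations
```

With equal power t = 0.5, a cost of c = 0.3 and r = 1.2, attacking is the best reply to either action, which is the dilemma of case 4; lowering the cost to c = 0.1 with r = 1.5 moves the miner to case 3.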

From the above, the action space of the two-miner model is:

B＝[C,A]

In addition, the game state space and the payoff vectors of the model can be summarized. The game state space of the two-miner model has four states:

W＝[CC,CA,AC,AA]

while the revenue vectors for miners X and Y may be expressed as:

S_X＝[r(t-c),t²-c,t(1-t),0]^T

S_Y＝[r(1-t-c),t(1-c),(1-t)²-c,0]^T

When the two miners have the same computing power, the payoff vectors of X and Y can be written in terms of R, S, T and P:

S_X＝[R,S,T,P]^T

S_Y＝[R,T,S,P]^T

the strategy probability for miner X may be expressed as:

p＝[p1,p2,p3,p4]

the strategy probability for miner Y may be expressed as:

q＝[q1,q2,q3,q4]

p and q give the probabilities with which miners X and Y, respectively, choose to cooperate in the next state, where the order of the previous states to which the elements correspond is the same as the order in W. For example, when the previous game state is (C, C), i.e. both chose to cooperate, miner X chooses to cooperate in the next state with probability p1 and to attack with probability 1 - p1, while miner Y chooses to cooperate with probability q1 and to attack with probability 1 - q1. Here every element of p and q lies in [0, 1].

The Markov state transition matrix M of the two-miner game is obtained from the strategy probabilities p and q:

To make the meaning of matrix M clearer, the transition process matrix T corresponding to M is given below:

The elements of matrix T correspond one-to-one to the elements of M: each element of M is the state transition probability of the corresponding element of T.
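How such a transition matrix is assembled, and how a stationary vector and an expected payoff follow from it, can be sketched as below. The sketch assumes, following the text, that both p and q are indexed in the order of W = [CC, CA, AC, AA] and that the two miners choose independently; the numeric strategies, the payoff values and the use of plain power iteration are illustrative assumptions.

```python
STATES = ("CC", "CA", "AC", "AA")  # (X's action, Y's action), same order as W

def transition_matrix(p, q):
    """Row k is the previous state STATES[k]; column j is the next state."""
    rows = []
    for pk, qk in zip(p, q):
        rows.append([pk * qk, pk * (1 - qk), (1 - pk) * qk, (1 - pk) * (1 - qk)])
    return rows

def stationary(m, iterations=2000):
    """Approximate the stationary vector V (V^T M = V^T) by power iteration."""
    v = [0.25] * 4
    for _ in range(iterations):
        v = [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]
    return v

def expected_payoff(v, payoff_vector):
    """Dot product of the stationary vector with a payoff vector."""
    return sum(vi * si for vi, si in zip(v, payoff_vector))

p = [0.8, 0.3, 0.6, 0.2]          # hypothetical strategy of miner X
q = [0.7, 0.4, 0.5, 0.1]          # hypothetical strategy of miner Y
s_x = [0.24, -0.05, 0.25, 0.0]    # S_X for t = 0.5, c = 0.3, r = 1.2
m = transition_matrix(p, q)
print(expected_payoff(stationary(m), s_x))
```

Power iteration is used here only because it needs no external library; it relies on the chain being ergodic, which holds when all entries of p and q lie strictly between 0 and 1.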

The Markov matrix M of the two-miner game model has a unit eigenvalue, so the matrix M′ ≡ M - I is singular, where I denotes the identity matrix. The stationary vector V of M, or any vector proportional to it, satisfies:

V^T·M＝V^T (1)

V^T·M′＝0 (2)

According to Cramer's rule:

Adj(M′)M′＝det(M′)I＝0 (3)

where Adj(M′) denotes the adjugate matrix of M′. Combining (2) and (3) shows that every row of Adj(M′) is proportional to V^T. Taking the fourth row of Adj(M′) as V^T and applying row and column transformations, the dot product of V^T with an arbitrary four-element vector f = [f1, f2, f3, f4]^T can be obtained:

Note that the second column of this matrix contains only elements of p and the third column contains only elements of q; that is, the two columns can be unilaterally determined by miner X and miner Y, respectively.

Substituting the payoff vectors S_X and S_Y of miners X and Y into (4) gives the expected payoffs of miners X and Y:

where 1 is a 4-dimensional column vector with all elements 1.

As (5) and (6) show, a miner's expected payoff depends linearly on the miner's payoff vector. The following linear relationship therefore holds in the pool:

where the numerator is:

and the denominator is:

Because the second and third columns of the determinant can be determined unilaterally by miners X and Y, respectively, either miner can unilaterally make (7) vanish. Specifically, miner X can set the second column, or miner Y can set the third column, equal to αS_X + βS_Y + γ1. The determinant in the numerator then equals 0, which gives a linear relationship between the expected payoffs of miners X and Y:

αE_X+βE_Y+γ＝0 (8)

In the zero-determinant strategy, if miner X sets his strategy such that

[p1-1,p2-1,p3,p4]^T＝βS_Y+γ1 (9)

that is, α in (8) is set to 0, then (8) becomes:

βE_Y+γ＝0 (10)

From (10), the expected payoff of miner Y is obtained:

E_Y＝-γ/β (11)

In addition, (9) yields four equations for p:

p1-1＝βR+γ

p2-1＝βT+γ

p3＝βS+γ

p4＝βP+γ

Eliminating β and γ from these four equations gives p2 and p3 in terms of p1 and p4:

With this substitution, (11) can be converted into:

In the two-miner game, the payoff P is 0. Analyzing equation (16) shows that when miner X uses the equalizer strategy

he can, by unilaterally adjusting the values of p1 and p4, control miner Y's expected payoff within the range P ≤ E_Y ≤ R.
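The equalizer property can be checked numerically. The sketch below solves the first and fourth of the four equations for β and γ, reads p2 and p3 off the other two, and then confirms that E_Y = -γ/β for two unrelated strategies of miner Y. The payoff values (R, S, T, P for t = 0.5, c = 0.3, r = 1.2), the choices of p1, p4 and q, the indexing of q in the order of W, and the power-iteration solver are all illustrative assumptions.

```python
def equalizer_strategy(p1, p4, R, S, T, P):
    """Build X's strategy from p1 and p4; return it with the E_Y it fixes."""
    beta = (p1 - 1.0 - p4) / (R - P)      # from p1 - 1 = bR + g and p4 = bP + g
    if beta == 0:
        raise ValueError("p1 = 1 and p4 = 0 do not fix E_Y")
    gamma = p4 - beta * P
    p = [p1, 1.0 + beta * T + gamma, beta * S + gamma, p4]
    if not all(0.0 <= x <= 1.0 for x in p):
        raise ValueError("p1, p4 do not give valid probabilities")
    return p, -gamma / beta

def stationary(p, q, steps=5000):
    """Stationary vector over (CC, CA, AC, AA) by power iteration."""
    v = [0.25] * 4
    for _ in range(steps):
        nxt = [0.0] * 4
        for vk, pk, qk in zip(v, p, q):
            nxt[0] += vk * pk * qk
            nxt[1] += vk * pk * (1 - qk)
            nxt[2] += vk * (1 - pk) * qk
            nxt[3] += vk * (1 - pk) * (1 - qk)
        v = nxt
    return v

R, S, T, P = 0.24, -0.05, 0.25, 0.0
p, fixed_e_y = equalizer_strategy(0.9, 0.1, R, S, T, P)
for q in ([0.3, 0.6, 0.2, 0.8], [0.9, 0.9, 0.1, 0.5]):
    v = stationary(p, q)
    e_y = sum(vi * si for vi, si in zip(v, [R, T, S, P]))  # S_Y in W order
    print(round(e_y, 6), round(fixed_e_y, 6))
```

Both lines print the same pair of values: whatever miner Y does, his long-run payoff is the one miner X chose through p1 and p4.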

In the extortion strategy, an extortion factor χ is introduced. Miner X may set his strategy so that the second column p̃＝[p1-1, p2-1, p3, p4]^T of the determinant satisfies

p̃＝Φ[(S_X-P1)-χ(S_Y-P1)] (15)

where χ≥1, Φ is a free parameter, and P1 denotes the payoff P multiplied by the all-ones vector 1. Similarly, substituting (15) into (8) gives

Φ[(E_X-P)-χ(E_Y-P)]＝0 (16)

From (15) again, four equations can be derived:

p1＝1+Φ(1-χ)

p4＝0 (17)

Since each element of p lies in [0,1], the range of Φ can be obtained from the second and third equations.

The payoffs in the double-miner game model satisfy T＞R＞0＞S, and χ≥1, so the range of Φ can be determined.

Under this extortion strategy, the expected payoff of miner X depends on the strategy vector q of miner Y. Only when q＝[1,1,1,1]^T, that is, when miner Y cooperates fully, can both parties obtain their maximum payoff. If miner Y decides to cooperate fully, the expected payoffs of both parties can be expressed as:

Further analysis shows that when both miners attack, i.e., in the (A, A) state, both payoffs are 0, so P＝0 can be substituted into (16) to obtain:

E_X＝χE_Y (22)

Therefore, when miner X uses the extortion strategy, he can force his own expected payoff into a linear relationship with the opponent's expected payoff and, by adjusting the extortion factor χ, guarantee that his payoff is always χ times the opponent's payoff, thereby extorting miner Y.
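The relation E_X＝χE_Y can be illustrated numerically. The sketch below assumes NumPy, the state order (CC, CA, AC, AA) seen from miner X, S_X＝[R, S, T, P], S_Y＝[R, T, S, P], and the standard zero-determinant form p̃＝Φ[(S_X-P1)-χ(S_Y-P1)] with p＝p̃+[1,1,0,0]^T. The payoff values, χ＝2 and Φ＝0.05 are hypothetical choices that keep every element of p inside [0,1].

```python
import numpy as np

def stationary(p, q):
    """Stationary vector over states (CC, CA, AC, AA) seen from miner X."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)[[0, 2, 1, 3]]   # Y's ordering -> X's ordering
    M = np.column_stack([p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)])
    A = M.T - np.eye(4)
    A[-1, :] = 1.0                                  # normalisation row
    return np.linalg.solve(A, np.array([0.0, 0.0, 0.0, 1.0]))

def extortion_strategy(chi, phi, R, S, T, P):
    """Strategy p of miner X built from the extortion relation (assumed form)."""
    S_X = np.array([R, S, T, P], dtype=float)
    S_Y = np.array([R, T, S, P], dtype=float)
    p_tilde = phi * ((S_X - P) - chi * (S_Y - P))
    return p_tilde + np.array([1.0, 1.0, 0.0, 0.0])

R, S, T, P = 3.0, -1.0, 5.0, 0.0                    # hypothetical payoffs, P = 0
chi, phi = 2.0, 0.05                                # hypothetical parameters
p = extortion_strategy(chi, phi, R, S, T, P)
S_X, S_Y = np.array([R, S, T, P]), np.array([R, T, S, P])
rng = np.random.default_rng(0)
for _ in range(3):
    q = rng.uniform(0.05, 0.95, size=4)             # arbitrary strategy of miner Y
    V = stationary(p, q)
    print(float(V @ S_X), "should equal", chi * float(V @ S_Y))
```

Whatever q miner Y plays in this sketch, the two printed numbers should agree up to rounding, which is the linear relationship stated above.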

In a Bitcoin mining pool, the computing power of only two miners is far from sufficient. In practice, even the smallest mining pool has hundreds of miners. It is therefore necessary to extend the double-miner model to a multi-miner model.

Because of the complexity of multi-party games, only the case in which all miners have equal computing power is discussed here. Assume the total computing power of a mining pool with N miners is N, so that the computing power of each miner is 1. The distributed revenue of a miner can then be expressed in terms of computing power as simply as in the double-miner model. As in the double-miner model, when a miner chooses to cooperate, the computing power he consumes is denoted by c (c＞0). Likewise, when several miners mine together, the revenue of the whole mining pool increases, and as in the double-miner model the revenue amplification factor is denoted by r. One problem arises here: as the number of miners mining together increases, the probability of successful mining also increases. That is, r should grow with the number of cooperators, and its growth curve should gradually flatten. To capture this, r＝ln(k+b) is defined, where k is the number of cooperators and b is a constant.

Because the payoff structure of N miners is high-dimensional, it is difficult to present it in a table as in the double-miner model. With the above definitions, however, the conditions under which the miner's dilemma exists in the multi-miner game can still be found.

According to the above definitions, for any miner in a mining pool with N miners, his payoffs from cooperation and attack can be expressed as follows:

Cooperation:

Attack:

where n denotes the number of cooperators among his opponents. Recall from the double-miner model that the miner's dilemma means that, whether his opponent chooses to attack or to cooperate, a miner's best strategy is to attack. In the multi-miner game, this condition can be expressed as:

Solving (23) gives the valid range of the multi-miner dilemma:

where b and c are constants, which means that the valid range of N changes as n changes. Analysis of (24) shows that the right-hand side of the inequality is an increasing function of n, so substituting the maximum value of n yields a definite range for N. By assumption, n_max＝N-1. A new inequality can then be obtained:

Through a simple rearrangement, one obtains:

e^(cN)＞N+b (26)

Next, two functions f₁(N)＝e^(cN) and f₂(N)＝N+b are constructed, where by definition 0＜c＜1 and b≥e-1. The graphs of these two functions are plotted in FIG. 1 to find the valid range. FIG. 1 shows that there exists N_i∈(0,+∞) satisfying

e^(cN_i)＝N_i+b

and that e^(cN)＞N+b when N＞N_i. Hence N＞N_i is the valid range of N.
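The threshold can also be located numerically instead of graphically. The following is a minimal sketch using only the Python standard library. It assumes b＞1 (reading the bound above literally as e minus 1), so that e^(cN)-N-b is negative at N＝0 and, being convex, stays positive once it becomes positive. The function name and the sample values of c and b are hypothetical.

```python
import math

def smallest_valid_pool_size(c, b, n_max=10**6):
    """Return the smallest integer N >= 1 with exp(c*N) > N + b.

    Assumes 0 < c < 1 and b > 1, so the inequality keeps holding
    for every larger N once it holds for the returned value.
    """
    for N in range(1, n_max + 1):
        if math.exp(c * N) > N + b:
            return N
    raise ValueError("no valid N found up to n_max")

# Hypothetical parameters: c = 0.5, b = e - 1.
print(smallest_valid_pool_size(0.5, math.e - 1))
```

A smaller c pushes the threshold to larger pool sizes, since the exponential needs more miners to overtake the linear term.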

Suppose N miners in the mining pool cannot communicate with each other and each independently decides whether to launch a block withholding attack. The action space of the N miners is then the same as in the double-miner game:

B＝[C,A]

The behavior of a miner in the current round is assumed to depend only on the state of the previous round and not on earlier rounds, so the repeated game among miners can be regarded as a Markov chain. In the N-miner game, each round has 2^N possible states. For example, when N＝3, the game state space can be written as S＝[CCC, CCA, CAC, CAA, ACC, ACA, AAC, AAA]. When N is large, the state space is hard to write out explicitly, so s_i is used to denote a specific game state:

S＝[s₁,…,s_i,…,s_(2^N)]
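The state space can be enumerated mechanically for any N. The following is a minimal sketch using the Python standard library; the function name is hypothetical, and the ordering (first miner's action varies slowest, C before A) is chosen to match the N＝3 example above.

```python
from itertools import product

def game_states(n_miners):
    """List all 2^N joint action profiles, e.g. 'CCA' for three miners."""
    return ["".join(profile) for profile in product("CA", repeat=n_miners)]

states = game_states(3)
print(len(states), states)
```

The index of a profile in this list can serve as the subscript i of s_i.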

Every miner x in the mining pool has a strategy vector:

where each element is the probability that the miner chooses cooperation in the current round, given that the previous round ended in the game state indexed by the subscript. Specifically, the i-th element is the probability that the miner chooses cooperation in the current round when the final game state of the previous round was s_i. Taking three miners as an example again, the strategy vector of miner 1 can be expressed as:

For convenience, the description of p^x can be refined further. From the viewpoint of miner 1, the participants in a game state of the mining pool fall into two camps: the own side, i.e., miner 1, and the opposing side, i.e., the other N-1 miners. The actions of the own side and of the opposing side together form a game state, so the state of the previous round can be represented in the strategy vector p^x once it is known how many opposing miners cooperated in the previous round. To this end, p_C,n (or p_A,n) is defined as the probability that the own side chooses cooperation in the current round, given that in the previous round the own side chose Cooperate (or Attack) and n opposing miners chose Cooperate. The strategy vector of miner 1 can thus be expressed as

For example, for a miner 1 taken arbitrarily from a mining pool with three miners, the strategy vector can be expressed as:

A mining pool of N miners has 2^N game states, so each miner has a payoff vector with 2^N elements. Let the payoff vector of miner x be:

where x∈[1,N]. Using the definitions of the previous section, the expression for each element can be obtained:

where n(i) denotes the number of cooperating opposing miners in game state s_i, and the indicator encodes the action of the own side in game state s_i; its value depends on whether the own side chooses Cooperate or Attack in that state.

Likewise, the payoff vector of miner 1 in a mining pool with three miners is given:

Here, for convenience of calculation, a fixed value r is used instead of r＝ln(k+b) from the definition. This change does not affect the derivation.

Next, the Markov state transition matrix of the multi-miner game is defined:

where M_i,j is the probability that the mining pool transitions from state i to state j. By the definition of a Markov state transition matrix, M_i,j can be calculated as follows:

where x ranges over all miners. In more detail:

where n(i) denotes the number of cooperators among the opponents in state i, and the indicator encodes the action of the own side in state j; its value depends on whether the own side chooses to cooperate in state j.

With the above definitions, the Markov state transition matrix of a mining pool with three miners is given later.
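The construction of M can be sketched in code. The example below assumes NumPy and assumes that each miner acts independently given the previous state, so that M_i,j is the product over all miners of the probability of the action that miner takes in state j. The data layout (a dictionary per miner holding the lists p_C,0…p_C,N-1 and p_A,0…p_A,N-1) and the function name are hypothetical.

```python
from itertools import product

import numpy as np

def transition_matrix(strategies):
    """Build the 2^N x 2^N Markov matrix of the N-miner game.

    strategies[x]["C"][n] is p_C,n of miner x: the probability of cooperating
    after miner x cooperated and n opponents cooperated in the previous round.
    strategies[x]["A"][n] is p_A,n, defined likewise after miner x attacked.
    """
    n_miners = len(strategies)
    states = list(product("CA", repeat=n_miners))
    M = np.zeros((len(states), len(states)))
    for i, prev in enumerate(states):
        coop = []
        for x in range(n_miners):
            # n(i): number of cooperating opponents of miner x in state i
            n = sum(a == "C" for k, a in enumerate(prev) if k != x)
            coop.append(strategies[x][prev[x]][n])
        for j, nxt in enumerate(states):
            prob = 1.0
            for x in range(n_miners):
                prob *= coop[x] if nxt[x] == "C" else 1.0 - coop[x]
            M[i, j] = prob
    return M

rng = np.random.default_rng(0)
demo = [{"C": rng.random(3), "A": rng.random(3)} for _ in range(3)]
print(transition_matrix(demo).sum(axis=1))   # each row sums to 1
```

The cost grows as 4^N entries, so this direct construction is only practical for small pools such as the three-miner example.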

If M is a regular state transition matrix, then, like the Markov matrix of the double-miner game, it has a unique stationary vector V^T. Taking the stationary vector V^T associated with eigenvalue 1 gives

V^T·M＝V^T(27)

Now, redefine:

M′＝M-I (28)

that is:

M′_i,j＝M_i,j-δ_i,j (29)

where δ_i,j is the Kronecker delta, defined as:

δ_i,j＝1 if i＝j, and δ_i,j＝0 otherwise

Applying elementary row and column operations [6] to the matrix then separates the probability vectors and yields a determinant that can be controlled unilaterally through the strategy of any single miner. The strategy of miner x after this separation is defined as

As in the double-miner game, applying Cramer's rule to M′ gives

Adj(M′)M′＝det(M′)I＝0 (30)

Meanwhile, combining (27) and (28) can result in:

V^T·M′＝0 (31)

Comparing (30) and (31) shows that each row of Adj(M′), the adjugate of M′, is proportional to the stationary vector V^T. Thus, for a 2^N-dimensional payoff vector u^x:

where m′_i,j denotes an element of M′. Substituting u^x into (32) gives:

V^T·u^x＝det(p¹,…,p^x,…,p^N,u^x) (33)

where det(p¹,…,p^x,…,p^N,u^x) is a 2^N×2^N determinant. For ease of understanding, V^T·u for the three-miner game is given in FIG. 3.
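The proportionality between the rows of Adj(M′) and V^T, on which the derivation above relies, can be checked numerically. The sketch below assumes NumPy and uses a randomly generated row-stochastic matrix with strictly positive entries in place of an actual game matrix. The adjugate is computed from cofactors because NumPy provides no built-in adjugate; all function names are hypothetical.

```python
import numpy as np

def adjugate(A):
    """Adjugate (transpose of the cofactor matrix) of a square matrix."""
    n = A.shape[0]
    cof = np.empty((n, n), dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

def stationary_vector(M):
    """Solve V^T M = V^T with sum(V) = 1 (assumes a unique solution)."""
    n = M.shape[0]
    A = M.T - np.eye(n)
    A[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
M = rng.random((4, 4)) + 0.1            # stand-in for a game matrix (assumption)
M /= M.sum(axis=1, keepdims=True)       # make every row sum to 1
M_prime = M - np.eye(4)                 # M' = M - I as in (28)
adj = adjugate(M_prime)
V = stationary_vector(M)
print(np.abs(adj @ M_prime).max())      # close to 0, as in (30)
print(adj[-1] / adj[-1].sum(), V)       # normalised last row matches V
```

Any row of the adjugate can be used; the last row is shown only because the double-miner derivation also picks a single row.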

Using (33), the expected payoff E^x of any miner can be computed by Laplace expansion of V^T·u^x along its last column u^x (the Markov state transition matrix and V^T·u of the three-miner game serve as the example):

E^x＝(V^T·u^x)/(V^T·1) (34)

where 1 is the vector whose elements are all 1, as in the double-miner game.

Analysis of (34) shows that a miner's expected payoff depends linearly on his payoff vector. As in the double-miner game, a linear combination of the expected payoffs of all miners can therefore be formed, given by the following equation:

From the previous analysis, the matrix M′ contains a column

that can be determined unilaterally by a single miner. Suppose this miner is miner 1, and miner 1 sets

Then (35) reduces to 0:

The strategy vector of miner 1 in this case

is the zero-determinant strategy of the multi-miner game.

In the equalizer strategy, the goal of miner x is to use a strategy that unilaterally fixes the sum of the expected payoffs of the opposing miners. In (36), setting α₁＝0 leaves the sum of the expected payoffs of the opposing miners:

Setting every remaining α_x to the same fixed value u then allows u to be taken out of the summation. Any miner 1 in the multi-miner game can now set the equalizer strategy:

Under this strategy, (36) becomes:

Rearranging (38) gives the sum of the expected payoffs of the opposing miners:

That is, a miner can control the sum of the expected payoffs of the opposing miners simply by setting the values of β and u.

Returning to the equalizer strategy of miner x,

it is originally a column vector with 2^N elements, so the linear relationship in (39) also yields 2^N equations, one for each element of the column vector, i.e., for p_C,n and p_A,n,

where n∈[0,N-1]. These 2^N equations are expressed as follows:

According to (40), the probability parameters p_C,n and p_A,n are determined by the two parameters β and u. The converse also holds: once two probability parameters are fixed, the values of β and u are determined, and the other probability parameters then follow from (40). Here two particularly important parameters are chosen, p_C,N-1 and p_A,0, i.e., the probabilities associated with the full-cooperation state and the full-attack state. The expressions for these two probabilities are:

p_C,N-1＝1+u(N-1)(r-c)+β

p_A,0＝β (41)

From the above two equations, expressions for the parameters u and β can be obtained:

u＝(p_C,N-1-1-p_A,0)/[(N-1)(r-c)]

β＝p_A,0 (42)
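As a worked check of this step, the two equations in (41) can be inverted in a few lines. The sketch below uses plain Python; the function name and the sample values of N, r and c are hypothetical, and r≠c is assumed so that the division is defined.

```python
def equalizer_parameters(p_c_full, p_a_zero, n_miners, r, c):
    """Invert (41): recover u and β from p_C,N-1 and p_A,0.

    Assumes r != c and n_miners >= 2, so the denominator is non-zero.
    """
    beta = p_a_zero
    u = (p_c_full - 1.0 - beta) / ((n_miners - 1) * (r - c))
    return u, beta

# Hypothetical values: three miners, r = 1.5, c = 0.5.
u, beta = equalizer_parameters(0.9, 0.1, n_miners=3, r=1.5, c=0.5)
# Substituting back into (41) should reproduce the chosen p_C,N-1.
print(u, beta, 1.0 + u * (3 - 1) * (1.5 - 0.5) + beta)
```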

Substituting (42) into (39) gives the sum of the expected payoffs of the opposing miners in terms of the probability parameters p_C,N-1 and p_A,0 of miner 1:

As the above formula shows, once miner 1 uses the equalizer strategy, the sum of the expected payoffs of the opposing miners is determined by the total number N of miners in the mining pool, the revenue coefficient r, the computing power c consumed by mining, and the two probability parameters p_C,N-1 and p_A,0. Since N, r and c are fixed and p_C,N-1 and p_A,0 are set unilaterally by miner 1, miner 1 can control the total expected payoff of the opposing miners by choosing different values of p_C,N-1 and p_A,0.

In the extortion strategy, the goal of miner 1 is, as in the double-miner game, to use a strategy that makes his own payoff χ times the payoff of the opposing side. Analogous to the strategy in the double-miner game, miner 1 is set to execute the following strategy:

where P is the payoff when all miners in the mining pool attack. Recalling the expression for a miner's payoff vector

it can be seen that in the full-attack state the payoff of every miner is 0, so (44) can be written as:

From the strategy vector in (45), the relationship between the expected payoff of miner 1 and the total expected payoff of the opposing miners is obtained

which, after a simple rearrangement, gives

As (47) shows, when a miner uses the extortion strategy, he can force his own expected payoff to always be χ times the total expected payoff of the opposing side by unilaterally setting the extortion factor χ.

Analysis of (47) shows that the own side's payoff is highest when the miners in the mining pool are in the full-cooperation state. The mining pool can be driven to the full-cooperation state by combining the temporal-difference algorithm with the zero-determinant extortion strategy.

According to (47), the extortion factor χ is defined as the ratio of the own side's payoff to the total payoff of the opposing miners. A fixed χ guarantees a higher payoff for the own side but does not help the mining pool reach the full-cooperation state. For this reason, χ is set as a dynamic extortion factor:

where P_C denotes the cooperation probability of the whole mining pool. When the cooperation probability of the whole pool is small, χ is raised so that the own side obtains a high payoff; when it is large, χ is lowered to push the pool toward the full-cooperation state; and when the pool is in full cooperation, i.e., P_C has converged to 1, χ settles to a constant and the full-cooperation state is maintained.

In practice, a mining pool has many miners, and the size of the state space and the complexity of the expected payoff grow rapidly with the number of miners, so an explicit expression for the expected payoff function is extremely difficult to give. Therefore, a mining pool in the full-cooperation state, i.e., p^x＝1 (x∈[2,N]), is used directly to simulate the payoff of the mining pool, and the cooperation probability P_C of the whole pool is simulated from the state of the miners. The payoff functions of the own side and the opposing side are given below:

In round t, E_adp(t) denotes the payoff of the own side and E_adp(t) denotes the total payoff of the opposing side, while the cooperation payoff and the attack payoff are denoted by E_coo(t) and E_att(t), respectively. In round t+1, the expected payoff of the own side can then be expressed as:

E_coo(t+1)＝V_C(t+1)+E¹

where the cooperation probability P_C and the attack probability P_A each lie in [0,1]. The strategy for the next round is chosen by comparing the payoffs of attack and cooperation:

1) If E_coo(t+1)＞E_att(t+1), miner 1 selects the cooperation strategy. At the same time, the overall cooperation probability of the mining pool in the next round is P_C(t+1)＝P_C(t)+F(P_C(t+1)), and the overall attack probability of the mining pool is P_A(t+1)＝P_A(t)-F(P_A(t+1)).

2) If E_coo(t+1)＜E_att(t+1), miner 1 selects the attack strategy. At the same time, the overall cooperation probability of the mining pool in the next round is P_C(t+1)＝P_C(t)-F(P_C(t+1)), and the overall attack probability of the mining pool is P_A(t+1)＝P_A(t)+F(P_A(t+1)).

3) If E_coo(t+1)＝E_att(t+1), miner 1 selects the cooperation strategy. At the same time, the overall cooperation probability of the mining pool in the next round is P_C(t+1)＝P_C(t)+F(P_C(t+1)), and the overall attack probability of the mining pool remains unchanged.

where the Fermi function F(ε) can be expressed as:

During the iterations of the algorithm, the extortion factor χ changes with P_C, which in turn affects the payoff function. In the long run, under the assumption of economic rationality, the opposing miners will eventually realize that cooperation is their best strategy and will choose cooperation in all subsequent rounds. After enough iterations, the overall cooperation probability of the mining pool converges to 1, i.e., the full-cooperation state is reached.
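The three-case greedy rule above can be sketched as a single update function. The example below is an illustration under stated assumptions, not the exact procedure: the step size is supplied by the caller as a hypothetical function `step` standing in for F, it is evaluated at the current probability rather than at the round t+1 value, and the results are clipped to [0,1]. The demonstration uses a constant step of 0.1 purely for illustration, not the Fermi function.

```python
def greedy_round(e_coo, e_att, p_c, p_a, step):
    """Apply one round of the greedy rule.

    e_coo, e_att: predicted cooperation and attack payoffs for round t+1.
    p_c, p_a: overall cooperation and attack probabilities in round t.
    step: caller-supplied step-size function (hypothetical stand-in for F).
    Returns (action, new p_c, new p_a).
    """
    def clip(value):
        return min(1.0, max(0.0, value))   # keep probabilities in [0, 1] (assumption)

    if e_coo > e_att:                       # case 1: cooperate
        return "C", clip(p_c + step(p_c)), clip(p_a - step(p_a))
    if e_coo < e_att:                       # case 2: attack
        return "A", clip(p_c - step(p_c)), clip(p_a + step(p_a))
    return "C", clip(p_c + step(p_c)), p_a  # case 3: tie, attack probability unchanged

p_c, p_a = 0.5, 0.5
for t in range(6):
    # Hypothetical payoffs in which cooperation is always predicted to be better.
    action, p_c, p_a = greedy_round(2.0, 1.0, p_c, p_a, step=lambda v: 0.1)
    print(t + 1, action, p_c, p_a)
```

With cooperation always predicted to pay more, P_C rises monotonically and is capped at 1, which mirrors the qualitative convergence described above.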

To test the effectiveness of the method, the evolution of the cooperation probability in a three-miner pool on a scale-free network is simulated. In addition, a comparison experiment is set up in which the number of iteration rounds needed to reach the full-cooperation state is measured for different initial cooperation probabilities. The data of the first 40 game rounds are taken, the attack and cooperation payoffs of each round are predicted, and the cooperation probability is updated according to the payoffs of each round. The number of rounds needed for the cooperation probability to converge is then compared with that of the adaptive strategy.

The first 20 rounds of the evolution of the cooperation probability are shown in FIG. 2. As the number of iteration rounds increases, the overall cooperation probability of the mining pool tends to increase. Moreover, the larger the initial cooperation probability, the fewer iteration rounds are needed for convergence. After 6 rounds, the overall cooperation probability has converged to 1.

FIG. 3 compares the number of rounds required for convergence under this strategy and under the adaptive strategy. For the 4 selected initial cooperation probabilities, this strategy converges 1 to 3 rounds earlier than the adaptive strategy.

The evolution of the miners' payoffs at different initial cooperation probabilities is shown in FIGS. 4-8. As shown, the cooperation payoff is always higher than the attack payoff, so any rational miner will always choose to cooperate. Note that at higher numerical precision the cooperation payoff converges only once the cooperation probability has converged to 1. In addition, the attack payoff converges to different values for different initial cooperation probabilities. FIG. 6 shows the payoff evolution for an initial cooperation probability of 0.1; the attack payoff tends to converge toward the middle and finally converges in round 139. FIG. 7 shows the payoff evolution for a cooperation probability of 0; likewise, the attack payoff tends to converge toward the middle and finally converges in round 35. In FIG. 8 the initial cooperation probability is 0.5, and convergence is complete in round 11. Note that the smaller the initial cooperation probability, the more rounds the attack payoff needs to converge. In addition, as the cooperation probability converges, the overall revenue of the mining pool increases and the throughput of the Bitcoin system increases.

The invention builds two-party and multi-party game models for the mining pool environment under the proof-of-work (PoW) mechanism and, by introducing the zero-determinant strategy, offers a new approach to the problem of block withholding attacks in Bitcoin mining pools. The game among miners is analyzed and treated as an iterated prisoner's dilemma; double-miner and multi-miner game models are established; the zero-determinant strategy is used in the game; the payoff of the next round is predicted with the temporal-difference algorithm; and a greedy strategy is adopted to select the action of the next round and update the overall cooperation probability of the mining pool. By executing the temporal-difference algorithm iteratively, the overall cooperation probability of the mining pool finally converges to 1, i.e., the mining pool reaches the full-cooperation state and the problem of block withholding attacks is solved.

In addition, the game among mining pools can also be regarded as a multi-party game model. A mining pool can use its own miners to infiltrate other mining pools and launch block withholding attacks against them. The revenue obtained by the infiltrating miners in the other pools becomes additional revenue for the attacking pool. In general, the effective mining power of the attacked pool is unchanged, but its total revenue is distributed among more miners (the original miners plus the miners infiltrated from other pools), so the revenue of every miner falls. The attacking pool loses some computing power because part of its miners are carrying out block withholding attacks, but it can obtain additional revenue from the pools it has infiltrated. Subsequent work will analyze the game among mining pools that divert computing power to infiltrate other pools, establish a multi-pool game model, and attempt to optimize the model with multi-party game strategies and reinforcement learning algorithms so as to raise the cooperation probability among mining pools.

Claims

1. A multi-miner cooperative evolution method for a blockchain Bitcoin mining pool based on a temporal-difference algorithm, characterized by comprising the following steps:

step one: for N miners in a mining pool, based on the game among the miners, determining the strategy vector of each miner for the next round according to the strategy of cooperation or attack adopted in each round, and at the same time obtaining the payoff vector of each miner;

step two: obtaining, from the strategy vector of each miner, the probability with which each miner selects each strategy, and obtaining from these probabilities the Markov state transition matrix of the multi-miner game, i.e., the strategy-selection transition matrix;

step four: according to the linear combination of the expected payoffs and the probabilities of cooperation and attack, adjusting the strategy adopted by a miner so as to control the range of the expected payoffs of the other miners; introducing, as an extortion factor, a factor related to the payoff vector weights that expresses the ratio between the miner's own payoff and the total payoff of the other miners; and setting the extortion factor to a dynamically changing value, such that when the cooperation probability of the whole mining pool is low, the extortion factor is increased to ensure that the miner obtains a high payoff; when the cooperation probability of the whole mining pool is high, the extortion factor is decreased to force the mining pool toward the full-cooperation state; and when the mining pool is in the full-cooperation state, the extortion factor evolves to a constant so as to maintain the full-cooperation state of the mining pool, thereby maximizing the revenue of the whole mining pool.

2. The multi-miner cooperative evolution method for a blockchain Bitcoin mining pool based on a temporal-difference algorithm as claimed in claim 1, wherein in step one, the attack is a block withholding attack, and cooperation means not launching a block withholding attack.

3. The multi-miner cooperative evolution method for a blockchain Bitcoin mining pool based on a temporal-difference algorithm as claimed in claim 1, wherein in step one, the strategy vector p¹ of miner 1 is:

wherein p_C,n denotes the probability that miner 1 selects cooperation in the current round given that miner 1 cooperated in the previous round and n other miners cooperated, p_A,n denotes the probability that miner 1 selects cooperation in the current round given that miner 1 attacked in the previous round and n other miners cooperated, and N is the total number of miners in the mining pool;

the payoff vector of each miner is:

wherein x∈[1,N],

wherein n(i) denotes the number of other miners cooperating in game state s_i;

otherwise,

4. The multi-miner cooperative evolution method for a blockchain Bitcoin mining pool based on a temporal-difference algorithm as claimed in claim 1, wherein in step two, the Markov state transition matrix is

5. The multi-miner cooperative evolution method for a blockchain Bitcoin mining pool based on a temporal-difference algorithm as claimed in claim 1, wherein in step three, the linear combination of the expected payoffs of the miners is