CN112232844A - Multi-miner cooperative evolution method for Bitcoin mining pools based on a temporal difference algorithm - Google Patents

Publication number
CN112232844A
Authority
CN
China
Prior art keywords
miners
miner
strategy
cooperation
mine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910632888.3A
Other languages
Chinese (zh)
Inventor
欧嵬
罗恩韬
邓铭巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Science and Engineering
Original Assignee
Hunan University of Science and Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Science and Engineering filed Critical Hunan University of Science and Engineering
Priority to CN201910632888.3A priority Critical patent/CN112232844A/en
Publication of CN112232844A publication Critical patent/CN112232844A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a multi-miner cooperative evolution method for Bitcoin mining pools based on a temporal difference algorithm. The method analyzes the game among miners, treats it as an iterated prisoner's dilemma, establishes two-miner and multi-miner game models, and applies a zero-determinant strategy in the game. A temporal difference algorithm then predicts the income of the next round, and a greedy strategy selects the behavior of the next round and updates the overall cooperation probability of the mining pool. By executing the temporal difference algorithm iteratively, the overall cooperation probability of the mining pool finally converges to 1; that is, the pool reaches a fully cooperative state and the block withholding attack problem is solved.

Description

Multi-miner cooperative evolution method for Bitcoin mining pools based on a temporal difference algorithm
Technical Field
The invention relates to a multi-miner cooperative evolution method for Bitcoin mining pools based on a temporal difference algorithm.
Background
A blockchain is a new mechanism for storing, transmitting and managing information. By having users jointly participate in computing and storing data and verify the authenticity of the data for one another, it transfers data and value reliably in a decentralized and trustless manner. In recent years blockchain technology has attracted wide attention across many fields, its search index has kept rising, and it has become one of the most popular emerging internet technologies. Since Satoshi Nakamoto first proposed the blockchain concept in 2008, the technical architecture of the blockchain has matured over more than ten years. According to Gartner's forecast, blockchain technology will generate up to 176 billion US dollars of business value across many industries, including manufacturing, by 2025. At present, the typical applications of blockchain technology are in finance, and they are gradually extending to other areas of the economy and society such as healthcare, logistics and the industrial internet, where the technology is receiving broad attention and being explored worldwide.
Bitcoin is currently the blockchain application with the largest number of users, the largest system scale and the most stable transactions worldwide. The proof-of-work (PoW) mechanism directly gave rise to blockchain technology. PoW describes a secure accounting system that addresses the Byzantine generals problem by introducing a competition of computing power among distributed nodes to ensure data consistency and consensus. As the Bitcoin system keeps growing, this competition has become fiercer and the probability that a single miner mines a block has become very small. Mining pools, in which many miners concentrate their computing power and mine jointly, have emerged as a result.
Studies have shown that miners in an open mining pool can increase their own income by launching block withholding attacks. From the game-theoretic perspective of rational economic agents, all miners will eventually choose to attack one another, yet the income they then receive is lower than the income they would receive if none of them attacked. This is the miner's dilemma under PoW. It is analogous to the classical prisoner's dilemma in game theory, in which the strategy that is optimal for the individual is not optimal for the group, so the miner's dilemma can be analyzed and optimized from a game-theoretic perspective.
The zero-determinant (ZD) strategy is one of the active directions of current game theory research. It originates from a paper by Press and Dyson, who showed that the iterated prisoner's dilemma admits a strategy with which a single player can unilaterally control the opponent's income and enforce a linear relationship between the opponent's income and the player's own income, whatever strategy the opponent adopts. On this basis, Pan et al. applied the ZD strategy to a multi-player game and showed that a ZD player can enforce a linear relationship on the sum of the incomes of all opponents.
Disclosure of Invention
Miners in existing mining pools use block withholding attacks to increase their own income, which ultimately reduces the income of the whole pool. To overcome this technical problem, the invention provides a multi-miner cooperative evolution method for Bitcoin mining pools based on a temporal difference algorithm, which brings all miners in the pool to adopt cooperation as quickly as possible and ultimately increases the income of the whole pool.
To achieve this technical purpose, the technical solution of the invention is as follows.
A multi-miner cooperative evolution method for Bitcoin mining pools based on a temporal difference algorithm comprises the following steps:
step one, for the N miners in a mining pool, determining, based on the game among the miners and on the strategy of cooperation or attack adopted in each round, the strategy vector of each miner for the next round, and at the same time obtaining the income vector of each miner;
step two, obtaining from the strategy vector of each miner the probability with which that miner selects each strategy, and obtaining from these strategy probabilities the Markov state transition matrix of the multi-miner game, namely the matrix describing how strategy selections change;
step three, applying row and column transformations to the Markov state transition matrix to obtain a determinant that is controlled unilaterally by the strategy of a single miner, and combining it with the income vectors of the miners to obtain the expected incomes of the miners and thereby a linear combination of the expected incomes;
step four, according to the linear combination of expected incomes and the probabilities of cooperation and attack, adjusting the strategy adopted by the miner so as to control the range of the expected incomes of the other miners; introducing an extortion factor, which is related to the weights of the income vectors and expresses the ratio between the income of the miner and the sum of the incomes of the other miners; and setting the extortion factor as a dynamically changing value: when the cooperation probability of the whole mining pool is small, the value of the extortion factor is increased so that the miner obtains a high income; when the cooperation probability of the whole mining pool is large, the value of the extortion factor is reduced to force the mining pool toward the fully cooperative state; and when the mining pool is in the fully cooperative state, the extortion factor evolves to a constant so that the fully cooperative state is maintained and the income of the whole mining pool is maximized.
In the above method, the attack in step one is a block withholding attack, and cooperation means not launching a block withholding attack.
In the above method, in step one, the strategy vector p_1 of miner 1 is:
Figure BDA0002129308730000031
where p_{C,n} denotes the probability that miner 1, having cooperated in the previous round, selects cooperation in the current round when n other miners cooperated, and p_{A,n} denotes the probability that miner 1, having attacked in the previous round, selects cooperation in the current round when n other miners cooperated; N is the total number of miners in the mining pool;
the income vector of the miners is:
Figure BDA0002129308730000041
where x ∈ [1, N],
Figure BDA0002129308730000042
where n(i) denotes the number of other miners cooperating in game state s_i;
Figure BDA0002129308730000043
is an indicator of the behavior of the miner in game state s_i: if the miner selects cooperation, then
Figure BDA0002129308730000044
otherwise,
Figure BDA0002129308730000045
In step two, the Markov state transition matrix is
Figure BDA0002129308730000046
where M_{i,j} denotes the probability that the mining pool transitions from state i to state j, and N is the total number of miners in the mining pool.
In step three, the linear combination of the expected incomes of the miners is
Figure BDA0002129308730000047
The technical effect of the invention is that two-party and multi-party game models are established for the mining pool game under the proof-of-work (PoW) mechanism and a zero-determinant strategy is introduced, providing a new approach to the block withholding attack problem in Bitcoin mining pools. The method analyzes the game among miners, treats it as an iterated prisoner's dilemma, establishes two-miner and multi-miner game models, and applies a zero-determinant strategy in the game. A temporal difference algorithm then predicts the income of the next round, and a greedy strategy selects the behavior of the next round and updates the overall cooperation probability of the mining pool. By executing the temporal difference algorithm iteratively, the overall cooperation probability of the mining pool finally converges to 1; that is, the pool reaches a fully cooperative state and the block withholding attack problem is solved.
Drawings
FIG. 1 shows the effective range of N;
FIG. 2 shows the evolution of the cooperation probability;
FIG. 3 compares the number of rounds required for the cooperation probability to converge to 1 under the TD algorithm using the ZD strategy and under the custom strategy;
FIG. 4 shows the evolution of income with an initial probability of 0.1;
FIG. 5 shows the evolution of income with an initial probability of 0.3;
FIG. 6 shows the evolution of income with an initial probability of 0.5;
FIG. 7 shows the evolution of income with an initial probability of 0.7;
FIG. 8 shows the evolution of income with an initial probability of 0.9.
Detailed Description
The method starts from a two-miner game model and demonstrates the feasibility of the ZD strategy in the two-miner game. It then proposes a more complete N-party miner game model and uses a multi-party ZD strategy to optimize the Bitcoin system in a multi-miner environment. Through iteration, the mining pools in the system reach full cooperation, which increases the block throughput of the Bitcoin system as well as the overall income of each mining pool and the income of its miners. To obtain the best strategy in the game and make the overall cooperation probability of the mining pools converge to 1 in the fewest iterations, each mining pool is regarded as an agent; the temporal difference (TD) learning method from reinforcement learning is used to predict the income of the next round, a greedy strategy selects the behavior of the next round, and the cooperation probability of the mining pool is updated at the same time.
In the Bitcoin system, all nodes (that is, miners) compete with one another on the basis of their computing power to solve a SHA256 mathematical problem that is hard to solve but easy to verify (that is, mining). The node that solves the problem fastest obtains the right to record the block and the Bitcoin reward generated automatically by the system. The problem can be stated as follows: given the current difficulty value, find a suitable random number (nonce) such that the SHA256 hash of the block header, which contains the nonce, is less than or equal to the target hash value. By adjusting the difficulty of the nonce search, the Bitcoin system keeps the average block generation time at about 10 minutes.
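The nonce search described above can be illustrated with a minimal sketch. It is a toy, not the Bitcoin implementation: the header bytes, the 8-byte nonce encoding, the helper names `block_hash` and `find_nonce`, and the very easy `TOY_TARGET` are all assumptions chosen for illustration, while the double SHA-256 follows Bitcoin's practice.

```python
import hashlib


def block_hash(header: bytes, nonce: int) -> int:
    """Double SHA-256 of header plus nonce, read as a big integer (toy encoding)."""
    data = header + nonce.to_bytes(8, "little")
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    return int.from_bytes(digest, "big")


def find_nonce(header: bytes, target: int, max_tries: int = 1_000_000):
    """Return the first nonce whose hash is <= target, or None if none is found."""
    for nonce in range(max_tries):
        if block_hash(header, nonce) <= target:
            return nonce
    return None


# Hypothetical toy target: about 1 in 2**12 hashes succeeds.
# Real Bitcoin targets are many orders of magnitude smaller.
TOY_TARGET = 2 ** (256 - 12)
nonce = find_nonce(b"toy block header", TOY_TARGET)
```

Lowering `TOY_TARGET` makes the search harder, which is how a difficulty adjustment lengthens the expected time to find a block; verifying a candidate nonce takes a single call to `block_hash`.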
A mining pool consists of a pool manager and miners. The pool manager joins the Bitcoin system as a single miner, but unlike other miners does not spend effort searching for the nonce; instead, the manager outsources the nonce search to the miners in the pool. Each miner in the pool is assigned a certain share of the nonce search, and a result of this work is called a partial proof of work. The pool manager evaluates each miner's work on the basis of the partial proofs of work the miner submits. Once a miner finds a nonce that satisfies the full difficulty, called a full proof of work, the miner submits it to the manager, who broadcasts it to the entire Bitcoin network. Finally, the manager receives the block reward and distributes it to the miners according to the computing power each miner contributed.
Most mining pools are open and allow any miner to join. Any mining pool can therefore launch a block withholding attack by sending its own miners to infiltrate other pools. In a block withholding attack [8], the attackers join a mining pool but submit only partial proofs of work, withholding any full proof of work they find. Because they submit partial proofs of work, the manager still regards them as honest miners and distributes rewards to them according to their computing power. The attackers thus receive rewards from the pool without contributing effective computing power, which directly reduces the income of the pool and also reduces the throughput of the entire blockchain system.
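The dilution effect of a block withholding attack can be sketched as follows. The model is a deliberate simplification and an assumption of this example: expected pool revenue is taken to be proportional to the computing power that actually submits full proofs of work, and the function name `payout_per_unit` and its parameters are hypothetical.

```python
def payout_per_unit(honest_power, withholding_power, reward_per_unit=1.0):
    """Expected pool payout per unit of credited computing power (toy model)."""
    # Only honest power can produce full proofs of work, so only it earns block rewards.
    pool_revenue = reward_per_unit * honest_power
    # Partial proofs of work are submitted by everyone, so all power is credited.
    credited_power = honest_power + withholding_power
    return pool_revenue / credited_power


no_attack = payout_per_unit(0.8, 0.0)    # every credited unit is effective
with_attack = payout_per_unit(0.8, 0.2)  # 20% of credited power withholds blocks
```

In this toy setting the payout per credited unit falls from 1.0 to 0.8 once a fifth of the credited power withholds full proofs, which is the income reduction the paragraph describes.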
Game theory studies how individuals make choices when their decisions interact. All participants are assumed to be rational. A rational person is one who maximizes his or her preferences under given constraints. Being rational is not the same as being selfish: a rational person may be self-interested or altruistic. The rationality assumption is the analytical premise of game theory. On this premise, every participant tries to maximize his or her preferences (benefits); if some action would make a participant better off, the participant will actively pursue it. Rational people need to cooperate in order to maximize their preferences, yet conflicts arise within cooperation.
The prisoner's dilemma game was first proposed by Kuh et al. In this game, two agents simultaneously choose cooperation (C) or defection (D) without knowing the other party's current choice; each agent then selects its strategy for the next round according to some learning mechanism. Both parties receive the payoff R when both cooperate and the payoff P when both defect; if one party cooperates and the other defects, the defector receives the maximum payoff T and the cooperator receives the minimum payoff S, where the parameters satisfy T > R > P > S. The prisoner's dilemma has been applied successfully in biological, political and economic research, with abundant results.
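The condition T > R > P > S and its consequence can be checked with a short sketch. The numeric payoffs 5, 3, 1, 0 and the function names are illustrative assumptions, not values from the patent.

```python
def is_prisoners_dilemma(T, R, P, S):
    """True when the payoffs satisfy the ordering T > R > P > S."""
    return T > R > P > S


def best_reply(opponent_move, T, R, P, S):
    """Return the move ('C' or 'D') that maximizes one's own payoff
    against a fixed opponent move."""
    payoff = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    return max("CD", key=lambda me: payoff[(me, opponent_move)])


# Hypothetical payoffs chosen only to satisfy T > R > P > S.
T, R, P, S = 5, 3, 1, 0
replies = {move: best_reply(move, T, R, P, S) for move in "CD"}
```

With these payoffs defection is the best reply to either move, so both players defect and receive P each, although mutual cooperation would give each of them the larger payoff R.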
For the prisoner's dilemma, scholars have proposed strategies such as WSLS (win-stay, lose-shift), TFT (tit-for-tat), GTFT (generous tit-for-tat), the memoryless full-cooperation strategy and the full-defection strategy, and have discussed their evolutionary stability in repeated games and their ability to encourage cooperation. However, none of these strategies can unilaterally determine the opponent's payoff. The zero-determinant strategy, first proposed by Press and Dyson in 2012, can unilaterally set the opponent's payoff, and with suitable parameters can also force the player's own payoff to be a multiple of the opponent's payoff, thereby extorting the opponent; for this reason it is also called an extortion strategy. These advantages have attracted wide attention. Researchers have studied in depth the robustness of the zero-determinant strategy and its evolutionary stability against WSLS, full-cooperation, full-defection and TFT strategies in population games under the iterated prisoner's dilemma model, have assigned different characteristics to nodes and discussed their influence on the evolution, and have examined how to change parameters to promote the emergence of cooperation. To improve the evolutionary stability of the strategy, sub-strategies of the zero-determinant strategy such as the generous strategy have been proposed in the literature. The zero-determinant strategy has also been extended to multi-player and continuous-strategy settings, and it has been applied to public goods games, repeated games with noise and other fields.
For the games played by miners and mining pools in the Bitcoin system, scholars have proposed a variety of models. Eyal et al. argued that the Bitcoin system is inherently fragile: in practice there exists a strategy called selfish mining, in which a miner keeps mining a private chain without publishing it and publishes it only when it becomes longer than the public chain, so that the public chain is discarded and the computing resources of honest miners are wasted; this is the familiar problem of blockchain forks. On this basis, Kiayias et al. modeled Bitcoin mining as a stochastic game with complete information, in which the length of the main chain is controlled by choosing when to release mined blocks. Liu X. et al. proposed an evolutionary game model that computes the expected income of miners who join different mining pools in order to decide which pool to select. Lewenberg Y. et al. modeled the miners' choice of pool as a cooperative game in which the members of a pool are regarded as a coalition and miners increase their profit by switching pools, but that work does not address attacks between pools. Tang et al. treated the mining game as a prisoner's dilemma, analyzed the equilibrium of the mining dilemma game, and optimized the miners' strategy selection with a zero-determinant strategy. Fan et al. regarded the game between mining pools as an iterated prisoner's dilemma and used the AZD (adaptive zero-determinant) strategy to optimize it, so that the mining pools in the Bitcoin system can eventually reach a fully cooperative state. However, because the real Bitcoin system contains more than two mining pools, it cannot simply be treated as a two-party game.
Reinforcement learning grew out of related disciplines such as control theory, statistics and psychology, and can be traced back to Pavlov's conditioned reflex experiments. It was not until the publication in 1992 of the research paper [30] by Watkins et al., however, that reinforcement learning came to be widely studied and applied in artificial intelligence, machine learning and related fields; it is now regarded as one of the core technologies for designing intelligent systems. Its main idea is to optimize a policy through trial-and-error interaction between an agent and its environment, taking the feedback signal of the environment as input. Policy optimization requires correct policy evaluation and policy iteration, and how to estimate the value function correctly is the central problem of policy evaluation. Reinforcement learning is usually described by Markov decision processes (MDPs) with discrete state and action spaces; as in the policy evaluation methods of dynamic programming, the value of each state can be stored in a table. Reinforcement learning has produced many results in theory and algorithms, has become an effective method for solving sequential decision optimization problems, and has been applied successfully in fields such as intelligent robots and automatic control systems.
The basic idea of temporal difference learning stems from the learning mechanisms and empirical studies of secondary reinforcement signals in psychology. The temporal difference learning algorithm and its convergence theory occupy a fundamental position in reinforcement learning; like the policy evaluation methods of dynamic programming, they provide an effective method and a theoretical framework for solving the value function of a stationary Markov decision process whose model is unknown. Reinforcement learning algorithms based on linear value function approximation can be traced back to 1988, when Sutton first proposed linear temporal difference learning (LTD) and the TD(λ) algorithm. In Sutton's paper, temporal difference learning is presented as a multi-step prediction learning method based on Markov chains that can be used for policy evaluation or value function prediction in a stationary Markov decision process, and an algorithmic description of temporal difference learning is given. In 1997, Tsitsiklis and Van Roy proved the convergence of the linear TD(λ) algorithm (where λ ∈ [0, 1]), although the algorithm is unstable in some cases [38]. In 2002, Boyan extended least-squares temporal difference learning (LSTD) to LSTD(λ).
In the same year, building on the LSTD algorithm, Lagoudakis and Parr proposed the least-squares policy iteration (LSPI) algorithm, which achieves better stability. In 2006, Geramifard et al. proposed the incremental least-squares temporal difference learning (iLSTD) algorithm and iLSTD with eligibility traces, and proved the convergence of the algorithms. In 2008, Sutton et al. proposed the GTD algorithm and proved that it converges to the least-squares solution under off-policy learning, but its convergence is much slower than that of conventional algorithms. In 2009, Sutton et al. derived two landmark algorithms, the second-generation GTD (GTD2) and TDC, from a new minimization objective, the mean square projected Bellman error (MSPBE), which greatly improved on the convergence rate of GTD. In 2010, Scherrer showed that the temporal difference (TD) fixed-point solution and the Bellman residual solution are both oblique projections of the true value function and that neither is optimal. In 2015, Liu et al. analyzed the convergence of GTD algorithms under off-policy learning using finite-sample analysis, derived true stochastic-gradient TD algorithms from a primal-dual saddle-point objective function, proposed the projected GTD2 and GTD-MP algorithms, which improve convergence and speed respectively, and gave performance bounds for both the on-policy and off-policy cases [45]. In 2018, Liu et al. proposed the GTD2-MP algorithm, which improves the convergence rate. The present method uses the TD(λ) algorithm to predict the income of the miners in the next round of the miner game, and then uses a greedy strategy to select and execute the behavior with the higher income.
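The combination of TD(λ) prediction and greedy selection mentioned above can be sketched in tabular form. This is a generic textbook-style TD(λ) with accumulating eligibility traces, not the patent's specific procedure; the two-state toy episode, the step size, and the names `td_lambda` and `greedy_action` are assumptions of this example.

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.8):
    """Tabular TD(lambda) prediction with accumulating eligibility traces.

    `episodes` is a list of trajectories; each trajectory is a list of
    (state, reward, next_state) transitions, with next_state None at the end.
    """
    values = [0.0] * n_states
    for episode in episodes:
        traces = [0.0] * n_states
        for state, reward, next_state in episode:
            next_value = 0.0 if next_state is None else values[next_state]
            td_error = reward + gamma * next_value - values[state]
            traces[state] += 1.0
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam  # decay the trace for the next step
    return values


def greedy_action(predicted_income):
    """Pick the action with the highest predicted income."""
    return max(predicted_income, key=predicted_income.get)


# Toy episode: state 0 -> state 1 (reward 0) -> terminal (reward 1).
toy_episode = [(0, 0.0, 1), (1, 1.0, None)]
values = td_lambda([toy_episode] * 500, n_states=2)
choice = greedy_action({"C": 0.6, "A": 0.4})  # hypothetical predicted incomes
```

In the toy chain both state values approach the true return of 1, and the greedy step simply picks the action whose predicted income is larger, here cooperation.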
In a Bitcoin mining pool, the process by which a miner computes the nonce values assigned by the pool manager (commonly called mining) consumes a certain amount of computing power; the amount consumed by a miner is assumed to be c (c > 0). When several miners choose to mine cooperatively, the probability of finding the final nonce increases greatly, so the expected income of each miner is larger than when mining alone. It is assumed that when miners mine together, the expected income of the system is multiplied by a factor r, where r > 1. Within the pool, the system distributes mining income according to the computing power of the miners. Miners who launch block withholding attacks also receive income in proportion to their computing power.
Suppose there are two miners in pool P, miner X and miner Y; the game between them can be viewed as a Markov decision process. Each miner decides independently whether to launch a block withholding attack. Launching the attack is defined as Attack (A), and not launching it is defined as Cooperate (C).
Assume that the total computing power of the mining pool is 1 and that the computing power of miner X is t (c < t < 1), so the computing power of miner Y is 1 - t. If the total income of the pool is taken to be 1 when both miners cooperate, computing power can be used directly to represent income. The incomes of the miners under the different choices are then given in the following table:
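TABLE 1 Miners' income table
Strategies (X, Y) | Income of miner X | Income of miner Y
(C, C) | r(t - c) | r(1 - t - c)
(C, A) | t^2 - c | t(1 - c)
(A, C) | t(1 - t) | (1 - t)^2 - c
(A, A) | 0 | 0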
When both miners choose to cooperate, that is, the strategy combination is (C, C), each miner consumes computing power c but the income of the whole system is multiplied by r, so the incomes of miners X and Y are r(t - c) and r(1 - t - c) respectively. When the strategy combination is (C, A), that is, miner X cooperates and miner Y attacks, Y receives its share of the distributed income without spending the computing power c, so the incomes of the two miners are t^2 - c and t(1 - c). When the strategy combination is (A, C), the incomes of the two miners are t(1 - t) and (1 - t)^2 - c. When both miners attack, the effective computing power of the pool is 0, so both incomes are 0.
It is clear from the income table that the values of t, c and r influence the miners' strategy selection. These values are analyzed briefly as follows:
1) when r(t - c) > t(1 - t) and t^2 - c < 0, the best strategy for the miner is to cooperate when the opponent cooperates and to attack when the opponent attacks;
2) when r(t - c) < t(1 - t) and t^2 - c > 0, the best strategy for the miner is to attack when the opponent cooperates and to cooperate when the opponent attacks;
3) when r(t - c) > t(1 - t) and t^2 - c > 0, the best strategy for the miner is to cooperate whether the opponent attacks or cooperates;
4) when r(t - c) < t(1 - t) and t^2 - c < 0, the best strategy for the miner is to attack whether the opponent attacks or cooperates.
The above analysis shows that the fourth case is the one faced in the prisoner's dilemma: miners acting on rational analysis will eventually attack each other, reducing the income of the mining pool to 0.
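The four cases follow from two comparisons for miner X: r(t - c) against t(1 - t) when the opponent cooperates, and t^2 - c against 0 when the opponent attacks. A minimal sketch of this best-reply analysis is given below; the function name `best_responses`, the tie-breaking toward attack, and the sample parameter values are assumptions of this example.

```python
def best_responses(t, c, r):
    """Miner X's best reply ('C' or 'A') to each move of miner Y.

    Ties are resolved toward attack (an arbitrary choice for this sketch).
    """
    # Opponent cooperates: compare r(t - c) for cooperating with t(1 - t) for attacking.
    vs_cooperate = "C" if r * (t - c) > t * (1 - t) else "A"
    # Opponent attacks: compare t^2 - c for cooperating with 0 for attacking.
    vs_attack = "C" if t ** 2 - c > 0 else "A"
    return {"C": vs_cooperate, "A": vs_attack}


# Hypothetical parameters with c < t < 1 and r > 1.
dilemma_case = best_responses(t=0.5, c=0.3, r=1.1)      # r(t-c) < t(1-t), t^2 - c < 0
cooperative_case = best_responses(t=0.5, c=0.1, r=2.0)  # r(t-c) > t(1-t), t^2 - c > 0
```

With the first parameter set, attack is the best reply to both moves, which is the prisoner's dilemma situation; with the second, cooperation is the best reply to both moves.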
Through the above section, the behavior space of the dual miner model can be obtained:
B=[C,A]
In addition, the game state space and the revenue vectors of the model can be summarized. The game state space of the dual-miner model has four states:
W=[CC,CA,AC,AA]
while the revenue vectors for miners X and Y may be expressed as:
$S_X = [\,r(t-c),\ t^2-c,\ t(1-t),\ 0\,]^T$
$S_Y = [\,r(1-t-c),\ t(1-c),\ (1-t)^2-c,\ 0\,]^T$
when the computational power of two miners is the same, the profit vectors for X and Y can be represented by R, S, T, P:
Figure BDA0002129308730000121
Figure BDA0002129308730000122
the strategy probability for miner X may be expressed as:
p=[p1,p2,p3,p4]
the strategy probability for miner Y may be expressed as:
q=[q1,q2,q3,q4]
p and q give the probabilities that miners X and Y, respectively, choose to cooperate in the next state; the order of the previous states to which their elements correspond is the same as the order in W. For example, when the previous game state is (C, C), i.e. both chose to cooperate, miner X cooperates in the next state with probability $p_1$ and attacks with probability $1-p_1$, while miner Y cooperates with probability $q_1$ and attacks with probability $1-q_1$. Here $p_i, q_i \in [0,1]$.
The Markov state transition matrix of the double-miner game is obtained from the strategy probabilities p and q:
Figure BDA0002129308730000131
to make the meaning of matrix M more clear, the transition process matrix for M is given below:
Figure BDA0002129308730000132
Each element of M is the probability of the state transition shown in the corresponding element of T.
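As a rough illustration of how such a transition matrix can be assembled, the sketch below builds a 4x4 matrix from p and q. It assumes, as the text states, that both p and q are indexed by the previous state in the order of W, and that the two miners choose independently; the matrix shown in the figure may use a different indexing for q. The name `two_miner_transition_matrix` is hypothetical.

```python
from typing import List, Sequence

STATES = ["CC", "CA", "AC", "AA"]  # (X's action, Y's action), same order as W


def two_miner_transition_matrix(p: Sequence[float],
                                q: Sequence[float]) -> List[List[float]]:
    """Row i holds the probabilities of moving from STATES[i] to each state.

    p[i] / q[i] is the probability that X / Y cooperates in the next round
    when the previous state was STATES[i] (assumed indexing).
    """
    matrix = []
    for p_i, q_i in zip(p, q):
        matrix.append([
            p_i * q_i,              # next state CC
            p_i * (1 - q_i),        # next state CA
            (1 - p_i) * q_i,        # next state AC
            (1 - p_i) * (1 - q_i),  # next state AA
        ])
    return matrix


M = two_miner_transition_matrix([0.9, 0.5, 0.2, 0.1], [0.8, 0.3, 0.6, 0.2])
```

Every row of `M` sums to 1, as required of a Markov transition matrix.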
The Markov matrix M of the double-miner game model has a unit eigenvalue, so the matrix M′ ≡ M − I is singular, where I denotes the identity matrix. The stationary vector $V^T$ of M, or any vector proportional to it, satisfies:
$V^T \cdot M = V^T$ (1)
$V^T \cdot M' = 0$ (2)
According to Cramer's rule:
$\mathrm{Adj}(M')\,M' = \det(M')\,I = 0$ (3)
where Adj(M′) denotes the adjugate matrix of M′. Combining (2) and (3) shows that every row of Adj(M′) is proportional to $V^T$. Taking the fourth row of Adj(M′) as $V^T$ and applying elementary row and column operations gives the dot product of $V^T$ with an arbitrary four-dimensional vector $f = [f_1, f_2, f_3, f_4]^T$:
Figure BDA0002129308730000141
Note that the second column of this determinant,
Figure BDA0002129308730000142
and the third column,
Figure BDA0002129308730000143
have a special property:
Figure BDA0002129308730000144
contains only elements of p, and
Figure BDA0002129308730000145
contains only elements of q. That is,
Figure BDA0002129308730000146
and
Figure BDA0002129308730000147
can be determined unilaterally by miner X and miner Y, respectively.
Substituting the revenue vectors $S_X$ and $S_Y$ of miners X and Y into (4) gives the expected revenues of miners X and Y:
Figure BDA0002129308730000148
Figure BDA0002129308730000149
where 1 is a 4-dimensional column vector with all elements 1.
As can be seen from (5) and (6), the expected revenue of each miner depends linearly on that miner's revenue vector. A linear combination of the two expected revenues can therefore be written as
Figure BDA00021293087300001410
where the numerator is:
Figure BDA00021293087300001411
and the denominator is:
Figure BDA00021293087300001412
Figure BDA0002129308730000151
Because
Figure BDA0002129308730000152
and
Figure BDA0002129308730000153
can be determined unilaterally by miners X and Y respectively, either miner can unilaterally make (7) vanish. Specifically, miner X may set
Figure BDA0002129308730000154
or miner Y may set
Figure BDA0002129308730000155
In this case the determinant equals 0, and the expected revenues of miners X and Y satisfy the linear relationship:
αEX+βEY+γ=0 (8)
in the zero-determinant strategy, if miner X sets his strategy:
Figure BDA0002129308730000156
that is, α in (8) is set to 0, then equation (8) becomes:
βEY+γ=0 (10)
From (10), the expected revenue of miner Y is:
Figure BDA0002129308730000157
In addition, four equations for p can be obtained from (9):
p1-1=βR+γ
p2-1=βT+γ
p3=βS+γ
p4=βP+γ
From the above four equations, β and γ can be eliminated to obtain $p_2$ and $p_3$:
Figure BDA0002129308730000158
Figure BDA0002129308730000159
With this substitution, (11) becomes:
Figure BDA0002129308730000161
In the double-miner game, the revenue P is 0. Analyzing the above expression for $E_Y$ shows that when miner X uses the equalizer strategy
Figure BDA0002129308730000162
he can confine the expected revenue of miner Y to the range $P \le E_Y \le R$ by unilaterally adjusting the values of $p_1$ and $p_4$.
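The claim that miner X can pin miner Y's expected revenue can be checked numerically. The sketch below is an illustration, not the patent's procedure: it solves the four equations $p_1-1=\beta R+\gamma$, $p_2-1=\beta T+\gamma$, $p_3=\beta S+\gamma$, $p_4=\beta P+\gamma$ for the strategy of X, then estimates the stationary distribution by power iteration for several arbitrary strategies q of miner Y. The helper names, the indexing of q by the order of W, and the numeric values (R, S, T, P taken from the equal-power case t = 0.5, c = 0.3, r = 1.2) are assumptions.

```python
def transition_matrix(p, q):
    """4x4 Markov matrix over states [CC, CA, AC, AA] (X's action first)."""
    return [[pi * qi, pi * (1 - qi), (1 - pi) * qi, (1 - pi) * (1 - qi)]
            for pi, qi in zip(p, q)]


def stationary(m, iters=5000):
    """Approximate the stationary row vector by repeated multiplication."""
    v = [1.0 / len(m)] * len(m)
    for _ in range(iters):
        v = [sum(v[i] * m[i][j] for i in range(len(m))) for j in range(len(m))]
    return v


def equalizer(p1, p4, R, S, T, P):
    """Solve the four linear equations for beta, gamma, p2 and p3."""
    beta = (p1 - 1 - p4) / (R - P)
    gamma = p4 - beta * P
    p = [p1, 1 + beta * T + gamma, beta * S + gamma, p4]
    if not all(0.0 <= x <= 1.0 for x in p):
        raise ValueError("p1 and p4 do not give valid probabilities")
    return p, beta, gamma


def expected_revenue(p, q, revenue):
    v = stationary(transition_matrix(p, q))
    return sum(vi * si for vi, si in zip(v, revenue))


R, S, T, P = 0.24, -0.05, 0.25, 0.0   # assumed example values, T > R > 0 > S
p, beta, gamma = equalizer(0.9, 0.05, R, S, T, P)
target = -gamma / beta                 # value of E_Y implied by beta*E_Y + gamma = 0
S_Y = [R, T, S, P]                     # revenue of Y in the order CC, CA, AC, AA
results = [expected_revenue(p, q, S_Y)
           for q in ([0.3, 0.7, 0.2, 0.6], [0.9, 0.1, 0.5, 0.5], [0.5] * 4)]
```

Under these assumptions every entry of `results` agrees with `target`, whatever strategy miner Y plays, and `target` lies between P and R.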
In the extortion strategy, an extortion factor χ is introduced. Miner X may set his strategy vector to
Figure BDA0002129308730000163
where χ ≥ 1 and Φ is a free parameter. Similarly, from (15), the following variant of (8) is obtained:
$\Phi[(E_X - P) - \chi(E_Y - P)] = 0$ (16)
From (15) again, four equations can be derived:
p1=1+Φ(1-χ)
Figure BDA0002129308730000164
Figure BDA0002129308730000165
p4=0 (17)
Since each $p_i \in [0,1]$, the second and third equations give the admissible range of Φ:
Figure BDA0002129308730000166
Figure BDA0002129308730000167
The revenues in the double-miner game model satisfy T > R > 0 > S, and χ ≥ 1, so the admissible range of Φ is
Figure BDA0002129308730000168
Under this extortion strategy, the expected revenue of miner X depends on the strategy vector q of miner Y. Only when miner Y's strategy vector is $q = [1,1,1,1]^T$, i.e. miner Y cooperates unconditionally, do both parties obtain their maximum revenue. If miner Y decides to cooperate unconditionally, the expected revenues of both parties can be expressed as:
Figure BDA0002129308730000171
Figure BDA0002129308730000172
Further analysis shows that when both miners attack, i.e. in the (A, A) state, both revenues are 0, so P = 0 can be substituted into (16) to obtain:
$E_X = \chi E_Y$ (22)
Therefore, when miner X uses the extortion strategy, he can force his own expected revenue to be linearly related to that of his opponent and, by adjusting the extortion factor χ, can guarantee that his revenue is always χ times his opponent's, thereby extorting miner Y.
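The relation $E_X = \chi E_Y$ can also be checked numerically. The sketch below assumes the strategy vector has the usual zero-determinant form $\tilde{p} = \Phi[(S_X - P\mathbf{1}) - \chi(S_Y - P\mathbf{1})]$ with $\tilde{p} = (p_1-1,\ p_2-1,\ p_3,\ p_4)$, since the original expression is only available as an image. The example values R = 1, S = -0.5, T = 1.5, P = 0 are hypothetical and were picked so that $p_1 = 1 + \Phi(1-\chi)$ and $p_4 = 0$ as in (17); the helper names are likewise hypothetical.

```python
def transition_matrix(p, q):
    """4x4 Markov matrix over states [CC, CA, AC, AA] (X's action first)."""
    return [[pi * qi, pi * (1 - qi), (1 - pi) * qi, (1 - pi) * (1 - qi)]
            for pi, qi in zip(p, q)]


def stationary(m, iters=5000):
    """Approximate the stationary row vector by repeated multiplication."""
    v = [1.0 / len(m)] * len(m)
    for _ in range(iters):
        v = [sum(v[i] * m[i][j] for i in range(len(m))) for j in range(len(m))]
    return v


def extortion_strategy(chi, phi, R, S, T, P):
    """Strategy of X under the assumed form of the extortion vector."""
    s_x = [R, S, T, P]   # revenue of X in the order CC, CA, AC, AA
    s_y = [R, T, S, P]   # revenue of Y in the same order
    tilde = [phi * ((a - P) - chi * (b - P)) for a, b in zip(s_x, s_y)]
    p = [1 + tilde[0], 1 + tilde[1], tilde[2], tilde[3]]
    if not all(0.0 <= x <= 1.0 for x in p):
        raise ValueError("phi is outside its admissible range")
    return p, s_x, s_y


def expected_pair(p, q, s_x, s_y):
    v = stationary(transition_matrix(p, q))
    e_x = sum(vi * si for vi, si in zip(v, s_x))
    e_y = sum(vi * si for vi, si in zip(v, s_y))
    return e_x, e_y


CHI, PHI = 2.0, 0.2
p, s_x, s_y = extortion_strategy(CHI, PHI, R=1.0, S=-0.5, T=1.5, P=0.0)
pairs = [expected_pair(p, q, s_x, s_y)
         for q in ([0.6, 0.4, 0.7, 0.2], [0.2, 0.9, 0.3, 0.5], [1.0] * 4)]
```

In each pair the first entry is χ times the second; the last pair corresponds to unconditional cooperation by miner Y, which is where both expected revenues are largest in this example.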
In a bitcoin mining pool, a model with only two miners is not sufficient. In practice, even the smallest pool has hundreds of miners, so the dual-miner model must be extended to a multi-miner model.
Because multi-party games are complex, only the case in which all miners have the same computing power is discussed here. Assume that the total computing power of a pool with N miners is N, so that the computing power of each miner is 1. As in the dual-miner model, computing power can then be used directly to represent a miner's share of the revenue, and when a miner chooses to cooperate, the computing power he spends is denoted by c (c > 0). Likewise, when several miners mine together, the revenue of the whole pool grows, and the revenue amplification factor is again denoted by r. One problem arises here: as the number of miners mining together increases, the probability of mining successfully also increases. The value of r should therefore grow with the number of cooperators, and its growth curve should gradually flatten. To model this, r = ln(k + b) is defined, where k is the number of cooperators and b is a constant.
Owing to the highly uniform composition of the N miners' revenue, it is difficult to present a table like the one in the dual-miner model. With the above definitions, however, the conditions under which the miner's dilemma exists in the multi-miner game can still be found.
According to the above definition, for any miner in a mine pool with N miners, his cooperation and attack benefits can be represented by the following:
collaboration:
Figure BDA0002129308730000181
attack:
Figure BDA0002129308730000182
where n is the number of cooperators among his opponents. Recall from the dual-miner model that the miner's dilemma means that, whether a miner's opponents attack or cooperate, his best strategy is to attack. In the multi-miner game, this situation can be expressed as:
Figure BDA0002129308730000183
solving (23), the effective range of the multi-miner predicament can be obtained:
Figure BDA0002129308730000184
where b and c are two constants, so the valid range of N changes as n changes. Analysis of (24) shows that the right-hand side of the inequality is an increasing function of n, so substituting the maximum value of n yields a definite range for N. By assumption, $n_{max} = N-1$. A new inequality is then obtained:
Figure BDA0002129308730000185
With a simple rearrangement, this becomes:
$e^{cN} > N + b$ (26)
Next, two functions are constructed, $f_1(N) = e^{cN}$ and $f_2(N) = N + b$, where by definition $0 < c < 1$ and $b \ge e-1$. The graphs of these two functions are plotted in FIG. 1 to find the valid range. FIG. 1 shows that there is an $N_i \in (0, +\infty)$ satisfying
Figure BDA0002129308730000191
and that $e^{cN} > N + b$ whenever $N > N_i$. Thus $N > N_i$ is the valid range of N.
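The crossing point $N_i$ of $e^{cN}$ and $N + b$ has no elementary closed form, but it is easy to locate numerically. The sketch below uses plain bisection; the function names are hypothetical, and the constant b is read as e minus one (so that ln(k + b) is at least 1 for k ≥ 1), which is an interpretation of the text.

```python
import math


def dilemma_threshold(c: float, b: float, iterations: int = 200) -> float:
    """Return N_i > 0 with exp(c * N_i) = N_i + b, found by bisection.

    Assumes 0 < c < 1 and b > 1, so exp(c*N) - (N + b) is negative at N = 0,
    convex in N, and therefore crosses zero exactly once for N > 0.
    """
    if not (0 < c < 1 and b > 1):
        raise ValueError("expects 0 < c < 1 and b > 1")

    def gap(n: float) -> float:
        return math.exp(c * n) - (n + b)

    lo, hi = 0.0, 1.0
    while gap(hi) <= 0:          # grow the bracket until the exponential wins
        hi *= 2.0
    for _ in range(iterations):  # fixed number of halvings, no tolerance needed
        mid = (lo + hi) / 2.0
        if gap(mid) > 0:
            hi = mid
        else:
            lo = mid
    return hi


def smallest_valid_pool_size(c: float, b: float) -> int:
    """Smallest integer N strictly above the crossing point N_i."""
    return math.floor(dilemma_threshold(c, b)) + 1


threshold = dilemma_threshold(0.5, math.e - 1)
```

With c = 0.5 and b = e - 1 the crossing point lies between 3 and 4, so under these assumed constants the inequality (26) holds for pools of 4 or more miners.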
Suppose there are N miners in the mining pool who cannot communicate with each other, and each miner decides independently whether to launch a block withholding attack. The action space of the N miners is then the same as in the double-miner game:
B=[C,A]
The actions of the miners in the current round are assumed to be determined only by the state of the previous round and to be independent of earlier rounds, so the repeated game among the miners can be regarded as a Markov chain. In the N-miner game, $2^N$ possible states can occur in each round. For example, when N = 3, the game state space can be written as S = [CCC, CCA, CAC, CAA, ACC, ACA, AAC, AAA]. When N is large, it is difficult to write the game state space out explicitly, so $s_i$ is used to denote a specific game state:
$S = [s_1, \dots, s_i, \dots, s_{2^N}]$
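For concreteness, the state space can be enumerated mechanically. The short sketch below (the function name `game_states` is hypothetical) lists the states in the same order as the three-miner example in the text.

```python
from itertools import product
from typing import List


def game_states(n_miners: int) -> List[str]:
    """All 2**n_miners joint actions, one letter per miner ('C' or 'A')."""
    return ["".join(actions) for actions in product("CA", repeat=n_miners)]


three_miner_states = game_states(3)
```

Because the number of states doubles with every additional miner, explicit enumeration is only practical for small pools.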
Every miner x in the pool has a strategy vector:
Figure BDA0002129308730000192
where
Figure BDA0002129308730000193
is the probability that the miner chooses to cooperate in the current round, given the previous-round game state in the game state space that corresponds to the subscript. Specifically,
Figure BDA0002129308730000194
is the probability that the miner chooses to cooperate in the current round when the previous round ended in game state $s_i$. Taking three miners as an example again, the strategy vector of miner 1 may be expressed as:
Figure BDA0002129308730000195
For convenience, the description of $p^x$ can be refined further. From the point of view of miner 1, the participants in any game state of the pool fall into two camps: our side, i.e. miner 1, and the opposing side, i.e. the other N-1 miners. A game state is made up of the action of our side and the actions of the opposing side, so once the number of opposing miners who cooperated in the previous round is known, the previous-round state can be represented in the strategy vector $p^x$. To this end, define $p_{C,n}$ (or $p_{A,n}$) as the probability that our side chooses to cooperate in the current round, given that in the previous round our side cooperated (or attacked) and n opposing miners cooperated. The strategy vector of miner 1 can then be expressed as
Figure BDA0002129308730000201
For example, when a miner 1 is randomly taken from a pool with three miners, his policy vector can be expressed as:
Figure BDA0002129308730000202
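The mapping from game states to the parameters $p_{C,n}$ and $p_{A,n}$ can be sketched as follows. This is an illustration only: the function names are hypothetical, and the states are assumed to be ordered as in the three-miner example (CCC, CCA, ..., AAA) with miner 1 in the first position.

```python
from itertools import product
from typing import List, Sequence, Tuple


def strategy_labels(n_miners: int, me: int = 0) -> List[Tuple[str, int]]:
    """For every game state, return (own previous action, number of other cooperators)."""
    labels = []
    for state in product("CA", repeat=n_miners):
        others = sum(1 for k, action in enumerate(state) if k != me and action == "C")
        labels.append((state[me], others))
    return labels


def strategy_vector(p_c: Sequence[float], p_a: Sequence[float],
                    n_miners: int, me: int = 0) -> List[float]:
    """Expand p_c[n] and p_a[n] into one cooperation probability per game state."""
    return [p_c[n] if own == "C" else p_a[n]
            for own, n in strategy_labels(n_miners, me)]


labels_for_miner_1 = strategy_labels(3)
```

For three miners, the eight entries for miner 1 reduce to six distinct parameters, because the states CCA and CAC (and likewise ACA and AAC) contain the same number of cooperating opponents.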
A pool of N miners has $2^N$ game states, so each miner has a revenue vector with $2^N$ elements. Assume the revenue vector of miner x is:
Figure BDA0002129308730000203
where $x \in [1, N]$. Using the definitions of the previous section, the expression for
Figure BDA0002129308730000204
is:
Figure BDA0002129308730000205
where n(i) is the number of opposing miners that cooperate in game state $s_i$, and
Figure BDA0002129308730000206
is an indicator of our side's action in game state $s_i$: if our side chooses Cooperate, then
Figure BDA0002129308730000207
and otherwise
Figure BDA0002129308730000208
Again, the revenue vector of miner 1 in a pool with three miners is given:
Figure BDA0002129308730000209
Here, for convenience of calculation, a fixed value r is used instead of the definition r = ln(k + b). This change does not affect the derivation.
Next, a Markov state transition matrix for multi-miner gaming is defined:
Figure BDA00021293087300002010
where $M_{i,j}$ is the probability that the pool moves from state i to state j. By the definition of the Markov state transition matrix, $M_{i,j}$ can be calculated as:
Figure BDA00021293087300002011
where x ranges over all miners. In more detail:
Figure BDA0002129308730000211
where n(i) is the number of cooperators among the opponents in state i, and
Figure BDA0002129308730000212
indicates the action of our side in state j. If our side chooses to cooperate, then
Figure BDA0002129308730000213
and otherwise
Figure BDA0002129308730000214
With the above definitions, the Markov state transition matrix of a pool with three miners is given below.
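A minimal sketch of how the $2^N \times 2^N$ matrix can be built from the miners' strategies is given below. It assumes that the miners choose independently, so that each entry is a product over miners of the probability of the action that miner takes in the target state; each miner's strategy is supplied as a pair of lists `(p_c, p_a)` indexed by the number of cooperating opponents. The function name and this data layout are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Strategy = Tuple[Sequence[float], Sequence[float]]  # (p_c[n], p_a[n])


def multi_miner_transition_matrix(strategies: Sequence[Strategy]):
    """Return (states, matrix) for a pool with len(strategies) miners."""
    n_miners = len(strategies)
    states = ["".join(s) for s in product("CA", repeat=n_miners)]
    matrix: List[List[float]] = []
    for prev in states:
        # Probability that each miner cooperates next, given the previous state.
        coop = []
        for x, (p_c, p_a) in enumerate(strategies):
            n = sum(1 for k, a in enumerate(prev) if k != x and a == "C")
            coop.append(p_c[n] if prev[x] == "C" else p_a[n])
        row = []
        for nxt in states:
            prob = 1.0
            for x in range(n_miners):
                prob *= coop[x] if nxt[x] == "C" else 1.0 - coop[x]
            row.append(prob)
        matrix.append(row)
    return states, matrix


mixed = [([0.1, 0.5, 0.9], [0.2, 0.4, 0.6])] * 3
states, M3 = multi_miner_transition_matrix(mixed)
```

For three miners this produces an 8x8 matrix whose rows each sum to 1; if every miner always cooperates, every row puts all of its probability on the state CCC.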
If M is a regular state transition matrix then, like the Markov matrix of the double-miner game, it has a unique stationary vector $V^T$. Taking the stationary vector $V^T$ associated with the unit eigenvalue gives
$V^T \cdot M = V^T$ (27)
Now, redefine:
M′=M-I (28)
namely:
Figure BDA0002129308730000215
where $\delta_{i,j}$ is the Kronecker delta, defined as:
Figure BDA0002129308730000216
Applying some elementary row and column operations [6] to the matrix then separates the probability vectors and yields a determinant that can be controlled unilaterally by the strategy of any single miner. The separated strategy of miner x is defined as
Figure BDA0002129308730000217
Figure BDA0002129308730000218
As in the double-miner game, applying Cramer's rule to M′ gives
Adj(M′)M′=det(M′)I=0 (30)
Meanwhile, combining (27) and (28) gives:
$V^T \cdot M' = 0$ (31)
Comparing (30) and (31) shows that every row of Adj(M′), the adjugate of M′, is proportional to the stationary vector $V^T$. Thus, for a revenue vector $u^x$ with $2^N$ elements:
Figure BDA0002129308730000221
where $m'_{i,j}$ denotes the cofactor of $M'_{j,i}$. Replacing the last column in (32) with $u^x$ gives:
$V^T \cdot u^x = \det(p^1, \dots, p^x, \dots, p^N, u^x)$ (33)
where $\det(p^1, \dots, p^x, \dots, p^N, u^x)$ is a $2^N \times 2^N$ determinant. For ease of understanding, $V^T \cdot u$ for the three-miner game is given in FIG. 3.
Using (33), the expected revenue $E_x$ of any miner can be computed by a Laplace expansion along the last column $u^x$ of $V^T \cdot u^x$. The Markov state transition matrix of the three-miner game and $V^T \cdot u$ for the three-miner game are:
Figure BDA0002129308730000222
Figure BDA0002129308730000223
Figure BDA0002129308730000224
Where 1 is a vector with all elements 1, similar to that in the double miner game.
Analysis of (34) shows that the expected revenue of each miner depends linearly on that miner's revenue vector. As in the double-miner game, a linear combination of the expected revenues of all miners can therefore be formed:
Figure BDA0002129308730000231
From the previous analysis, it is known that in the matrix M′,
Figure BDA0002129308730000232
can be determined unilaterally by a single miner. Suppose this miner is miner 1, and miner 1 sets
Figure BDA0002129308730000233
Then (35) equals 0:
Figure BDA0002129308730000234
The strategy vector of miner 1 in this case,
Figure BDA0002129308730000235
is a zero-determinant strategy in the multi-miner game.
In the equalizer strategy, the goal of miner x is to use a strategy that unilaterally fixes the sum of the expected revenues of the opposing miners. Analyzing (36): when $\alpha_1 = 0$, the sum of the expected revenues of the opposing miners is obtained:
Figure BDA0002129308730000236
If every $\alpha_x$ is then set to the same fixed value u, u can be taken outside the summation. Any miner 1 in the multi-miner game may then set the equalizer strategy:
Figure BDA0002129308730000237
under this strategy, (36) becomes:
Figure BDA0002129308730000238
Equation (38) can also be rearranged to give the sum of the expected revenues of the opposing miners:
Figure BDA0002129308730000239
That is, a miner can control the sum of the expected revenues of the opposing miners simply by setting the values of β and u.
Returning to the equalizer strategy of miner x,
Figure BDA00021293087300002310
Figure BDA00021293087300002311
is originally a column vector with $2^N$ elements, so the linear relationship (39) yields $2^N$ equations. These $2^N$ equations correspond to the $2^N$ elements of the column vector, i.e. to $p_{C,n}$ and $p_{A,n}$, where $n \in [0, N-1]$. The $2^N$ equations are:
Figure BDA0002129308730000241
Figure BDA0002129308730000242
According to (40), the probability parameters $p_{C,n}$ and $p_{A,n}$ are determined by the two parameters β and u. The converse also holds: once two probability parameters are fixed, the values of β and u are determined, and the remaining probability parameters then follow from (40). Two particularly important parameters are chosen here, $p_{C,N-1}$ and $p_{A,0}$, i.e. the cooperation probabilities after full cooperation and after full attack. Their expressions are:
$p_{C,N-1} = 1 + u(N-1)(r-c) + \beta$
$p_{A,0} = \beta$ (41)
From these two equations, expressions for the parameters u and β are obtained:
Figure BDA0002129308730000243
$\beta = p_{A,0}$ (42)
Substituting (42) into (39) gives the sum of the expected revenues of the opposing miners in terms of miner 1's probability parameters $p_{C,N-1}$ and $p_{A,0}$:
Figure BDA0002129308730000244
As the above formula shows, once miner 1 uses the equalizer strategy, the sum of the expected revenues of the opposing miners is determined by the total number of miners N in the pool, the revenue coefficient r, the computing power c spent on mining, and the two probability parameters $p_{C,N-1}$ and $p_{A,0}$. Since N, r and c are fixed, while $p_{C,N-1}$ and $p_{A,0}$ are chosen unilaterally by miner 1, miner 1 can control the total expected revenue of the opposing miners by setting different values of $p_{C,N-1}$ and $p_{A,0}$.
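As a worked illustration of this dependence, the two equations in (41) can be solved for u and β and substituted into the linear relation. The sketch below assumes that the linear relation has the form $u\sum_{x \neq 1} E_x + \beta = 0$, which is what the entries of (41) imply; the original expressions (39) and (43) are only available as images, so this is a reconstruction under that assumption rather than a quotation.

```latex
\begin{aligned}
\beta &= p_{A,0}, \qquad
u = \frac{p_{C,N-1} - 1 - p_{A,0}}{(N-1)(r-c)}, \\[4pt]
\sum_{x=2}^{N} E_x &= -\frac{\beta}{u}
  = \frac{p_{A,0}\,(N-1)(r-c)}{1 - p_{C,N-1} + p_{A,0}} .
\end{aligned}
```

Under this reading, setting $p_{A,0} = 0$ drives the opposing miners' total expected revenue to 0, while setting $p_{C,N-1} = 1$ with $p_{A,0} > 0$ gives them the full-cooperation total $(N-1)(r-c)$; intermediate choices fix any value in between, provided $r \neq c$ and the denominator is positive.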
In the extortion strategy, the goal of miner 1 is, as in the double-miner game, to use a strategy that makes his own revenue χ times the revenue of the opposing miners. Analogously to the strategy in the double-miner game, miner 1 adopts the following strategy:
Figure BDA0002129308730000251
where P is the revenue when all miners in the pool attack. Recalling the expression for the miners' revenue vector,
Figure BDA0002129308730000252
it can be seen that in the full-attack state the revenue of every miner is 0, so (44) can be written as:
Figure BDA0002129308730000253
From the strategy vector (45), the relationship between the expected revenue of miner 1 and the total expected revenue of the opposing miners is obtained:
Figure BDA0002129308730000254
A simple rearrangement gives
Figure BDA0002129308730000255
It can be seen from (47) that a miner who uses the extortion strategy can, by unilaterally setting the extortion factor χ, force his own expected revenue to always be χ times the total expected revenue of the opposing miners.
Analysis of (47) shows that our side's revenue is highest when the miners in the pool are in the full-cooperation state. The pool can reach the full-cooperation state by combining a temporal-difference algorithm with the zero-determinant extortion strategy.
According to (47), the extortion factor χ is defined as the ratio of our side's revenue to the sum of the revenues of the opposing miners. A fixed value of χ guarantees our side a higher revenue, but it does not help the pool reach the full-cooperation state. For this reason, χ is set to be a dynamic extortion factor:
Figure BDA0002129308730000256
where $P_C$ is the cooperation probability of the whole pool. When the cooperation probability of the whole pool is small, χ is increased so that our side is guaranteed a high revenue; when it is large, χ is decreased, pushing the pool toward the full-cooperation state; and when the pool is in full cooperation, i.e. $P_C$ has converged to 1, the extortion factor χ becomes a constant that keeps the pool in the full-cooperation state.
In practice a pool has many miners, and the size of the state space and the complexity of the expected revenue grow rapidly with the number of miners, so it is extremely difficult to give an explicit expression for the expected revenue function. The revenue of the pool is therefore simulated directly with a pool in the full-cooperation state, i.e. $p^x = 1$ ($x \in [2, N]$), and the state of the miners is used to simulate the cooperation probability $P_C$ of the whole pool. The revenue functions of our side and of the opposing side are:
Figure BDA0002129308730000261
Figure BDA0002129308730000262
In round t, Eadp(t) denotes the revenue of our side and Eadp(t) denotes the sum of the revenues of the opposing miners, while the cooperation revenue and the attack revenue are denoted by Ecoo(t) and Eatt(t) respectively. In round t+1, the expected revenue of our side can then be expressed as:
Ecoo(t+1)=VC(t+1)+E1
Figure BDA0002129308730000263
Figure BDA0002129308730000264
Figure BDA0002129308730000265
where the cooperation probability $P_C$ and the attack probability $P_A$ each lie in [0,1]. The strategy for the next round is chosen by comparing the attack revenue with the cooperation revenue:
1) If $E_{coo}(t+1) > E_{att}(t+1)$, miner 1 chooses the cooperation strategy. The overall cooperation probability of the pool in the next round becomes $P_C(t+1) = P_C(t) + F(P_C(t+1))$, and the overall attack probability becomes $P_A(t+1) = P_A(t) - F(P_A(t+1))$.
2) If $E_{coo}(t+1) < E_{att}(t+1)$, miner 1 chooses the attack strategy. The overall cooperation probability of the pool in the next round becomes $P_C(t+1) = P_C(t) - F(P_C(t+1))$, and the overall attack probability becomes $P_A(t+1) = P_A(t) + F(P_A(t+1))$.
3) If $E_{coo}(t+1) = E_{att}(t+1)$, miner 1 chooses the cooperation strategy. The overall cooperation probability of the pool in the next round becomes $P_C(t+1) = P_C(t) + F(P_C(t+1))$, and the overall attack probability remains unchanged.
where the Fermi function F(ε) can be expressed as:
Figure BDA0002129308730000271
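The three update rules can be sketched as a single function. This is a simplified illustration, not the patent's implementation: the step size is supplied by the caller as a hypothetical callable `step` (standing in for the Fermi function, whose exact form is only available as an image), it is evaluated at the current probabilities rather than at the next-round values, and the results are clipped to [0, 1] by assumption.

```python
from typing import Callable, Tuple


def clip01(x: float) -> float:
    """Keep a probability inside [0, 1] (an added safeguard, not from the text)."""
    return min(1.0, max(0.0, x))


def next_round(e_coo: float, e_att: float, p_c: float, p_a: float,
               step: Callable[[float], float]) -> Tuple[str, float, float]:
    """Pick miner 1's next action and update the pool-wide probabilities.

    e_coo / e_att are the predicted cooperation and attack revenues for the
    next round; p_c / p_a are the current pool-wide probabilities.
    """
    if e_coo > e_att:        # rule 1: cooperate, raise P_C, lower P_A
        return "C", clip01(p_c + step(p_c)), clip01(p_a - step(p_a))
    if e_coo < e_att:        # rule 2: attack, lower P_C, raise P_A
        return "A", clip01(p_c - step(p_c)), clip01(p_a + step(p_a))
    # rule 3: tie, cooperate, raise P_C, leave P_A unchanged
    return "C", clip01(p_c + step(p_c)), p_a


def constant_step(_: float) -> float:
    """Hypothetical stand-in for the Fermi function: a fixed increment."""
    return 0.1


action, new_p_c, new_p_a = next_round(1.0, 0.5, 0.4, 0.6, constant_step)
```

Passing the step function as a parameter keeps the greedy comparison separate from the choice of update magnitude, so a Fermi-type function can be substituted without changing the decision logic.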
The extortion factor χ changes with $P_C$ during the iterations of the algorithm, which in turn affects the revenue function. In the long run, under the assumption of economic rationality, the opposing miners will eventually realize that cooperation is their best strategy and will choose to cooperate in every subsequent round. After enough iterations, the overall cooperation probability of the pool converges to 1, i.e. the full-cooperation state is reached.
To test the effectiveness of the present application, the evolution of the cooperation probability in a three-miner pool on a scale-free network was simulated. A comparison experiment was also set up to measure the number of iteration rounds needed to reach the full-cooperation state for different initial cooperation probabilities. The data of the first 40 game rounds are taken; the attack and cooperation revenues of each round are predicted, and the cooperation probability is updated according to the revenues of each round. The number of rounds needed for the cooperation probability to converge is then compared with that of the adaptive strategy.
The first 20 rounds of the evolution of the cooperation probability are shown in FIG. 2. As the number of rounds increases, the overall cooperation probability of the pool tends to increase, and the larger the initial cooperation probability, the fewer rounds are needed for convergence. After 6 rounds, the overall cooperation probability has converged to 1.
FIG. 3 compares the number of rounds required for convergence under the present strategy with that under the adaptive strategy. For the 4 selected initial cooperation probabilities, the present strategy converges 1 to 3 rounds earlier than the adaptive strategy.
The evolution of the miners' revenue for different initial cooperation probabilities is shown in FIGS. 4-8. As shown, the cooperation revenue is always higher than the attack revenue, so any rational miner will always choose to cooperate. Note that if the data precision is increased, the cooperation revenue does not converge until the cooperation probability converges to 1. In addition, the attack revenue converges to different values for different initial cooperation probabilities. FIG. 6 shows the revenue evolution for an initial cooperation probability of 0.1; the attack revenue tends to converge toward the middle and finally converges in round 139. FIG. 7 shows the revenue evolution for a cooperation probability of 0; likewise, the attack revenue tends to converge toward the middle and finally converges in round 35. In FIG. 8 the cooperation probability is 0.5, and convergence is complete in round 11. Note that the smaller the initial cooperation probability, the more rounds the attack revenue needs to converge. In addition, as the cooperation probability converges, the overall revenue of the pool increases and the throughput of the bitcoin system increases.
The invention builds two-party and multi-party game models of a mining pool under the proof-of-work (PoW) mechanism and, by introducing the zero-determinant strategy, offers a new approach to the problem of block withholding attacks in bitcoin mining pools. The game among miners is analyzed and treated as an iterated prisoner's dilemma; double-miner and multi-miner game models are established; the zero-determinant strategy is used in the game; the revenue of the next round is predicted with a temporal-difference algorithm; and a greedy strategy is adopted to choose the action of the next round and update the overall cooperation probability of the pool. By executing the temporal-difference algorithm iteratively, the overall cooperation probability of the pool finally converges to 1, i.e. the pool reaches the full-cooperation state and the block withholding attack problem is solved.
In addition, the game between mining pools can also be regarded as a multi-party game model. A mining pool can have its own miners infiltrate other pools and launch block withholding attacks against them. The revenue that the infiltrating miners obtain in the other pools becomes additional revenue for the attacking pool. In general, the effective mining power of the attacked pool is unchanged, but its total revenue is shared among more miners (the original miners plus the miners infiltrated from other pools), so the revenue of every miner decreases. The attacker's own computing power is reduced because some of its miners are carrying out block withholding attacks, but it may gain additional revenue from the infiltrated pools. Future work will analyze the game between mining pools that divert computing power to infiltrate other pools, establish a multi-pool game model, and try to optimize the model with multi-party game strategies and reinforcement learning algorithms so as to raise the cooperation probability among pools.

Claims (5)

1. A block chaining coin mine pool multi-miner cooperative evolution method based on a time sequence difference algorithm is characterized by comprising the following steps:
step one, determining a strategy vector of each miner in next round of operation according to the strategy condition of cooperation or attack adopted in each round of operation based on game among the miners by using N miners in an ore pool, and simultaneously obtaining a profit vector of each miner;
step two, obtaining the strategy probability of each miner when the strategy is selected according to the strategy vector of each miner, and obtaining a Markov state transition matrix under the multi-miner game condition based on the strategy probability, namely a strategy selection change matrix;
step three, performing row-column transformation on the Markov state transition matrix to obtain a determinant which is unilaterally controlled by the strategy of a single miner, and combining the revenue vector of the miner to obtain the expected revenue of the miner so as to obtain a linear combination of the expected revenues;
step four, according to the linear combination of expected revenues and the probabilities of cooperation and attack, adjusting the strategy adopted by a miner so as to control the range of the expected revenues of the other miners; introducing, as an extortion factor, a factor that is related to the weights of the revenue vectors and expresses the ratio between the miner's revenue and the total revenue of the other miners; and setting the extortion factor to a dynamically changing value: when the cooperation probability of the whole mining pool is low, the value of the extortion factor is increased to ensure that the miner obtains a high revenue; when the cooperation probability of the whole mining pool is high, the value of the extortion factor is decreased to push the mining pool toward a full-cooperation state; and when the mining pool is in the full-cooperation state, the extortion factor becomes a constant that maintains the full-cooperation state of the mining pool, thereby maximizing the revenue of the entire mining pool.
2. The method for the cooperative evolution of the multi-miner in the blockchain coin mine pool based on the time sequence difference algorithm as claimed in claim 1, wherein in the step one, the attack is a block withholding attack, and the cooperation means not launching a block withholding attack.
3. The method for the cooperative evolution of the multi-miner in the blockchain coin mine pool based on the time sequence difference algorithm as claimed in claim 1, wherein in the step one, the strategy vector p of the miner 1 is1Comprises the following steps:
Figure FDA0002129308720000021
wherein p_{C,n} denotes the probability that miner 1, having cooperated in the previous round, chooses to cooperate in the current round when n other miners cooperate; p_{A,n} denotes the probability that miner 1, having attacked in the previous round, chooses to cooperate in the current round when n other miners cooperate; and N is the total number of miners in the mine pool;
the profit vector of the miners is:
Figure FDA0002129308720000022
wherein x ∈ [1, N],
Figure FDA0002129308720000023
where n(i) denotes the number of other miners cooperating in game state s_i;
Figure FDA0002129308720000024
is an indicator of the miner's behavior in game state s_i; if the miner chooses cooperation, then
Figure FDA0002129308720000025
otherwise,
Figure FDA0002129308720000026
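The definitions in claim 3 can be made concrete with a small lookup sketch. The formula images are not reproduced in this text, so the layout used here, with the N entries p_{C,0} to p_{C,N-1} followed by the N entries p_{A,0} to p_{A,N-1}, is an assumption, as are the function names and the indicator values 1 (cooperation) and 0 (attack).

```python
def coop_probability(p, last_action, n_others, total_miners):
    """Look up the probability that a miner cooperates in the current round.

    p            -- strategy vector, assumed layout
                    [p_C,0 .. p_C,N-1, p_A,0 .. p_A,N-1]
    last_action  -- "C" if the miner cooperated last round, "A" if it attacked
    n_others     -- number of other miners that cooperated last round
    total_miners -- N, the total number of miners in the mine pool
    """
    if len(p) != 2 * total_miners:
        raise ValueError("strategy vector must have 2N entries")
    if last_action not in ("C", "A"):
        raise ValueError("last_action must be 'C' or 'A'")
    if not 0 <= n_others <= total_miners - 1:
        raise ValueError("n_others must lie in [0, N-1]")
    offset = 0 if last_action == "C" else total_miners
    return p[offset + n_others]


def behavior_indicator(action):
    """Indicator of the miner's behavior: 1 for cooperation, 0 otherwise (assumed values)."""
    return 1 if action == "C" else 0
```

For example, with N = 3 and p = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1], a miner that cooperated last round while both other miners cooperated looks up the third entry, 0.7.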
4. The method for cooperative evolution of multiple miners in a blockchain coin mine pool based on the time sequence difference algorithm as claimed in claim 1, wherein in step two, the Markov state transition matrix is
Figure FDA0002129308720000027
wherein M_{i,j} denotes the transition probability of the mine pool from state i to state j, and N is the total number of miners in the mine pool.
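The matrix itself appears only as an image, so the following is a hedged sketch of one way such a transition matrix could be built, not the matrix of the claim. It assumes that a pool state is the joint action profile of all N miners (2^N states), that each miner acts independently given the previous state, and that every strategy vector uses the assumed layout [p_C,0 .. p_C,N-1, p_A,0 .. p_A,N-1]. The function name `transition_matrix` is hypothetical.

```python
from itertools import product


def transition_matrix(strategies):
    """Build a Markov transition matrix over joint action profiles.

    strategies[k] is the strategy vector of miner k in the assumed layout
    [p_C,0 .. p_C,N-1, p_A,0 .. p_A,N-1], where N = len(strategies).
    Returns (states, matrix); a state is a tuple with 1 = cooperate, 0 = attack.
    """
    n = len(strategies)
    states = list(product((1, 0), repeat=n))
    matrix = []
    for prev in states:
        # Probability that each miner cooperates next, given the previous state.
        coop = []
        for k, p in enumerate(strategies):
            others = sum(prev) - prev[k]  # cooperating miners other than k
            offset = 0 if prev[k] == 1 else n
            coop.append(p[offset + others])
        row = []
        for nxt in states:
            prob = 1.0
            for k in range(n):
                prob *= coop[k] if nxt[k] == 1 else 1.0 - coop[k]
            row.append(prob)
        matrix.append(row)
    return states, matrix
```

Each row is a probability distribution over next states, so it sums to 1. With two miners the states are ordered (1, 1), (1, 0), (0, 1), (0, 0), and entry M[0][0] is the probability that the pool stays in full cooperation.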
5. The method for cooperative evolution of multiple miners in a blockchain coin mine pool based on the time sequence difference algorithm as claimed in claim 1, wherein in step three, the linear combination of the expected profits of the miners is
Figure FDA0002129308720000028
CN201910632888.3A 2019-07-14 2019-07-14 Block chaining coin mine pool multi-miner cooperative evolution method based on time sequence difference algorithm Pending CN112232844A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910632888.3A CN112232844A (en) 2019-07-14 2019-07-14 Block chaining coin mine pool multi-miner cooperative evolution method based on time sequence difference algorithm

Publications (1)

Publication Number Publication Date
CN112232844A true CN112232844A (en) 2021-01-15

Family

ID=74111781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910632888.3A Pending CN112232844A (en) 2019-07-14 2019-07-14 Block chaining coin mine pool multi-miner cooperative evolution method based on time sequence difference algorithm

Country Status (1)

Country Link
CN (1) CN112232844A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113114492A (en) * 2021-04-01 2021-07-13 哈尔滨理工大学 Security situation perception algorithm based on Markov differential game block chain model
CN114401099A (en) * 2021-08-17 2022-04-26 同济大学 Block chain PoW selfish mining attack defense method based on network topology
US20230325813A1 (en) * 2022-03-28 2023-10-12 Daniel Joseph Lutz System and Method for Mining Crypto-Coins

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination