CN109711176B

CN109711176B - Q-Learning intelligent contract effectiveness detection method

Info

Publication number: CN109711176B
Application number: CN201811515288.0A
Authority: CN
Inventors: 王伊蕾; 张利锋; 李凤银
Original assignee: Qufu Normal University
Current assignee: Lianyungang Micro-Painting Hall Network Technology Co.,Ltd.
Priority date: 2018-12-12
Filing date: 2018-12-12
Publication date: 2023-08-25
Anticipated expiration: 2038-12-12
Also published as: CN109711176A

Abstract

The invention discloses a Q-Learning-based method for detecting the validity of smart contracts. It aims to move beyond the current situation, in which most smart contract detection targets vulnerabilities in program code, and to meet the needs of smart contracts as practical blockchain applications. The key technical point is that the data-feeding process of a smart contract on the blockchain is treated as a random process, and the utility of the smart contract participants is defined as a function of the random distributions involved, which yields a new validity detection method. The parameters of the random distributions are then optimized with Q-Learning so as to optimize the participants' utility functions. The method is fast and efficient, has high accuracy, strong optimization capability and robustness, and is suitable for validity checking and privacy protection when smart contracts serve as electronic contracts.

Description

Q-Learning intelligent contract effectiveness detection method

Technical Field

The invention belongs to the technical field of information security and relates to a technique for optimizing smart contract validity detection using the Q-Learning algorithm.

Background

Smart contracts are program code that runs automatically on a blockchain, and they contain many vulnerabilities. At present, methods for detecting the validity (including correctness and fairness) of a smart contract are mostly based on detecting vulnerabilities in program code, and there is little research on the feeding of external data. In fact, the execution of a smart contract depends to a large extent on external data feeds and on the conditions they trigger. External data, however, contains a large random component that is critical to the execution and validity of the smart contract, so validity detection for smart contracts with external data feeds is a problem that needs to be solved.

These uncertainties can be regarded as random variables following certain distributions, which are fed as data to the smart contract participants. They become the conditions that trigger the smart contract and affect its operation, because they are an important component of the participants' utility. Participants in a smart contract need sufficient economic incentive; that is, they take part in executing the contract only while it remains profitable for them. There is currently no research on participant motivation in smart contracts, nor an efficient method for detecting it. Combining utility functions with the random distributions of data feeds, and using Q-Learning to study the detection and optimization of smart contract validity, is therefore one of the difficulties in current smart contract validity detection.

Disclosure of Invention

The invention aims to provide a Q-Learning-based method for detecting the validity of a smart contract, realized through the steps of constructing a specific smart contract, feeding data and optimizing parameters. The specific process is as follows:

First step: the smart contract initiator divides a file into s parts and encrypts each part; to restore the whole file, every sub-file must be decrypted. The cost of decryption is v. Because the confidential file is time-sensitive, its value is high while its content has not been published and drops significantly after decryption. The decryption cost (the value of the movie) is therefore represented by a random distribution rather than a constant, and a Weibull distribution can be used for this purpose;

Second step: the external participants download the s encrypted sub-files and try to decrypt them;

Third step: the smart contract randomly selects s' of the s sub-files, where s' < s, and sends this selection to the smart contract initiator, who is asked to decrypt the selected sub-files;

Fourth step: if the smart contract initiator successfully decrypts the selected sub-files, the remaining undecrypted sub-files are taken to be correct and the smart contract waits for an external data feed before continuing execution; otherwise, the smart contract is terminated;

Fifth step: each external participant who wants the whole file decrypted needs to pay a partial amount m > 1, which is less than the cost of decrypting the entire file. Each participant decides whether to pay m according to personal preference, and the number of payments follows the binomial distribution $P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}$, where k is the total number of donations, n is the number of trials, and p is the probability of a donation in each trial;

Sixth step: the value of m follows a random distribution, because each participant's willingness to have the whole file decrypted differs, as does each participant's financial situation. Because blockchain networks have the small-world property, i.e., 80% of the wealth is concentrated in 20% of the people, m follows a Pareto distribution, under which the majority of people (e.g., 80%) donate an amount m between 1 and b;

Seventh step: after each external participant has decided whether to donate, the smart contract collects the donations, which total km and are fed into the smart contract program as data;

Eighth step: the smart contract initiator decides whether to decrypt the whole file according to the benefit. If km > v, the initiator's benefit is positive and the initiator has an incentive to decrypt the whole file; the larger the gap between km and v, the stronger the incentive. After the whole file is decrypted, all sum participants who downloaded the encrypted sub-files obtain the decrypted file. Note that sum >= k; that is, external participants who did not pay m benefit without contributing, which is the free-rider phenomenon. This follows from the design of the smart contract: the initiator cannot identify individual external participants and can only treat them as a whole. If km <= v, the initiator makes no profit, has no incentive to decrypt the entire file, and will choose not to decrypt;

Ninth step: the utility functions of the smart contract initiator and of the external participants are defined according to the distribution of the external data. The utility function of the smart contract initiator is km - v. The utility function of an external participant is defined in terms of sum, the number of all external participants, and an indicator that takes the value 0 or 1, with 0 indicating that the external participant did not pay m and 1 indicating that the external participant paid m;

Tenth step: the above steps are repeated, and the parameters p, b, a and c are optimized using the Q-Learning algorithm.
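The spot check in the third and fourth steps can be illustrated with a short sketch. The text does not say how the challenge is drawn or how strong the resulting guarantee is, so the uniform draw, the function names and the probability calculation below are assumptions added for illustration only. The second function computes the chance that an initiator who prepared `bad` incorrect sub-files still passes a uniform challenge of s' sub-files.

```python
import math
import random


def challenge_indices(s, s_prime, rng):
    """Pick s' distinct sub-file indices out of s for the initiator to decrypt."""
    if not 0 < s_prime < s:
        raise ValueError("the text requires s' < s (and at least one challenge)")
    return sorted(rng.sample(range(s), s_prime))


def undetected_probability(s, s_prime, bad):
    """Chance that none of `bad` incorrect sub-files falls in a uniform challenge."""
    return math.comb(s - bad, s_prime) / math.comb(s, s_prime)


rng = random.Random(0)  # seeded only to make the sketch reproducible
picked = challenge_indices(10, 3, rng)
print(picked, undetected_probability(10, 3, 1))
```

Under these assumptions, a larger s' lowers the chance that an incorrect sub-file goes unnoticed, at the price of revealing more of the file before any donation is collected.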

The method is fast and efficient, has high accuracy, strong optimization capability and robustness, and achieves the following effects: the random distribution characteristics of the externally fed data are integrated into the utility functions of the smart contract participants, the parameters of each random distribution are determined, and Q-Learning is used to optimize these parameters so as to maximize the participants' utility and stabilize their motivation to execute the smart contract. The method is suitable for validity checking and privacy protection when smart contracts serve as electronic contracts.
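How the random data feed enters the initiator's utility can be sketched as a single simulated donation round. This is a minimal sketch under stated assumptions: the quantity "km" is read as the sum of the k individual donations, the decryption cost v is passed in as a number rather than sampled, and the function name and the values n = 200, b = 2.0 and v = 25.0 are arbitrary choices for illustration, not values from the text.

```python
import random


def simulate_round(n, p, b, v, rng):
    """One donation round: returns the counts and the initiator's utility."""
    # Number of payers k ~ Binomial(n, p), drawn as n independent Bernoulli trials.
    k = sum(1 for _ in range(n) if rng.random() < p)
    # Each payer's amount m ~ Pareto(shape b, minimum 1), so every m is at least 1.
    donations = [rng.paretovariate(b) for _ in range(k)]
    total = sum(donations)  # plays the role of "km" in the text
    return {
        "k": k,
        "total": total,
        "initiator_utility": total - v,
        "decrypt": total > v,  # the initiator decrypts only when profitable
    }


result = simulate_round(n=200, p=0.1, b=2.0, v=25.0, rng=random.Random(42))
print(result)
```

Repeating such rounds for different p and b gives the kind of utility samples that the Q-Learning step is described as optimizing.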

Drawings

FIG. 1 details the algorithm flow of the smart contract and the relationships between various parameters.

FIG. 2 shows the number of successes of the scaler at different donation rates.

FIG. 3 shows the number of successes of the scaler at different average donation intensities.

FIG. 4 shows the benefit distribution of the scaler at donation rate p = 0.1.

The specific embodiment is as follows:

First step: state transitions during the execution of the contract are implemented with a finite-state automaton model A, whose states are states = ['s1', 's2', 's3', 's4', 'Sfail', 'Ssucc', 'Sinc']. A is used to simulate a viewer joining the contract.
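The text names the states of automaton A but not its transitions, so the following is only a sketch of how a table-driven finite-state automaton over those states could look. The transition table and the event names "donate", "skip" and "fail" are hypothetical; only the state names come from the text.

```python
# State names from the text; everything about the transitions is an assumption.
STATES = ["s1", "s2", "s3", "s4", "Sfail", "Ssucc", "Sinc"]
TERMINAL = {"Sfail", "Ssucc", "Sinc"}

# Hypothetical transition table: (state, event) -> next state.
TRANSITIONS = {}
for current, following in [("s1", "s2"), ("s2", "s3"), ("s3", "s4"), ("s4", "Ssucc")]:
    TRANSITIONS[(current, "donate")] = following  # progress on a donation
    TRANSITIONS[(current, "fail")] = "Sfail"      # decoding failure
    TRANSITIONS[(current, "skip")] = "Sinc"       # viewer leaves without donating


def next_state(state, event):
    """Apply one event; terminal states absorb every event."""
    if state in TERMINAL:
        return state
    return TRANSITIONS[(state, event)]


def run(events, start="s1"):
    """Feed a sequence of events to the automaton and return the final state."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state


print(run(["donate", "donate", "donate", "donate"]))
```

Keeping the transitions in a dictionary means the table can be replaced by the real one without touching `next_state` or `run`.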

Second step: the conditional transition functions are defined, including a donation willingness function d, a donation amount function m and a decoding failure function f. Functions d and f are sampled from a Bernoulli distribution with mean q = 0.1, whereas m is sampled from a Pareto distribution, whose density with unit scale is $f(x; b) = b / x^{b+1}$ for $x \ge 1$, where b=1, 2 is the shape parameter of the Pareto distribution.
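The three sampling functions of this step can be sketched with Python's standard `random` module. The function names are illustrative assumptions, and the shape parameter b is left as a required argument because its exact value is not clear from the text; the value 2.0 in the usage line is arbitrary.

```python
import random

Q_MEAN = 0.1  # Bernoulli mean q stated in the text


def bernoulli(q, rng):
    """Return 1 with probability q, otherwise 0."""
    return 1 if rng.random() < q else 0


def donation_willingness(rng, q=Q_MEAN):
    """d: 1 if the viewer is willing to donate."""
    return bernoulli(q, rng)


def decoding_failure(rng, q=Q_MEAN):
    """f: 1 if decoding fails."""
    return bernoulli(q, rng)


def donation_amount(rng, b):
    """m: Pareto-distributed amount with shape b and minimum value 1."""
    return rng.paretovariate(b)


rng = random.Random(7)  # seeded only for reproducibility
print(donation_willingness(rng), decoding_failure(rng), donation_amount(rng, b=2.0))
```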

Third step: the cost function v of the film is defined; it follows the exponentiated Weibull distribution exponweib, with density $f(x; a, c) = a\,c\,[1 - \exp(-x^{c})]^{a-1} \exp(-x^{c})\, x^{c-1}$ for $x > 0$. Parameter a controls the mean: the larger a is, the larger the sample mean. Parameter c controls the sample variance: the larger c is, the smaller the variance. In the test, a = 0.49 and c = 1.9.
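The name exponweib matches the exponentiated Weibull distribution as provided by SciPy. As a dependency-free sketch, the cost v can also be drawn by inverting the standard exponentiated Weibull distribution function $F(x) = (1 - e^{-x^{c}})^{a}$; the function names below are illustrative assumptions, and only a = 0.49 and c = 1.9 come from the text.

```python
import math
import random


def exponweib_cdf(x, a, c):
    """Distribution function F(x) = (1 - exp(-x**c))**a for x > 0."""
    if x <= 0:
        return 0.0
    return (1.0 - math.exp(-(x ** c))) ** a


def sample_cost(a, c, rng):
    """Draw v by inverse transform: solve F(v) = u for uniform u in [0, 1)."""
    u = rng.random()
    return (-math.log1p(-(u ** (1.0 / a)))) ** (1.0 / c)


rng = random.Random(3)  # seeded only for reproducibility
costs = [sample_cost(0.49, 1.9, rng) for _ in range(5)]
print(costs)
```

Drawing many samples for two values of a is a quick way to see the stated effect that a larger a shifts the sample mean upward.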

Fourth step: the state space, action space and reward function of the Q-learning algorithm are defined, and the Q table is defined.

Fifth step: the algorithm searches the state-action table using a greedy strategy.
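A Q table and a greedy lookup over it can be sketched as follows. The use of two actions and of tuple-valued states is an assumption suggested by the Q table excerpt in this document, and the `epsilon` parameter is an added illustration: with its default of 0 the policy is purely greedy, as the text describes, while a positive value would add random exploration.

```python
import random
from collections import defaultdict

N_ACTIONS = 2  # assumed number of actions


def make_q_table(n_actions=N_ACTIONS):
    """Q table mapping each state to a list of action values, initially zero."""
    return defaultdict(lambda: [0.0] * n_actions)


def choose_action(Q, state, rng, epsilon=0.0):
    """Greedy action for `state`; explores at random with probability epsilon."""
    values = Q[state]
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    # Index of the largest value; ties go to the lowest index.
    return max(range(len(values)), key=values.__getitem__)


Q = make_q_table()
Q[(206, 299)] = [0.005, 0.0]
print(choose_action(Q, (206, 299), random.Random(0)))
```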

Sixth step: an environment interaction function step is defined, which completes one interaction with the environment given the observed state and the action. Specifically, the step function checks the donation status of the current movie: if the donation amount is less than expected, the reward is 0; otherwise, the reward is the donation amount. When the action indicates continued waiting, the step function runs automaton A, and the number of runs of A follows the Poisson distribution $P(N = n) = \lambda^{n} e^{-\lambda} / n!$, with intensity λ = 200 in the test.
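A step function along these lines can be sketched as below, reading the distribution with intensity λ as a Poisson distribution. Several details are assumptions made for illustration: the state is taken to be a pair (viewers seen, donations collected), the two actions are encoded as WAIT = 0 and RELEASE = 1, each run of automaton A is reduced to a single Bernoulli donation decision followed by a Pareto amount, and b = 2.0 and the `expected` threshold are arbitrary.

```python
import math
import random

WAIT, RELEASE = 0, 1  # hypothetical action encoding


def poisson(lam, rng):
    """Poisson sample by Knuth's multiplication method (adequate for lam near 200)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def step(state, action, rng, expected, q=0.1, b=2.0, lam=200):
    """One environment interaction: returns (next_state, reward, done)."""
    viewers, total = state
    if action == RELEASE:
        # Reward is the collected donation only if it reaches the expectation.
        reward = total if total >= expected else 0.0
        return state, reward, True
    # WAIT: a Poisson-distributed number of viewers passes through automaton A.
    arrivals = poisson(lam, rng)
    for _ in range(arrivals):
        if rng.random() < q:              # viewer is willing to donate
            total += rng.paretovariate(b)  # donated amount m >= 1
    return (viewers + arrivals, total), 0.0, False


rng = random.Random(5)  # seeded only for reproducibility
state, reward, done = step((0, 0.0), WAIT, rng, expected=40.0)
print(state, reward, done)
```

To use such states as Q table keys, the collected amount would have to be discretized, for example by rounding.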

Seventh step: an action is selected according to the policy function and passed to the environment function step, which returns the observed state change, the reward, and whether the episode has ended.

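import numpy as np  # Q maps each state to an array of action values

# Temporal-difference (Q-learning) update for one observed transition
best_next_action = np.argmax(Q[next_state])
td_target = reward + discount_factor * Q[next_state][best_next_action]
td_delta = td_target - Q[state][action]
Q[state][action] += alpha * td_delta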

Eighth step: the Q table is updated using the temporal-difference method. First, the best action in the next state is found; the value of that state-action pair is multiplied by the discount coefficient discount_factor = 0.9 and added to the reward from the seventh step to obtain td_target, the current target value.

Ninth step: the difference between td_target and the current Q-table value of the state-action pair is multiplied by the learning rate alpha = 0.5 and added to that value, and the result is written back to the Q table as the new value. An excerpt of the Q table:

(0, 0): ([0.005, 0.])
(206, 299): ([0.005, 0.])
(433, 695): ([0.005, 0.])
(625, 1): ( 4, 1])
(1016, 574): ([0.005, 0.])
(1191, 866): ([0.005, 0.])
(1398, 214): ([0.005, 0.])

…
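The temporal-difference update of the eighth and ninth steps can be checked by hand on a toy table. The sketch below wraps the update in a function, using alpha = 0.5 and a discount factor of 0.9 as in the text; the two-state table, the state names and the rewards are made up for the arithmetic and are not taken from the experiments.

```python
def td_update(Q, state, action, reward, next_state, alpha=0.5, discount_factor=0.9):
    """Apply one Q-learning update in place and return the new value."""
    best_next_value = max(Q[next_state])
    td_target = reward + discount_factor * best_next_value
    td_delta = td_target - Q[state][action]
    Q[state][action] += alpha * td_delta
    return Q[state][action]


Q = {"s": [0.0, 0.0], "t": [0.0, 0.0]}

# 0 + 0.5 * (1.0 + 0.9 * 0 - 0) = 0.5
first = td_update(Q, "s", 0, reward=1.0, next_state="t")
# 0.5 + 0.5 * (1.0 + 0.9 * 0 - 0.5) = 0.75
second = td_update(Q, "s", 0, reward=1.0, next_state="t")

# Bootstrapping from the next state: 0 + 0.5 * (0 + 0.9 * 2.0 - 0) = 0.9
Q["t"] = [2.0, 0.0]
third = td_update(Q, "s", 1, reward=0.0, next_state="t")
print(first, second, third)
```

Each update moves the stored value halfway toward the target, which is why repeated identical rewards approach the target geometrically.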

Tenth step: the fifth to ninth steps are repeated until a preset number of iterations is reached.
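Repeating the fifth to ninth steps amounts to a loop of roughly the following shape. This is a sketch under stated assumptions rather than the experimental code: the environment is any function returning (next_state, reward, done), the `epsilon` exploration and the `max_steps` guard are additions that the text does not mention, and `toy_step` is a made-up one-step environment used only to exercise the loop.

```python
import random
from collections import defaultdict


def q_learning(env_step, start_state, n_actions, episodes, rng,
               alpha=0.5, discount_factor=0.9, epsilon=0.1, max_steps=100):
    """Run tabular Q-learning for a preset number of episodes and return Q."""
    Q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(episodes):
        state = start_state
        for _ in range(max_steps):
            # Fifth step: greedy choice, with optional random exploration.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=Q[state].__getitem__)
            # Sixth and seventh steps: interact with the environment.
            next_state, reward, done = env_step(state, action)
            # Eighth and ninth steps: temporal-difference update.
            td_target = reward + discount_factor * max(Q[next_state])
            Q[state][action] += alpha * (td_target - Q[state][action])
            if done:
                break
            state = next_state
    return Q


def toy_step(state, action):
    """Made-up environment: action 1 pays 1.0, action 0 pays nothing."""
    return "end", float(action), True


Q = q_learning(toy_step, "start", 2, episodes=200, rng=random.Random(0), epsilon=0.2)
print(Q["start"])
```

Without any exploration a purely greedy learner starting from an all-zero table would keep choosing the first action, which is why the sketch exposes `epsilon` as a parameter.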

Eleventh step: the relationship between reward and iteration time is plotted, the parameters are modified, and the influence of the donation rate and donation amount parameters on the states reached by A and on the benefit of the smart contract initiator is observed.

Verification of the effect of the invention

To demonstrate the effectiveness of the invention, we studied the number of successes of the scaler in 1000 experiments at different donation rates and donation intensities. As can be seen from FIG. 2, even though the scaler, guided by Q-learning, tends to learn the film release strategy that is optimal for him, an increased donation rate among the participants does not guarantee a higher number of successes. FIG. 3 shows the success rate of the scaler under different donation intensities and illustrates that the donation intensity has a significant influence on the success rate.

FIG. 4 shows that the average benefit of the scaler is 56.9 at a donation rate of 0.1, and that the benefit of the scaler remains far lower than expected even as learning continues to progress.

Claims

1. A Q-Learning-based method for detecting the validity of a smart contract, characterized in that it is realized through the steps of constructing a specific smart contract, feeding data and optimizing parameters, the specific process being as follows:

First step: the smart contract initiator divides a file into s parts and encrypts each part; to restore the whole file, every sub-file must be decrypted. The cost of decryption is v. Because the confidential file is time-sensitive, its value is high while its content has not been published and drops significantly after decryption. The decryption cost is therefore represented by a random distribution rather than a constant, and a Weibull distribution is used for this purpose;

wherein, parameter a is used to control the mean value and parameter c is used to control the sampling variance;

Fifth step: each external participant who wants the whole file decrypted needs to pay a partial amount m > 1, which is less than the cost of decrypting the entire file. Each participant decides whether to pay m according to personal preference, and the number of payments follows the binomial distribution $P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}$, where k is the total number of donations, n is the number of trials, and p is the probability of each donation;

Sixth step: the value of m follows a random distribution, because each participant's willingness to have the whole file decrypted differs, as does each participant's financial situation. Because the blockchain network has the small-world property, i.e., 80% of the wealth is concentrated in 20% of the people, m follows a Pareto distribution, under which 80% of the people donate an amount m between 1 and b;

Eighth step: the smart contract initiator decides whether to decrypt the whole file according to the benefit. If km > v, the initiator's benefit is positive and the initiator has an incentive to decrypt the whole file; the larger the gap between km and v, the stronger the incentive. After the whole file is decrypted, all sum participants who downloaded the encrypted sub-files obtain the decrypted file; note that sum >= k. If km <= v, the initiator makes no profit, has no incentive to decrypt the entire file, and will choose not to decrypt;

Ninth step: the utility functions of the smart contract initiator and of the external participants are defined according to the distribution of the external data. The utility function of the smart contract initiator is km - v. The utility function of an external participant is defined in terms of sum, the number of all external participants, and an indicator that takes the value 0 or 1, with 0 indicating that the external participant did not pay m and 1 indicating that the external participant paid m;