CN111598127B - Block chain consensus method based on machine learning - Google Patents


Info

Publication number
CN111598127B
Authority
CN
China
Prior art keywords
nodes
node
request
supervision
data block
Prior art date
Legal status
Active
Application number
CN202010273144.XA
Other languages
Chinese (zh)
Other versions
CN111598127A (en)
Inventor
王海勇
郭凯璇
潘启青
张开心
管维正
刘贵楠
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202010273144.XA
Publication of CN111598127A
Application granted
Publication of CN111598127B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/64: Protecting data integrity, e.g. using checksums, certificates or signatures
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer And Data Communications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a blockchain consensus method based on machine learning, which comprises the following steps: all nodes in the blockchain system are divided into three types, namely client nodes, main nodes and supervision nodes, each type having its own functions; an algorithm flow consisting of a preparation stage and a confirmation stage is set; after a client node sends a data block production request, the probability that the request will pass is predicted; if the probability is greater than 0.7, the data block is produced in advance, and a new data block production request is sent immediately after confirmation messages from 1 + Ns/2 supervision nodes (where Ns is the number of supervision nodes) are received; if the probability is not greater than 0.7, the data block is produced only after confirmation messages from 1 + Ns/2 supervision nodes are received; a K-means clustering algorithm is pre-deployed in the whole blockchain system, and whenever the number of nodes increases or decreases, the algorithm is started to reclassify the added nodes or to adjust for the change in class membership caused by the removed nodes. By producing the data block in advance based on the prediction of whether the production request will pass, the time spent waiting for confirmation messages is fully used, the steps are simplified, the delay is reduced and the efficiency is improved.

Description

Block chain consensus method based on machine learning
Technical Field
The invention relates to an improvement of the classical blockchain PBFT algorithm, and in particular to a blockchain consensus method based on machine learning.
Background
With the development of blockchain technology, consensus algorithms, its core technology, have also developed. Among existing consensus algorithms, PBFT is a comparatively classical distributed consensus algorithm. It is based on the state-machine replication principle and reaches consensus among nodes in three stages, namely a pre-prepare stage, a prepare stage and a confirmation stage; in this process, messages from malicious nodes are ignored thanks to the majority of honest nodes, and the algorithm can tolerate no more than (n-1)/3 malicious nodes, where n is the total number of nodes. However, the PBFT algorithm cannot dynamically sense changes in the number of nodes, and the client node sits idle after sending a request, which wastes a certain amount of resources, while the waiting time also increases the delay.
As research has progressed, many new consensus algorithms have been proposed, and their proof mechanisms have become diverse and hybrid. The delegated Byzantine fault tolerance (dBFT) algorithm combines PoS with PBFT and is a consensus protocol that achieves large-scale participation through proxy voting; it is fast and scales well, but the number of nodes it supports is limited, and performance drops once the total number of nodes exceeds a certain range.
The credit-based Byzantine protocol CPBFT (credit practical Byzantine fault tolerance) partially improves the original PBFT by introducing a behavior scoring and classification mechanism. CPBFT determines the credit level of a node in the network by recording and evaluating its behavior, and dynamically removes nodes with low credit scores to improve consensus efficiency while preserving the scalability and dynamism of the system. However, credit evaluation requires additional communication to reconcile the credit scores, which wastes some bandwidth.
Analysis of existing distributed consensus algorithms such as PBFT, dBFT and CPBFT shows that a new, improved algorithm addressing their shortcomings is needed; the invention therefore proposes a blockchain consensus method based on machine learning.
Disclosure of Invention
The invention aims to provide a blockchain consensus method based on machine learning that effectively reduces delay and improves efficiency.
The technical scheme is as follows: a blockchain consensus method based on machine learning comprises the following steps:
A. all nodes in the network are divided into 3 types, namely client nodes, main nodes and supervision nodes; each type has a specific count: the number of client nodes is Nc, the number of main nodes is Nm and the number of supervision nodes is Ns;
B. each type of node has a different function: the client node initiates data block production requests and produces data blocks; the main node numbers the client nodes' requests, extracts their digests and sends the processed requests to the supervision nodes; the supervision nodes verify the requests and reply confirmation messages to the client node;
C. the algorithm flow consists of a preparation stage and a confirmation stage;
D. preparation stage: the client node sends a data block production request to the main node; the main node numbers the requests in the order in which they are received, extracts their digests and then sends the processed requests to the supervision nodes in numbered order;
E. confirmation stage: after receiving a request, a supervision node verifies whether the request is consistent with the digest (Y for consistent, N for inconsistent), appends its own verification result to the message and sends the message to the other supervision nodes so that the request is mutually confirmed and verified; once verification passes, it replies a confirmation message to the client node;
F. after the client node sends a data block production request, a logistic regression algorithm is run to predict the probability that the request will pass; if the probability is greater than 0.7, the data block is produced in advance and a new data block production request is sent immediately after confirmation messages from 1 + Ns/2 supervision nodes are received, where Ns is the number of supervision nodes; if the probability is not greater than 0.7, the data block is produced only after confirmation messages from 1 + Ns/2 supervision nodes are received (a sketch of this decision is given after this list).
G. A K-means clustering algorithm is pre-deployed across the whole blockchain system; whenever the number of nodes in the system increases or decreases, the algorithm is started to classify the added nodes or to adjust for the change in class membership caused by the removed nodes.
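By way of illustration only, the following Python sketch shows the client-side decision described in step F under the 0.7 threshold and the 1 + Ns/2 quorum; the helper names (predict_pass_probability, produce_block, wait_for_commits, send_next_request) are assumptions introduced here and are not part of the claimed method.

```python
# Illustrative sketch (not the patented implementation) of the client-side
# decision in step F: pre-produce the block when the predicted probability
# that the request will pass exceeds the 0.7 threshold, otherwise wait for
# the 1 + Ns/2 confirmation messages before producing it.

THRESHOLD = 0.7  # experimental value 0.691, rounded up to 0.7 in the description


def handle_request(client, request, ns):
    """client, request and the helper methods below are assumed abstractions."""
    quorum = 1 + ns // 2                              # confirmations required from supervision nodes
    p = client.predict_pass_probability(request)      # logistic-regression prediction (step F)
    if p > THRESHOLD:
        block = client.produce_block(request)         # produce the data block in advance
        client.wait_for_commits(request, quorum)      # confirmations arrive while the block is ready
        client.send_next_request()                    # immediately send a new production request
        return block
    client.wait_for_commits(request, quorum)          # only produce after the quorum is reached
    return client.produce_block(request)
```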
Further, the division of nodes in step A is as follows:
a-a, the nodes in the algorithm are divided into 3 types: client nodes, main nodes and supervision nodes;
a-b, each type has its own count: the number of client nodes is Nc, the number of main nodes is Nm and the number of supervision nodes is Ns;
a-c, the node type division rule is as follows:
a-c-a, when a node joins the blockchain system, in addition to providing its real identity information it must provide two further items: the expected value x of the type it wishes to become and its resource value y;
a-c-b, the expected value x ranges over 1-5 and the resource value y ranges over 0-1;
a-c-c, when x is 1-3 and y is 0-0.5 the node is a client node; when x is 2-4 and y is 0.5-1 the node is a main node; when x is 3-5 and y is 0-0.5 the node is a supervision node.
Further, the function settings in step B are:
b-a, the client node initiates data block production requests and produces data blocks; the probability that a data block production request it initiates will pass verification is predicted with a logistic regression algorithm, and whether to produce the data block in advance is decided according to that probability;
b-b, the main node numbers the client nodes' requests in time order, extracts their digests and then broadcasts the requests to all supervision nodes in numbered order;
b-c, the supervision nodes confirm and verify the requests in numbered order; after receiving a message from the main node, a supervision node verifies whether the request is consistent with the digest, appends its verification result (Y for consistent, N for inconsistent) to the message and broadcasts it to the other supervision nodes for mutual confirmation and verification; when a supervision node has received more than Ns/2 verification-passed messages from the other supervision nodes, it replies a confirmation message to the client node.
Further, the preparation stage of the algorithm in step D is as follows:
d-a, the client node initiates a data block production request whose message content is Request: <Bcontent, Tc, Sc>, where Bcontent is the data block to be produced, Tc is the timestamp at which the client node sends the request message and Sc is the client node's signature; a data block consists of a block header and a data part, the block header being brief information about the previous data block; if the block is the genesis block, the block header simply marks it as the genesis block;
d-b, after receiving the Request message, the main node numbers it, extracts its digest, adds a timestamp and a signature, and then broadcasts the message to all supervision nodes; the message sent is Query: <Num, <Bcontent, Tc, Sc>, D, Tm, Sm>, where Num is the number, D is the digest, Tm is the timestamp added by the main node and Sm is the main node's signature;
d-c, after the client node sends the data block production request, a logistic regression algorithm is run to predict the probability that the request will pass.
Further, the prediction process in step d-c is as follows:
d-c-a, sample data (X, y) already exist, and each sample X contains 4 feature values, X = (x1, x2, x3, x4);
d-c-b, where x1 is the probability that the client node has historically produced a data block successfully, x2 is the probability that a supervision node has historically replied that verification passed, x3 is the probability that a supervision node has historically replied that verification failed, x4 is the probability that a supervision node has not replied at all, and y is the probability that the request passes verification, all of these probabilities lying in the interval (0, 1); X from the samples is taken as the input parameter;
d-c-c, the input X is multiplied by a weight matrix W and a bias b is added, giving the intermediate result z = W·X + b;
d-c-d, the output is Y ∈ {0, 1}; the intermediate result z is converted to a value of 0 or 1 using the Sigmoid function
Sigmoid(z) = 1 / (1 + e^(-z)),
which gives the formula
Y = Sigmoid(W·X + b) = 1 / (1 + e^(-(W·X + b)));
d-c-e, according to this formula, the existing sample data are combined with Python code to train the model and obtain the specific values of the weight matrix W and the bias b;
d-c-f, the obtained values of W and b are substituted into the formula above to obtain the logistic regression model;
d-c-g, all of the initial sample data are tested with the obtained model, and the predicted probability Y is compared with the original probability y provided by the sample data to obtain a suitable threshold;
d-c-h, the threshold obtained experimentally is 0.691, and 0.7 is selected as the threshold;
d-d, if the predicted probability is greater than the threshold, the client node pre-produces the data block and stores it temporarily; otherwise it waits for the confirmation messages.
Further, the confirmation stage of the algorithm in step E comprises the following steps:
e-a, after receiving the Query message sent by the main node, a supervision node first verifies for itself whether the request is consistent with the digest, appends its own verification result and then broadcasts the message to the other supervision nodes for mutual confirmation and verification; the broadcast message is Valid: <ID, <Num, <Bcontent, Tc, Sc>, D, Tm, Sm>, Y/N>, where ID is the supervision node's number, Y indicates that the request is consistent with the digest and N indicates that it is not;
e-b, when a supervision node has received at least 1 + Ns/2 Valid messages carrying Y (Ns being the number of supervision nodes), it replies a confirmation message to the client node; the confirmation message is Commit: <ID, <Num, <Bcontent, Tc, Sc>, D, Tm, Sm>, Y/N, Ts, Ss>, where Ts is the timestamp added by the supervision node and Ss is the supervision node's signature;
e-c, when the client node has received at least 1 + Ns/2 Commit messages containing Y (Ns being the number of supervision nodes), the pre-produced data block is considered valid, or, if the predicted pass probability was not above 0.7, the data block is produced on the basis of the received Commit messages; otherwise the request has not passed, and the client node discards it and initiates the request again.
Further, the K-means clustering algorithm that ensures system scalability in step G is as follows:
g-a, the K-means clustering algorithm is deployed in advance; the nodes are divided into 3 types, and because several rounds of consensus take place before the total number of nodes changes and malicious nodes may exist among all the nodes, K is set to 4;
g-b, after the client node generates the genesis block, the K-means algorithm is triggered and kept running, executing automatically whenever the number of nodes changes;
g-c, at this point each node is characterized by two parameter values: its current type x and the frequency y with which it has previously behaved honestly;
g-d, where the expected value x ranges over 1-5 and the resource value y ranges over 0-1;
g-e, when the number of nodes in the system changes for the first time, K-means runs automatically and calculates as follows:
g-e-a, when the number of nodes in the system increases:
g-e-a-a, the newly added node must provide the expected value x of the type it wishes to become and its resource value y, with the same value ranges as in step g-d;
g-e-a-b, the K-means algorithm is run and 4 nodes are chosen arbitrarily from all the nodes as the initial cluster centers;
g-e-a-c, the Euclidean distances from all the remaining nodes to the 4 initial cluster centers are calculated;
g-e-a-d, according to the results, each node is assigned to the class of the nearest cluster center node, and then a new cluster center node is recalculated and selected;
g-e-a-e, classification ends when the distance between the new cluster center node and the previous cluster center node is 0;
g-e-b, when the number of nodes in the system decreases:
g-e-b-a, the K-means algorithm is run and 4 nodes are chosen arbitrarily as the initial cluster centers;
g-e-b-b, the Euclidean distances from all the remaining nodes to the 4 initial cluster centers are calculated;
g-e-b-c, according to the results, each node is assigned to the class of the nearest cluster center node, and then a new cluster center node is recalculated and selected;
g-e-b-d, classification ends when the distance between the new cluster center node and the previous cluster center node is 0;
g-f, when the number of nodes in the system changes again, the cluster centers obtained in the previous run are used as this run's initial cluster center points.
Beneficial effects: compared with the prior art, the invention has the following advantages:
1. The invention builds on the PBFT algorithm and combines it with a logistic regression algorithm: the data block is produced in advance on the basis of the prediction of whether the data block production request will pass, so the time spent waiting for confirmation messages is fully used, the steps are simplified, the delay is reduced and the efficiency is improved.
2. The invention uses a K-means algorithm with K = 3; when the total number of nodes in the system changes, the K-means algorithm is triggered automatically, the changed nodes are classified and the number of nodes in each category is determined, which guarantees the dynamism and scalability of the system.
3. Compared with the PBFT, dBFT and CPBFT algorithms, the method has lower delay, higher throughput and a degree of dynamism.
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention;
FIG. 2 is a block diagram of the node class partitioning and responsibilities of the algorithm of the present invention;
FIG. 3 is a flow chart of the algorithm decision of the present invention;
FIG. 4 is a flow chart of the client node predicting whether a request will pass in accordance with the present invention;
FIG. 5 illustrates the K-means algorithm for ensuring system dynamism in accordance with the present invention.
Detailed Description
A specific application example of the present invention is provided below to illustrate its calculation process. As shown in fig. 1, the present invention provides a blockchain consensus method based on machine learning, which comprises the following steps:
A. the algorithm divides the nodes into 3 types: client nodes, main nodes and supervision nodes;
B. different types of nodes have different responsibilities: the client node initiates data block production requests and produces data blocks; the main node numbers the client nodes' requests, extracts their digests and broadcasts the processed requests to the supervision nodes; the supervision nodes verify the requests and reply confirmation messages to the client node;
C. the algorithm flow is divided into a preparation stage and a confirmation stage;
D. in the preparation stage, the client node sends a data block production request to the main node; the main node numbers the requests in the order in which they are received, extracts their digests and broadcasts the processed requests to the supervision nodes in sequence;
E. in the confirmation stage, after receiving a request, a supervision node verifies whether the request is consistent with the digest (Y for consistent, N for inconsistent), appends its own verification result to the message, then broadcasts the message to the other supervision nodes so that the request is mutually confirmed and verified, and replies a confirmation message to the client node once verification passes;
F. after the client node sends a data block production request, a logistic regression algorithm is run to predict the probability that the request will pass; if the probability is greater than 0.7, the data block is produced in advance and a new data block production request is sent immediately after confirmation messages from 1 + Ns/2 supervision nodes (Ns being the number of supervision nodes) are received; if the probability is not greater than 0.7, the data block is produced only after confirmation messages from 1 + Ns/2 supervision nodes are received.
G. A K-means clustering algorithm is pre-deployed across the whole system; whenever the number of nodes increases or decreases, the algorithm is started to automatically classify the added nodes or to adjust for the change in class membership caused by the removed nodes, ensuring the scalability of the system. The improved PBFT-based algorithm proposed by the invention therefore has a degree of scalability and dynamism, while delay, throughput and other aspects of performance are markedly improved.
As shown in fig. 2, the division of node types in step A of the present invention is as follows:
a-a, the nodes in the algorithm are divided into 3 types: client nodes, main nodes and supervision nodes;
a-b, each type has its own count: the number of client nodes is Nc, the number of main nodes is Nm and the number of supervision nodes is Ns;
a-c, the node type division rule is as follows:
a-c-a, when a node joins the blockchain system, in addition to providing its real identity information it must provide two further items: the expected value x of the type it wishes to become and its resource value y;
a-c-b, the expected value x ranges over 1-5 and the resource value y ranges over 0-1;
a-c-c, when x is 1-3 and y is 0-0.5 the node is a client node; when x is 2-4 and y is 0.5-1 the node is a main node; when x is 3-5 and y is 0-0.5 the node is a supervision node. A sketch of this rule is given below.
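As an illustration only, the division rule of steps a-c-b and a-c-c can be expressed as the following Python sketch; because the ranges overlap, the resolution order client, then main, then supervision used here is an assumption made for the example, not something specified by the invention.

```python
def classify_node(x: float, y: float) -> str:
    """Assign a node type from its expected value x (1-5) and resource value y (0-1).

    The check order is an assumption for illustration, since the stated ranges overlap.
    """
    if 1 <= x <= 3 and 0 <= y <= 0.5:
        return "client"
    if 2 <= x <= 4 and 0.5 <= y <= 1:
        return "main"
    if 3 <= x <= 5 and 0 <= y <= 0.5:
        return "supervision"
    raise ValueError("values outside the ranges defined by the division rule")


# Example: a node declaring x = 4.5 and y = 0.3 becomes a supervision node.
print(classify_node(4.5, 0.3))
```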
The different types of nodes in step B have different responsibilities:
b-a, the client node is mainly responsible for initiating data block production requests and producing data blocks; at the same time, the probability that a data block production request it initiates will pass verification can be predicted with a logistic regression algorithm, and whether the data block can be produced in advance is decided according to that probability;
b-b, the main node is mainly responsible for numbering the client nodes' requests in time order, extracting their digests and then broadcasting the requests to all supervision nodes in numbered order;
b-c, the supervision nodes are mainly responsible for confirming and verifying the requests; after receiving a message from the main node, a supervision node verifies whether the request is consistent with the digest, appends its verification result (Y for consistent, N for inconsistent) to the message and broadcasts it to the other supervision nodes for mutual confirmation and verification; when a supervision node has received more than Ns/2 verification-passed (Y-containing) messages from the other supervision nodes, it replies a confirmation message to the client node.
As shown in fig. 3, the preparation stage of the algorithm in step D of the present invention is as follows:
d-a, the client node initiates a data block production request whose message content is Request: <Bcontent, Tc, Sc>, where Bcontent is the data block to be produced, Tc is the timestamp at which the client node sends the request message and Sc is the client node's signature; a data block consists of a block header and a data part, the block header being brief information about the previous data block; if the block is the genesis block, the block header simply marks it as the genesis block;
d-b, after receiving the Request message, the main node numbers it, extracts its digest, adds a timestamp and a signature and then broadcasts the message to all supervision nodes; the message sent is Query: <Num, <Bcontent, Tc, Sc>, D, Tm, Sm>, where Num is the number, D is the digest, Tm is the timestamp added by the main node and Sm is the main node's signature;
d-c, after the client node sends the data block production request, a logistic regression algorithm is run to predict the probability that the request will pass;
d-d, the threshold is set to 0.7; if the predicted probability is greater than 0.7, the client node pre-produces the data block and stores it temporarily, otherwise it waits for the confirmation messages. A sketch of the Request and Query messages is given below.
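The following Python sketch illustrates one possible shape of the Request and Query messages from steps d-a and d-b. The description does not fix a particular hash or signature scheme, so the use of SHA-256 for the digest and the client_sign/main_sign callables are assumptions made only for this example.

```python
import hashlib
import json
import time


def make_request(bcontent: dict, client_sign) -> dict:
    """Build the Request message <Bcontent, Tc, Sc> from step d-a."""
    tc = time.time()
    return {"Bcontent": bcontent, "Tc": tc, "Sc": client_sign(bcontent, tc)}


def make_query(request: dict, num: int, main_sign) -> dict:
    """Build the Query message <Num, <Bcontent, Tc, Sc>, D, Tm, Sm> from step d-b."""
    # D: digest of the client's request (hash function assumed, not specified by the patent)
    digest = hashlib.sha256(json.dumps(request, sort_keys=True, default=str).encode()).hexdigest()
    tm = time.time()
    return {"Num": num, "Request": request, "D": digest, "Tm": tm, "Sm": main_sign(digest, tm)}


# Example usage with dummy signing callables standing in for real signatures.
req = make_request({"payload": "tx-data"}, client_sign=lambda *a: "sig-c")
qry = make_query(req, num=1, main_sign=lambda *a: "sig-m")
```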
The confirmation stage of the algorithm in step E is as follows:
e-a, after receiving the Query message sent by the main node, a supervision node first verifies for itself whether the request is consistent with the digest, appends its own verification result and then broadcasts the message to the other supervision nodes for mutual confirmation and verification; the broadcast message is Valid: <ID, <Num, <Bcontent, Tc, Sc>, D, Tm, Sm>, Y/N>, where ID is the supervision node's number, Y indicates that the request is consistent with the digest and N indicates that it is not;
e-b, when a supervision node has received at least 1 + Ns/2 Valid messages carrying Y (Ns being the number of supervision nodes), it replies a confirmation message to the client node; the confirmation message is Commit: <ID, <Num, <Bcontent, Tc, Sc>, D, Tm, Sm>, Y/N, Ts, Ss>, where Ts is the timestamp added by the supervision node and Ss is the supervision node's signature;
e-c, when the client node has received at least 1 + Ns/2 Commit messages containing Y (Ns being the number of supervision nodes), the pre-produced data block is considered valid, or, if the predicted pass probability was not above 0.7, the data block is produced on the basis of the received Commit messages; otherwise the request has not passed, and the client node discards it and re-initiates the request. A sketch of this quorum check is given below.
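A minimal Python sketch of the counting rule in steps e-b and e-c follows; the message field name "result" and the helper names are illustrative assumptions, while the rule itself (at least 1 + Ns/2 messages carrying Y) is taken from the description.

```python
def quorum(ns: int) -> int:
    """Number of matching messages required: 1 + Ns/2 (integer division assumed)."""
    return 1 + ns // 2


def commits_reached(commits: list, ns: int) -> bool:
    """commits: Commit messages received so far, each assumed to expose its Y/N field as 'result'."""
    yes_votes = sum(1 for c in commits if c.get("result") == "Y")
    return yes_votes >= quorum(ns)


# Example: with Ns = 4 supervision nodes, 3 positive Commit messages are required.
msgs = [{"result": "Y"}, {"result": "Y"}, {"result": "N"}, {"result": "Y"}]
assert commits_reached(msgs, ns=4)
```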
As shown in fig. 4, the prediction flow with which the client node predicts in step d-c whether the request will pass is as follows:
d-c-a, sample data (X, y) already exist, and each sample X contains 4 feature values, X = (x1, x2, x3, x4);
d-c-b, where x1 is the probability that the client node has historically produced a data block successfully, x2 is the probability that a supervision node has historically replied that verification passed, x3 is the probability that a supervision node has historically replied that verification failed, x4 is the probability that a supervision node has not replied at all, and y is the probability that the request passes verification, all of these probabilities lying in the interval (0, 1); X from the samples is taken as the input parameter;
d-c-c, the input X is multiplied by a weight matrix W and a bias b is added, giving the intermediate result z = W·X + b;
d-c-d, the output is Y ∈ {0, 1}; the intermediate result z is converted to a value of 0 or 1 using the Sigmoid function
Sigmoid(z) = 1 / (1 + e^(-z)),
which gives the formula
Y = Sigmoid(W·X + b) = 1 / (1 + e^(-(W·X + b)));
d-c-e, according to this formula, the existing sample data are combined with Python code to train the model and obtain the specific values of the weight matrix W and the bias b;
d-c-f, the obtained values of W and b are substituted into the formula above to obtain the logistic regression model;
d-c-g, all of the initial sample data are tested with the obtained model, and the predicted probability Y is compared with the original probability y provided by the sample data to obtain a suitable threshold;
d-c-h, the threshold obtained experimentally is 0.691, and 0.7 is selected as the threshold;
d-d, if the predicted probability is greater than the threshold, the client node pre-produces the data block and stores it temporarily; otherwise it waits for the confirmation messages. A training and prediction sketch is given below.
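For illustration, the training and prediction in steps d-c-a to d-c-h can be sketched with scikit-learn's LogisticRegression, which fits the same Sigmoid(W·X + b) model described above; the two training rows shown are placeholder data assumed for the example, not values reported by the invention.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder historical samples (assumed values, for illustration only).
# Columns: x1, x2, x3, x4 as defined in step d-c-b; labels: 1 = request passed, 0 = did not.
X_train = np.array([[0.90, 0.85, 0.10, 0.05],
                    [0.30, 0.20, 0.60, 0.40]])
y_train = np.array([1, 0])

model = LogisticRegression().fit(X_train, y_train)   # learns W and b of Sigmoid(W.X + b)


def predicted_pass_probability(x1, x2, x3, x4):
    """Probability that a new data block production request will pass verification."""
    return float(model.predict_proba([[x1, x2, x3, x4]])[0, 1])


# Decision rule from step d-d: pre-produce when the probability exceeds the 0.7 threshold.
THRESHOLD = 0.7
pre_produce = predicted_pass_probability(0.8, 0.7, 0.15, 0.05) > THRESHOLD
```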
As shown in fig. 5, the K-means clustering algorithm that ensures system scalability in step G of the present invention is as follows:
g-a, a K-means clustering algorithm is pre-deployed across the whole system; the proposed improved algorithm divides the nodes into 3 types, and because several rounds of consensus take place before the total number of nodes changes and malicious nodes may exist among all the nodes, K is set to 4;
g-b, after the genesis block has been generated successfully, the K-means algorithm is triggered and kept running, executing automatically whenever the number of nodes in the system changes;
g-c, at this point each node is characterized by two parameter values: its current type x and the frequency y with which it has previously behaved honestly;
g-d, where the expected value x ranges over 1-5 and the resource value y ranges over 0-1;
g-e, when the number of nodes in the system changes for the first time, K-means runs automatically and calculates as follows:
g-e-a, when the number of nodes in the system increases:
g-e-a-a, the newly added node must provide the expected value x of the type it wishes to become and its resource value y, with the same value ranges as in step g-d;
g-e-a-b, the K-means algorithm is run and 4 nodes are chosen arbitrarily from all the nodes as the initial cluster centers;
g-e-a-c, the Euclidean distances from all the remaining nodes to the 4 initial cluster centers are calculated;
g-e-a-d, according to the results, each node is assigned to the class of the nearest cluster center node, and then a new cluster center node is recalculated and selected;
g-e-a-e, classification ends when the distance between the new cluster center node and the previous cluster center node is 0;
g-e-b, when the number of nodes in the system decreases:
g-e-b-a, the K-means algorithm is run and 4 nodes are chosen arbitrarily as the initial cluster centers;
g-e-b-b, the Euclidean distances from all the remaining nodes to the 4 initial cluster centers are calculated;
g-e-b-c, according to the results, each node is assigned to the class of the nearest cluster center node, and then a new cluster center node is recalculated and selected;
g-e-b-d, classification ends when the distance between the new cluster center node and the previous cluster center node is 0;
g-f, when the number of nodes in the system changes again, the cluster centers obtained in the previous run are used as this run's initial cluster center points. A re-clustering sketch is given below.
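A minimal Python sketch of this re-clustering, using scikit-learn's KMeans as a stand-in for the deployed K-means algorithm, is shown below; representing each node by the point (x, y) and the random example data are assumptions made only for the illustration.

```python
import numpy as np
from sklearn.cluster import KMeans


def recluster(points: np.ndarray, prev_centers=None) -> KMeans:
    """points: (n_nodes, 2) array of (x, y) per node; prev_centers: centers from the last run (step g-f)."""
    if prev_centers is None:
        km = KMeans(n_clusters=4, n_init=10)                     # K = 4, as set in step g-a
    else:
        km = KMeans(n_clusters=4, init=prev_centers, n_init=1)   # reuse the previous centers (step g-f)
    return km.fit(points)


# Example: a first change in the node count, then a later change reusing the centers.
nodes = np.random.rand(20, 2)
first_run = recluster(nodes)
later_run = recluster(np.vstack([nodes, np.random.rand(3, 2)]),
                      prev_centers=first_run.cluster_centers_)
```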
The improved algorithm provided by the invention can promote the development of consensus algorithms. Building on the PBFT algorithm and combining it with a logistic regression algorithm, the data block is produced in advance on the basis of the prediction of whether the data block production request will pass, so the time spent waiting for confirmation messages is fully used, the steps are simplified, the delay is reduced and the efficiency is improved. At the same time, the algorithm uses a K-means algorithm with K = 3: when the total number of nodes in the system changes, the K-means algorithm is triggered automatically, the changed nodes are classified and the number of nodes in each category is determined, which guarantees the dynamism and scalability of the system. Compared with the PBFT, dBFT and CPBFT algorithms, the method has lower delay, higher throughput, a degree of dynamism and a degree of security.
Although the algorithm proposed by the present invention has been described in detail, it will be appreciated by those skilled in the art that various changes and modifications can be made to the algorithm without departing from the principle and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims (7)

1. A blockchain consensus method based on machine learning, characterized in that the method comprises the following steps:
A. all nodes in the network are divided into 3 types, namely client nodes, main nodes and supervision nodes; each type has a specific count: the number of client nodes is Nc, the number of main nodes is Nm and the number of supervision nodes is Ns;
B. each type of node has a different function: the client node initiates data block production requests and produces data blocks; the main node numbers the client nodes' requests, extracts their digests and sends the processed requests to the supervision nodes; the supervision nodes verify the requests and reply confirmation messages to the client node;
C. the algorithm flow consists of a preparation stage and a confirmation stage;
D. preparation stage: the client node sends a data block production request to the main node; the main node numbers the data block production requests in the order in which they are received and extracts their digests; the processed requests are then sent to the supervision nodes in numbered order;
E. confirmation stage: after receiving a request, a supervision node verifies whether the request is consistent with the digest (Y for consistent, N for inconsistent), appends its own verification result to the message and then sends the message to the other supervision nodes so that the request is mutually confirmed and verified; once verification passes, it replies a confirmation message to the client node;
F. after the client node sends a data block production request, a logistic regression algorithm is run to predict the probability that the request will pass; if the probability is greater than 0.7, the data block is produced in advance and a new data block production request is sent immediately after confirmation messages from 1 + Ns/2 supervision nodes are received; if the probability is not greater than 0.7, the data block is produced only after confirmation messages from 1 + Ns/2 supervision nodes are received;
G. a K-means clustering algorithm is pre-deployed across the whole blockchain system; whenever the number of nodes in the system increases or decreases, the algorithm is started to classify the added nodes or to adjust for the change in class membership caused by the removed nodes.
2. The machine learning-based blockchain consensus method of claim 1, wherein the division of nodes in step A is as follows:
a-a, the nodes are divided into 3 types: client nodes, main nodes and supervision nodes;
a-b, each type has its own count: the number of client nodes is Nc, the number of main nodes is Nm and the number of supervision nodes is Ns;
a-c, the node type division rule is as follows:
a-c-a, when a node joins the blockchain system, in addition to providing its real identity information it must also provide two further items: the expected value x of the type it wishes to become and its resource value y;
a-c-b, the expected value x ranges over 1-5 and the resource value y ranges over 0-1;
a-c-c, when x is 1-3 and y is 0-0.5 the node is a client node; when x is 2-4 and y is 0.5-1 the node is a main node; when x is 3-5 and y is 0-0.5 the node is a supervision node.
3. The machine learning-based blockchain consensus method of claim 1, wherein step B comprises the following function settings:
b-a, the client node initiates data block production requests and produces data blocks; the probability that a data block production request it initiates will pass verification is predicted with a logistic regression algorithm, and whether to produce the data block in advance is decided according to that probability;
b-b, the main node numbers the client nodes' requests in time order, extracts their digests and then broadcasts the requests to all supervision nodes in numbered order;
b-c, the supervision nodes confirm and verify the requests in numbered order; after receiving a message from the main node, a supervision node verifies whether the request is consistent with the digest, appends its verification result (Y for consistent, N for inconsistent) to the message and broadcasts it to the other supervision nodes for mutual confirmation and verification; when a supervision node has received more than Ns/2 verification-passed messages from the other supervision nodes, it replies a confirmation message to the client node.
4. The method of claim 1, wherein the preparation stage in step D is as follows:
d-a, the client node initiates a data block production request whose message content is Request: <Bcontent, Tc, Sc>, where Bcontent is the data block to be produced, Tc is the timestamp at which the client node sends the request message and Sc is the client node's signature; a data block consists of a block header and a data part, the block header being brief information about the previous data block; if the block is the genesis block, the block header marks it as the genesis block;
d-b, after receiving the Request message, the main node numbers it, extracts its digest, adds a timestamp and a signature and broadcasts the message to all supervision nodes; the message sent is Query: <Num, <Bcontent, Tc, Sc>, D, Tm, Sm>, where Num is the number, D is the digest, Tm is the timestamp added by the main node and Sm is the main node's signature;
d-c, after the client node sends the data block production request, a logistic regression algorithm is run to predict the probability that the request will pass.
5. The machine learning-based blockchain consensus method of claim 4, wherein the prediction process in step d-c is as follows:
d-c-a, sample data (X, y) already exist, and each sample X contains 4 feature values, X = (x1, x2, x3, x4);
d-c-b, where x1 is the probability that the client node has historically produced a data block successfully, x2 is the probability that a supervision node has historically replied that verification passed, x3 is the probability that a supervision node has historically replied that verification failed, x4 is the probability that a supervision node has not replied at all, and y is the probability that the request passes verification, all of these probabilities lying in the interval (0, 1); X from the samples is taken as the input parameter;
d-c-c, the input X is multiplied by a weight matrix W and a bias b is added, giving the intermediate result z = W·X + b;
d-c-d, the output is Y ∈ {0, 1}; the intermediate result z is converted to a value of 0 or 1 using the Sigmoid function
Sigmoid(z) = 1 / (1 + e^(-z)),
which gives the formula
Y = Sigmoid(W·X + b) = 1 / (1 + e^(-(W·X + b)));
d-c-e, according to this formula, the existing sample data are combined with Python code to train the model and obtain the specific values of the weight matrix W and the bias b;
d-c-f, the obtained values of W and b are substituted into the formula to obtain the logistic regression model;
d-c-g, all of the initial sample data are tested with the obtained model, and the predicted probability Y is compared with the original probability y provided by the sample data to obtain a suitable threshold;
d-c-h, the threshold obtained experimentally is 0.691, and 0.7 is selected as the threshold;
d-d, if the predicted probability is greater than the threshold, the client node pre-produces the data block and stores it temporarily; otherwise it waits for the confirmation messages.
6. The machine learning-based blockchain consensus method of claim 1, wherein the confirmation stage in step E comprises the following steps:
e-a, after receiving the Query message sent by the main node, a supervision node first verifies for itself whether the request is consistent with the digest, appends its own verification result and then broadcasts the message to the other supervision nodes for mutual confirmation and verification; the broadcast message is Valid: <ID, <Num, <Bcontent, Tc, Sc>, D, Tm, Sm>, Y/N>, where ID is the supervision node's number, Y indicates that the request is consistent with the digest and N indicates that it is not;
e-b, when a supervision node has received at least 1 + Ns/2 Valid messages carrying Y, it replies a confirmation message to the client node; the confirmation message is Commit: <ID, <Num, <Bcontent, Tc, Sc>, D, Tm, Sm>, Y/N, Ts, Ss>, where Ts is the timestamp added by the supervision node and Ss is the supervision node's signature;
e-c, when the client node has received at least 1 + Ns/2 Commit messages containing Y, the pre-produced data block is considered valid, or, if the predicted pass probability was not above 0.7, the data block is produced on the basis of the received Commit messages; otherwise the request has not passed, and the client node discards it and then initiates the request again.
7. The machine learning-based blockchain consensus method of claim 1, wherein the K-means clustering algorithm that guarantees system scalability in step G is as follows:
g-a, the K-means clustering algorithm is deployed in advance; the nodes are divided into 3 types, and because several rounds of consensus take place before the total number of nodes changes and malicious nodes may exist among all the nodes, K is set to 4;
g-b, after the client node generates the genesis block, the K-means algorithm is triggered and kept running, executing automatically whenever the number of nodes changes;
g-c, at this point each node is characterized by two parameter values: its current type x and the frequency y with which it has previously behaved honestly;
g-d, where the expected value x ranges over 1-5 and the resource value y ranges over 0-1;
g-e, when the number of nodes in the system changes for the first time, K-means runs automatically and calculates as follows:
g-e-a, when the number of nodes in the system increases:
g-e-a-a, the newly added node must provide the expected value x of the type it wishes to become and its resource value y, with the same value ranges as in step g-d;
g-e-a-b, the K-means algorithm is run and 4 nodes are selected from all the nodes as the initial cluster centers;
g-e-a-c, the Euclidean distances from all the remaining nodes to the 4 initial cluster centers are calculated;
g-e-a-d, according to the results, each node is assigned to the class of the nearest cluster center node, and then a new cluster center node is recalculated and selected;
g-e-a-e, classification ends when the distance between the new cluster center node and the previous cluster center node is 0;
g-e-b, when the number of nodes in the system decreases:
g-e-b-a, the K-means algorithm is run and 4 nodes are selected arbitrarily as the initial cluster centers;
g-e-b-b, the Euclidean distances from all the remaining nodes to the 4 initial cluster centers are calculated;
g-e-b-c, according to the results, each node is assigned to the class of the nearest cluster center node, and then a new cluster center node is recalculated and selected;
g-e-b-d, classification ends when the distance between the new cluster center node and the previous cluster center node is 0;
g-f, when the number of nodes in the system changes again, the cluster centers obtained in the previous run are used as this run's initial cluster center points.
CN202010273144.XA 2020-04-09 2020-04-09 Block chain consensus method based on machine learning Active CN111598127B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010273144.XA CN111598127B (en) 2020-04-09 2020-04-09 Block chain consensus method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010273144.XA CN111598127B (en) 2020-04-09 2020-04-09 Block chain consensus method based on machine learning

Publications (2)

Publication Number Publication Date
CN111598127A CN111598127A (en) 2020-08-28
CN111598127B true CN111598127B (en) 2022-08-26

Family

ID=72181904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010273144.XA Active CN111598127B (en) 2020-04-09 2020-04-09 Block chain consensus method based on machine learning

Country Status (1)

Country Link
CN (1) CN111598127B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112954074B (en) * 2021-03-29 2022-05-06 北京三快在线科技有限公司 Block chain network connection method and device
CN113365229B (en) * 2021-05-28 2022-03-25 电子科技大学 Network time delay optimization method of multi-union chain consensus algorithm
CN113395357B (en) * 2021-08-16 2021-11-12 支付宝(杭州)信息技术有限公司 Method and device for fragmenting block chain system
CN114978684B (en) * 2022-05-20 2023-07-04 江南大学 PBFT consensus method based on improved condensation hierarchical clustering

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919760A (en) * 2019-01-11 2019-06-21 南京邮电大学 Byzantine failure tolerance common recognition algorithm based on voting mechanism
CN110113388A (en) * 2019-04-17 2019-08-09 四川大学 A kind of method and apparatus of the block catenary system common recognition based on improved clustering algorithm

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919760A (en) * 2019-01-11 2019-06-21 南京邮电大学 Byzantine failure tolerance common recognition algorithm based on voting mechanism
CN110113388A (en) * 2019-04-17 2019-08-09 四川大学 A kind of method and apparatus of the block catenary system common recognition based on improved clustering algorithm

Also Published As

Publication number Publication date
CN111598127A (en) 2020-08-28

Similar Documents

Publication Publication Date Title
CN111598127B (en) Block chain consensus method based on machine learning
Gyawali et al. Machine learning and reputation based misbehavior detection in vehicular communication networks
Zheng et al. Optimization of PBFT algorithm based on improved C4.5
Yang et al. A novel solutions for malicious code detection and family clustering based on machine learning
Kaiafas et al. Detecting malicious authentication events trustfully
CN115065468B (en) PBFT consensus optimization method based on group reputation value
CN111935207A (en) Block chain system consensus method based on improved C4.5 algorithm
CN113301047A (en) Vehicle networking node consistency consensus method based on malicious node attack detection
Osman et al. Artificial neural network model for decreased rank attack detection in RPL based on IoT networks
CN113704328B (en) User behavior big data mining method and system based on artificial intelligence
CN113434704A (en) Knowledge graph processing method based on big data and cloud computing system
CN116094721A (en) Clustering-based extensible shard consensus algorithm
CN113704772B (en) Safety protection processing method and system based on user behavior big data mining
CN114661549B (en) Random forest-based system activity prediction method and system
CN114143060B (en) Information security prediction method based on artificial intelligence prediction and big data security system
CN115271724A (en) Grouping Byzantine fault-tolerant algorithm based on dynamic trust model
Gu et al. Primary node selection algorithm of PBFT based on anomaly detection and reputation model
CN113038427B (en) Block chain cross-region authentication method based on credit mechanism and DPOS
Kuang et al. GTMS: A gated linear unit based trust management system for internet of vehicles using blockchain technology
CN112910873B (en) Useful workload proving method and system for block chain transaction anomaly detection
CN117544656A (en) Communication command management method and system based on micro-service framework
Long et al. A method of machine learning for social bot detection combined with sentiment analysis
CN116821455A (en) Regional data backtracking analysis method and system based on social tool
Marouane et al. A review and a tutorial of ML-based MDS technology within a VANET context: From data collection to trained model deployment
Concone et al. A novel recruitment policy to defend against sybils in vehicular crowdsourcing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant